New goodness-of-fit diagnostics for conditional discrete response models
Igor Kheifets∗ and Carlos Velasco†

September 19, 2018
Abstract
This paper proposes new specification tests for conditional models with discrete responses, which are key to apply efficient maximum likelihood methods, to obtain consistent estimates of partial effects and to get appropriate predictions of the probability of future events. In particular, we test the static and dynamic ordered choice model specifications and can cover infinite support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which leads to continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. We investigate in this paper an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm to approximate the critical values of the test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.
Keywords: Specification tests, count data, dynamic discrete choice models, conditional probability integral transform.
JEL classification: C12, C22, C52.

∗ ITAM, Mexico. Email: [email protected]
† Department of Economics, Universidad Carlos III de Madrid. Email: [email protected]

1 INTRODUCTION
Many statistical models specify the conditional distribution of a discrete response variable given some explanatory variables, including the description of binary, multinomial, ordered choice and count data. In this paper we analyze goodness-of-fit tests for both static models with covariates and dynamic ordered choice and count data models, where the conditioning information set may also include past information on the discrete variable and a set of (contemporaneous) explanatory variables which frequently appear in the social sciences, see Kedem and Fokianos (2002) and Greene and Hensher (2010). For example, dynamic models are popular in macroeconomic applications, see for instance Hamilton and Jordá (2002), Dolado and María-Dolores (2002) and Basu and de Jong (2007) for modeling central banks' decisions, or Kauppi and Saikkonen (2008) and Startz (2008) for predicting US recessions; in finance, see e.g. Rydberg and Shephard (2003) for modeling the size of asset price movements and Fokianos et al. (2009) for the number of transactions per minute of a particular stock.

Suppose we observe the random variables $\{Y_t, X_t'\}_{t=1}^T$ and consider the information sets $\Omega_t = \{X_t, Y_{t-1}, X_{t-1}, Y_{t-2}, X_{t-2}, \ldots\}$ for each period $t = 1, 2, \ldots, T$. We are interested in testing the null hypothesis that the distribution of $Y_t$ conditional on $\Omega_t$ is in the parametric family $F_{t,\theta}(\cdot \mid \Omega_t)$, i.e.

$$H_0: \; Y_t \mid \Omega_t \sim F_{t,\theta_0}(\cdot \mid \Omega_t) \ \text{ for some } \theta_0 \in \Theta, \quad t = 1, 2, \ldots, T,$$

where $\Theta \subset \mathbb{R}^m$ is the parameter space, while the alternative hypothesis ($H_1$) for the omnibus test would be the negation of $H_0$.

We consider a class $\mathcal{M}$ of discrete conditional distributions defined on $\mathcal{K} = \{1, 2, \ldots, K\}$ for integer $K > 1$, or $\mathcal{K} = \{1, 2, \ldots, \infty\}$, such that for all $F \in \mathcal{M}$ it holds that $F(0) = 0$, $f(k) := F(k) - F(k-1) \geq 0$, $k = 1, 2, \ldots$, and $\sum_{k \in \mathcal{K}} f(k) = 1$.
This setup includes numerous models that have been used extensively in applied work, both for dynamic and for iid data; here we describe briefly two of them.

Example 1 (Dynamic multinomial ordered choice model). The discrete responses $Y_t$ are assumed to be generated by the rule

$$Y_t = \begin{cases} 1 & \text{if } V_t^* \leq \tau_1, \\ 2 & \text{if } \tau_1 < V_t^* \leq \tau_2, \\ \;\vdots & \\ K & \text{if } V_t^* > \tau_{K-1}, \end{cases}$$

where $V_t^*$ is a continuous latent variable and $\tau_1, \ldots, \tau_{K-1}$ are threshold parameters that define $K$ intervals in $\mathbb{R}$. In a simple model, e.g. Basu and de Jong (2007), the latent variable is determined through the linear equation

$$V_t^* = X_t'\beta + \rho Y_{t-1} + \varepsilon_t,$$

where $X_t$ is a vector of stationary exogenous regressors, $\beta$ a vector of regression parameters, $\varepsilon_t$ is the shock in each period, and $Y_{t-1}$ could be replaced by any function of the past $\{Y_{t-1}, \ldots, Y_{t-n}\}$ for some finite $n$. The cdf of $\varepsilon_t$, $F_\varepsilon$, determines the class of multinomial model, i.e. ordered multinomial probit (if $\varepsilon_t$ is standard normal) or logit (if $\varepsilon_t$ is logistic), since $F_{t,\theta}$ is defined at once from

$$\Pr(Y_t = k \mid \Omega_t) = \Pr(\tau_{k-1} < V_t^* \leq \tau_k \mid \Omega_t) = F_\varepsilon(\tau_k - X_t'\beta - \rho Y_{t-1}) - F_\varepsilon(\tau_{k-1} - X_t'\beta - \rho Y_{t-1}),$$

with $\tau_0 = -\infty$ and $\tau_K = \infty$, and $\theta = (\beta', \rho, \tau_1, \ldots, \tau_{K-1})'$.

Example 2 (Poisson model). The variate $Y_t = Y_t^* + 1$ is defined on the counts $Y_t^* = 0, 1, 2, \ldots$, which are assumed to follow a conditional Poisson distribution

$$Y_t^* \mid \Omega_t \sim \text{Poisson}(\lambda_t),$$

where the conditional mean can depend on covariates through an exponential link as $\lambda_t = \exp(X_t'\beta)$, or on previous observations through an identity link as $\lambda_t = \alpha_0 + \alpha_1 \lambda_{t-1} + \rho Y_{t-1}^*$, e.g. Fokianos et al. (2009), or through the logarithmic canonical link as $\log(\lambda_t) = X_t'\beta + \rho e_{t-1}$, where $e_t = (Y_t^* - \lambda_t)/\lambda_t$ are scaled and centered errors, e.g. Davis et al. (2003).
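As a concrete illustration of Example 1, the following sketch simulates a dynamic ordered probit with $K = 3$ categories and evaluates the implied conditional pmf. The parameter values, the single standard normal regressor and the initial category are illustrative choices, not taken from the paper; only numpy and the standard library are assumed.

```python
import numpy as np
from math import erf, sqrt, inf

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def simulate_ordered_probit(T, beta, rho, tau, rng, y0=1):
    """Simulate Example 1: V*_t = X_t*beta + rho*Y_{t-1} + eps_t with
    standard normal errors; Y_t = 1 + #{thresholds tau_j < V*_t}."""
    X = rng.standard_normal(T)               # one exogenous regressor
    Y = np.empty(T, dtype=int)
    y_prev = y0
    for t in range(T):
        v = beta * X[t] + rho * y_prev + rng.standard_normal()
        Y[t] = 1 + np.searchsorted(tau, v)   # counts tau_j < v
        y_prev = Y[t]
    return X, Y

def ordered_probit_pmf(x, y_prev, beta, rho, tau):
    """Pr(Y_t = k | Omega_t), k = 1..K, from the displayed formula
    with tau_0 = -inf, tau_K = +inf and F_eps = Phi."""
    cuts = [-inf] + list(tau) + [inf]
    cdf = np.array([Phi(c - beta * x - rho * y_prev) for c in cuts])
    return np.diff(cdf)                      # length K, sums to one

rng = np.random.default_rng(0)
X, Y = simulate_ordered_probit(500, beta=0.5, rho=0.3,
                               tau=np.array([0.5, 2.0]), rng=rng)
```

Note that the recursion uses the realized $Y_{t-1}$, so the simulated series is genuinely dynamic, matching the information sets $\Omega_t$ above.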
Despite the fact that a correct specification is key to apply efficient maximum likelihood methods, to obtain consistent estimates of partial effects and to get appropriate predictions of the probability of future events, empirical researchers typically do not perform goodness-of-fit testing of such models as they would do in a continuous case. In general, there are only a few specification tests available for discrete data, see Mora and Moro-Egido (2007). Two of them, the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the conditional Kolmogorov test of Andrews (1997), based on the specification of the conditional mean for binary data, can be adapted for this purpose, and we discuss this possibility and compare it to our approach in Section 6. A test related to Andrews', derived for time series by Corradi and Swanson (2006), could also be adapted for discrete data, but it tests a different null hypothesis, concerning a distribution given a finite conditioning set not characterizing the complete dynamics of the process. There are also tests designed specifically for Poisson models (see e.g. Neumann, 2011; Fokianos and Neumann, 2013).

In what follows we propose conditional, dynamic discrete analogs of the Kolmogorov-Smirnov goodness-of-fit measure that can exploit different restrictions derived from the martingale difference property of a particular transformation of the data under the null hypothesis. This property is derived from the specification of a complete dynamic model given the information set generated by all the past observations of the discrete response and other explanatory variables, and is used to build the asymptotic theory for our tests. Under i.i.d.
assumptions, this martingale difference property leads to exact independence of the transformation sequence under the null and a much simpler parallel asymptotic theory.

When the fitted distribution is continuous, the relative distribution of $Y_t$ compared to $F_{t,\theta}$, defined as the cdf of Rosenblatt's (1952) transforms, also called conditional Probability Integral Transforms (PIT),

$$U_t(\theta) := F_{t,\theta}(Y_t \mid \Omega_t), \quad t = 1, 2, \ldots, T,$$

is standard uniform, and the $U_t(\theta)$ are distributed as independent
$U[0,1]$ uniform random variables under $H_0$. This serves as a basis for several specification tests of $H_0$, see e.g. Bai (2003) and Kheifets (2015) for dynamic models and Delgado and Stute (2008) for independent and identically distributed (iid) data. However, the Rosenblatt transformation is not appropriate for discrete support random variables, producing non-iid pseudo residuals even under the null of correct specification. To solve the limitations of PIT-based testing techniques for discrete data, several alternative transforms have been proposed, see Jung, Kukuk and Liesenfeld (2006), Czado, Gneiting and Held (2009) and references therein. An easy and popular way is to randomize, i.e. to interpolate the discrete values of $Y_t$ with independent noise in $[0,1]$. Instead, we consider the nonrandomized transform of $Y_t$, $I_{t,\theta}(u)$ for $u \in [0,1]$,

$$I_{t,\theta}(u) := \begin{cases} 0, & u \leq U_t^-(\theta); \\[2pt] \dfrac{u - U_t^-(\theta)}{U_t(\theta) - U_t^-(\theta)}, & U_t^-(\theta) \leq u \leq U_t(\theta); \\[2pt] 1, & U_t(\theta) \leq u, \end{cases} \tag{1}$$

where $U_t^-(\theta) := F_{t,\theta}(Y_t - 1 \mid \Omega_t)$. This transform, conditional on the data, is nonrandomized in the sense that it does not depend on extra sources of randomness, as opposed to the interpolation transforms discussed in the next section. The unconditional version of this transform appears in Handcock and Morris (1999) and more recently in Czado, Gneiting and Held (2009), where it is used for calibration, but no formal tests are proposed there. This transformation can also be seen as a particular case of the multilinear extension as defined in Genest, Nešlehová and Rémillard (2014). As we show below, for every $u \in [0,1]$, $I_{t,\theta}(u) - u$ constitutes a martingale difference sequence (MDS) with respect to $\Omega_t$ under $H_0$ and can be used for testing $H_0$, as $I_{t,\theta}(u)$ loses this property when the model is misspecified.
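In code, the transform (1) is just a clipped linear interpolation between $U_t^-(\theta)$ and $U_t(\theta)$. The sketch below (numpy assumed; `poisson_cdf_grid` is our own helper, computing the Poisson cdf by direct summation) also illustrates numerically the mean property $E[I_{t,\theta}(u) \mid \Omega_t] = u$, which holds exactly for any fixed discrete distribution:

```python
import numpy as np

def nonrandomized_transform(u, U_minus, U):
    """I_{t,theta}(u) from (1): 0 for u <= U^-_t, linear in between,
    1 for u >= U_t.  U_minus = F(Y_t - 1 | Omega_t), U = F(Y_t | Omega_t);
    requires U > U_minus (positive-probability cells)."""
    return np.clip((u - U_minus) / (U - U_minus), 0.0, 1.0)

def poisson_cdf_grid(lam, kmax):
    """F(-1), F(0), ..., F(kmax) for Poisson(lam), by pmf recursion."""
    pmf = [np.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return np.concatenate(([0.0], np.cumsum(pmf)))

# Exact check of E[I(u)] = u: sum f(k) * I(u) over the support.
F = poisson_cdf_grid(3.0, 30)
f = np.diff(F)                         # pmf on 0..30
u = 0.37
I_by_cell = nonrandomized_transform(u, F[:-1], F[1:])
# np.dot(f, I_by_cell) equals 0.37 up to truncation/rounding error
```

The identity holds because cells entirely below $u$ contribute $f(k)$, the cell containing $u$ contributes $u - F(k-1)$, and cells above contribute zero, so the sum telescopes to $u$.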
For instance, we can compute the pseudo empirical relative distribution of $Y_t$ compared to $F_{t,\theta}$,

$$\widetilde{F}_\theta(u) := \frac{1}{T}\sum_{t=1}^T I_{t,\theta}(u), \quad u \in [0,1],$$

and the standardized empirical process

$$S_{1T}(u) := \frac{1}{T^{1/2}}\sum_{t=1}^T \{I_{t,\theta}(u) - u\} = T^{1/2}\left(\widetilde{F}_\theta(u) - u\right),$$

which converges weakly to a Gaussian process. In addition, in order to control dynamics in $I_{t,\theta}(u)$, we can compare the joint pseudo empirical cdf with the uniform on a square using the biparameter process

$$S_{2T}(\mathbf{u}) := \frac{1}{(T-1)^{1/2}}\sum_{t=2}^T \{I_{t,\theta}(u_1) I_{t-1,\theta}(u_2) - u_1 u_2\}, \tag{2}$$

where $\mathbf{u} = (u_1, u_2)$. To obtain feasible tests we need to consider norms of $S_{jT}$ for $j = 1, 2$: Cramér-von Mises norms $\int S_{jT}(u)^2\, d\varphi(u)$ for some absolutely continuous measure $\varphi$ on $[0,1]^j$, or Kolmogorov-Smirnov norms $\sup_{u \in [0,1]^j} |S_{jT}(u)|$.

When the parameter $\theta_0$ is unknown under the null, we use an estimate $\widehat{\theta}_T$ and account for the parameter estimation effect in the p-value computation with a parametric bootstrap method. It might also be possible to derive, e.g., martingale distribution-free transforms, but since they typically need to be programmed on a case-by-case basis for each model, they can be impractical and are beyond the scope of this paper. As far as we know, our proposal is the first formal specification test of ordered discrete choice models which accounts properly for parameter uncertainty and is based on a nonrandomized transform, which makes it attractive in terms of power against a wide set of alternative hypotheses.

The rest of the paper is organized as follows. In the next section, we describe different alternatives to the PIT. In Sections 3 and 4, we provide the main asymptotic properties of the nonrandomized transforms and of the resulting univariate and bivariate empirical processes using martingale theory. In particular, we establish weak limits under fixed and local alternatives accounting for the parameter estimation effect. Section 5 discusses the implementation of the new tests with a simple bootstrap algorithm.
Section 6 provides a small simulation exercise and an application exploring the properties of specification tests based on both randomized and nonrandomized transformations. Then we conclude. All proofs are contained in the Appendix.

2 ALTERNATIVES TO PIT FOR DISCRETE DATA
In order to further motivate the nonrandomized transform $I_{t,\theta}$ defined in (1), we introduce the randomized PIT,

$$U_t^r(\theta) := U_t^-(\theta) + Z_t^U\left(U_t(\theta) - U_t^-(\theta)\right), \tag{3}$$

where $\{Z_t^U\}_{t=1}^T$ are independent standard uniform random variables, independent of $Y_t$. Alternatively, $U_t^r$ can be obtained by applying the standard continuous PIT to the continuous random variable $Y_t^\dagger := Y_t - Z_t$, where $\{Z_t\}_{t=1}^T$ are iid with any continuous cdf $F_Z$ on $[0,1]$. Compute the conditional cdf of $Y_t^\dagger$,

$$F_{t,\theta}^\dagger(y \mid \Omega_t) = F_{t,\theta}(\lfloor y \rfloor \mid \Omega_t) + F_Z(y - \lfloor y \rfloor)\left(F_{t,\theta}(\lfloor y+1 \rfloor \mid \Omega_t) - F_{t,\theta}(\lfloor y \rfloor \mid \Omega_t)\right),$$

where $\lfloor y \rfloor$ is the floor function, i.e. the maximum integer not exceeding $y$, and find that

$$U_t^r(\theta) = F_{t,\theta}^\dagger\left(Y_t^\dagger \mid \Omega_t\right),$$

for $Z_t^U = F_Z(Z_t)$ and any choice of $F_Z$, see Kheifets and Velasco (2013). Note that the cdfs of $Y_t^\dagger$ conditional on $\Omega_t$ and on $\{\Omega_t, Z_{t-1}, Z_{t-2}, \ldots, Z_1\}$ coincide. Under $H_0$, the $U_t^r(\theta)$ are iid
$U[0,1]$ variables, as under any continuous distribution specification, while $U_t(\theta)$ and $U_t^-(\theta)$ are neither independent nor $U[0,1]$ distributed. The cdf of $U_t^r(\theta)$, estimated using the randomized transform of $Y_t$, $1\{U_t^r(\theta) \leq u\}$,

$$\widehat{F}_\theta^r(u) := \frac{1}{T}\sum_{t=1}^T 1\{U_t^r(\theta) \leq u\}, \quad u \in [0,1],$$

can be compared to the uniform cdf. Kheifets and Velasco (2013) then test $H_0$ using an empirical process based on the randomized transform,

$$R_{1T}(u) := T^{1/2}\left\{\widehat{F}_\theta^r(u) - u\right\} = \frac{1}{T^{1/2}}\sum_{t=1}^T\left[1\{U_t^r(\theta) \leq u\} - u\right], \quad u \in [0,1].$$

We can also consider reducing the dependence on a particular outcome of the noise $Z_t^U$ in (3) and in the randomized transform by taking averages over $M$ replications of $\{Z_t^U\}_{t=1}^T$, conditional on the original data, similar to the "average-jittering" of Machado and Santos Silva (2005). Suppose that for each $t$ we have $M$ independent sequences of uniform
$U[0,1]$ noises $Z_{t,m}^U$, $m = 1, 2, \ldots, M$, which generate $U_{t,m}^r(\theta)$ according to (3). Define the M-randomized transform of $Y_t$, $I_{t,\theta,M}(Y_t, u)$,

$$I_{t,\theta,M}(Y_t, u) := \frac{1}{M}\sum_{m=1}^M 1\left\{U_{t,m}^r(\theta) \leq u\right\},$$

which takes values in the set $\{0, 1/M, 2/M, \ldots, 1\}$ and has mean $u$ under $H_0$. Then the cdf of $U_t^r(\theta)$ is estimated by

$$\widehat{F}_{\theta,M}^r(u) := \frac{1}{T}\sum_{t=1}^T I_{t,\theta,M}(Y_t, u), \quad u \in [0,1].$$

Note that with $M = 1$ we are back to $\widehat{F}_\theta^r(u)$, and therefore we can generalize $R_{1T}$ to

$$R_{1T,M}(u) := T^{1/2}\left\{\widehat{F}_{\theta,M}^r(u) - u\right\}, \quad u \in [0,1].$$

In order to propose specification tests, following Handcock and Morris (1999), we define the discrete relative distribution of $Y_t$ compared to $F_{t,\theta}$ as the cdf of $U_t^r(\theta)$. Under $H_0$, the discrete relative distribution is the uniform $U[0,1]$. The estimates of the relative distribution of $Y_t$ compared to $F_{t,\theta}$ can be ordered in terms of efficiency in the following way: $\widetilde{F}_\theta(u)$ (the most efficient), $\widehat{F}_{\theta,M}^r(u)$ and $\widehat{F}_\theta^r(u)$. This ordering is determined by the amount of noise introduced in the definitions of the transforms, i.e. in the nonrandomized, M-randomized and (1-)randomized transforms. The nonrandomized transform can be equivalently obtained by integrating out the extra noise in the randomized transform, $I_{t,\theta}(Y_t, u) = \int 1\{U_t^r(\theta) \leq u\}\, dF_Z$, or by taking the number of replications $M$ to infinity, thus completely removing the noise from the estimate of the discrete relative distribution and other functionals of the transforms. The efficiency of the nonrandomized transform translates into the increased power of the specification tests based on this transform, whose properties we study next.

3 PROPERTIES OF EMPIRICAL PROCESSES BASED ON THE NONRANDOMIZED TRANSFORM
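The relation between the randomized, M-randomized and nonrandomized transforms can be made concrete in code: averaging the jittered indicator $1\{U_{t,m}^r(\theta) \leq u\}$ over $m$ recovers $I_{t,\theta}(u)$ as $M \to \infty$. A minimal numpy sketch, where the values of $U_t^-(\theta)$ and $U_t(\theta)$ for a single observation are illustrative:

```python
import numpy as np

def randomized_pit(U_minus, U, rng):
    """Equation (3): jitter uniformly between F(Y_t - 1 | .) and F(Y_t | .)."""
    Z = rng.uniform(size=np.shape(U))
    return U_minus + Z * (U - U_minus)

def m_randomized_transform(u, U_minus, U, M, rng):
    """I_{t,theta,M}(Y_t, u): average of M jittered indicators."""
    Ur = np.stack([randomized_pit(U_minus, U, rng) for _ in range(M)])
    return (Ur <= u).mean(axis=0)

# One observation with F(Y-1|Omega) = 0.2 and F(Y|Omega) = 0.6:
U_minus, U, u = np.array([0.2]), np.array([0.6]), 0.5
exact = np.clip((u - U_minus) / (U - U_minus), 0, 1)   # nonrandomized: 0.75
rng = np.random.default_rng(1)
approx = m_randomized_transform(u, U_minus, U, M=20000, rng=rng)
# approx is close to exact, with O(M^{-1/2}) residual jitter noise
```

This mirrors the ordering in the text: the jitter noise that inflates the variance of $\widehat{F}_\theta^r$ shrinks at rate $M^{-1}$ in $\widehat{F}_{\theta,M}^r$ and vanishes in $\widetilde{F}_\theta$.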
As shown in the next lemma, the building blocks of $\widetilde{F}_\theta(u)$, namely $I_{t,\theta}(u) - u$, constitute a martingale difference sequence (MDS) with respect to $\Omega_t$, and therefore $\widetilde{F}_\theta(u)$ is an unbiased and consistent estimate of the uniform cdf under the null, a reasonable basis for developing tests of $H_0$. Moreover, the MDS property will allow us to establish the asymptotic properties of our tests without imposing any additional restrictions. Let, for $u, v \in [0,1]$,

$$\gamma_{t,\theta}(u,v) := \frac{(F_k - u \vee v)(u \wedge v - F_{k-1})}{F_k - F_{k-1}}\, 1\left\{F_{t,\theta}^-(u \mid \Omega_t) = F_{t,\theta}^-(v \mid \Omega_t)\right\},$$

where $k = k(u) = F_{t,\theta}^-(u \mid \Omega_t)$, with $F_{t,\theta}^-(u \mid \Omega_t) := \min\{y : F_{t,\theta}(y \mid \Omega_t) \geq u\}$ being the conditional quantile function and $F_k := F_{t,\theta}(k \mid \Omega_t)$.

Lemma 1.
Under $H_0$, $I_{t,\theta}(u) - u$ is a martingale difference sequence with respect to $\Omega_t$, i.e. $E[I_{t,\theta}(u) \mid \Omega_t] = u$ a.s., with conditional covariance

$$\mathrm{Cov}[I_{t,\theta}(u), I_{t,\theta}(v) \mid \Omega_t] = u \wedge v - uv - \gamma_{t,\theta}(u,v), \quad \text{a.s.}$$

Note that the $I_{t,\theta}(u)$ are not necessarily independent across $t$, despite the fact that, by the martingale difference property, $I_{t,\theta}(u)$ and $I_{t-j,\theta}(v)$ are serially uncorrelated for all $j \neq 0$ and all $u, v \in [0,1]$, see the Appendix. On the other hand, the $I_{t,\theta}(u)$ are (conditionally) heteroskedastic; therefore the variance of $S_{1T}$ is model and parameter dependent, but its distribution can be simulated conditional on exogenous information in $\Omega_t$. Let $V_{1T}(u,v) := \mathrm{Cov}[S_{1T}(u), S_{1T}(v)]$; then, since $0 \leq \gamma_{t,\theta}(u,v) < 1$ a.s.,

$$V_{1T}(u,v) = u \wedge v - uv - \frac{1}{T}\sum_{t=1}^T E\left[\gamma_{t,\theta}(u,v)\right] \leq u \wedge v - uv,$$

so the variances of $S_{1T}$ are not larger than those of the randomized transformation-based process $R_{1T}$ or of its weak limit, the Brownian sheet, see Corollary 4 in Kheifets and Velasco (2013).

Due to Lemma 1, $E[\widetilde{F}_\theta(u)] = u$ under $H_0$, and the natural empirical process for performing tests on $H_0$ is then $S_{1T}$. This process, being based on a nonrandomized transform, does not involve the extra noise that appears in the randomized transform-based empirical process $R_{1T}$ for testing $U_t^r \sim U[0,1]$, nor the reduced noise remaining in $R_{1T,M}$ based on the M-randomized transform. The next lemma is key to understanding the improvement of the M-randomized over the randomized, and of the nonrandomized, advocated in this paper, over the M-randomized transform approaches.

Lemma 2.
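Lemma 1 can be checked by exact summation over a small discrete distribution: the covariance $E[I(u)I(v)] - uv$ computed directly from the pmf should match $u \wedge v - uv - \gamma_{t,\theta}(u,v)$. A numerical sketch, using an assumed three-point distribution purely for illustration:

```python
import numpy as np

def check_lemma1(pmf, u, v):
    """Compare the direct Cov[I(u), I(v)] with u^v - uv - gamma(u, v)
    for a fixed discrete distribution with the given pmf on {1,..,K}."""
    F = np.concatenate(([0.0], np.cumsum(pmf)))     # F(0), F(1), ..., F(K)
    I_u = np.clip((u - F[:-1]) / pmf, 0.0, 1.0)     # I(u) for each outcome k
    I_v = np.clip((v - F[:-1]) / pmf, 0.0, 1.0)
    direct = np.dot(pmf, I_u * I_v) - u * v         # Cov, since E[I(u)] = u
    k_u = int(np.searchsorted(F[1:], u))            # cell index of F^-(u)
    k_v = int(np.searchsorted(F[1:], v))
    gamma = 0.0
    if k_u == k_v:                                  # u, v in the same cell
        k = k_u
        gamma = (F[k + 1] - max(u, v)) * (min(u, v) - F[k]) / pmf[k]
    formula = min(u, v) - u * v - gamma
    return direct, formula

d, f = check_lemma1(np.array([0.3, 0.5, 0.2]), u=0.35, v=0.55)
# d and f agree (both equal 0.1325 here, up to rounding)
```

When $u$ and $v$ fall in different cells, $\gamma$ vanishes and the covariance equals the Brownian-bridge value $u \wedge v - uv$, which is the variance-reduction mechanism the text describes.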
Suppose that the uniform law of large numbers holds for $\widehat{F}_{\theta,M}^r(u)$ and $\widetilde{F}_\theta(u)$. Independently of whether $H_0$ holds or not, $\widehat{F}_{\theta,M}^r(u)$ and $\widetilde{F}_\theta(u)$ consistently and uniformly in $u$ estimate the relative distribution, i.e. the cdf of $U_t^r(\theta)$. $\widetilde{F}_\theta(u)$ is more efficient, but the difference in efficiency goes to $0$ as $M \to \infty$. In particular, under $H_0$,

$$E[R_{1T,M}(u) R_{1T,M}(v)] = \frac{1}{M} E[R_{1T}(u) R_{1T}(v)] + \left(1 - \frac{1}{M}\right) E[S_{1T}(u) S_{1T}(v)].$$

From Lemma 2 it follows that $S_{1T}$ has the smallest variance; the variance of $R_{1T,M}$ is a weighted sum of those of $S_{1T}$ and $R_{1T}$, see also Equation (5) in Machado and Santos Silva (2005). Other advantages of $S_{1T}$ over $R_{1T,M}$ are 1) computational, as there is no need to simulate $M$ paths of transformations, and 2) theoretical, since weak convergence is easier to prove for processes which are piece-wise linear in parameters. Therefore we concentrate on studying the properties of tests based on the nonrandomized transform, for which we introduce the following assumption.

Assumption 1. $F_{t,\theta_0}(\cdot \mid \Omega_t) \in \mathcal{M}$ a.s. for all $t$. Moreover, there exists a finite function $\gamma_\infty(u,v)$ such that, uniformly in $(u,v) \in [0,1]^2$, $T^{-1}\sum_{t=1}^T \gamma_{t,\theta_0}(u,v) \to_p \gamma_\infty(u,v)$.

This assumption implicitly restricts the dynamics such that a uniform law of large numbers (LLN) holds for the averaged conditional covariance function. In the case of stationary and ergodic data, $\gamma_\infty(u,v) = E[\gamma_{1,\theta_0}(u,v)]$. Sufficient conditions for the stationarity and ergodicity of dynamic multinomial ordered choice models are given in Basu and de Jong (2007), and for autoregressive Poisson models in Davis et al. (2003), Fokianos et al. (2009) and Doukhan et al. (2012). Then it is possible to show the uniformity of the convergence from a point-wise result, since the summands are continuous, piece-wise polynomials in $u$ and $v$.
As an illustration, in Section 8.5 in the Appendix we discuss these assumptions for the Poisson model.

The next result describes the asymptotic distribution of $S_{1T}$ under the null hypothesis. Let $\Rightarrow$ denote weak convergence in $\ell^\infty[0,1]$ and let $V_{1\infty}(u,v) := u \wedge v - uv - \gamma_\infty(u,v)$.

Lemma 3.
Suppose Assumption 1 holds. Under $H_0$, $S_{1T} \Rightarrow S_{1\infty}$, where $S_{1\infty}$ is a Gaussian process on $[0,1]$ with zero mean and covariance function $V_{1\infty}$.

The asymptotic distribution of $S_{1T}$ is model and parameter dependent, and the practical implementation of tests when $\theta_0$ is unknown is discussed in Section 3.2, after presenting a general class of local alternatives to the null of correct specification of the conditional distribution. We next discuss the asymptotic properties of the empirical process $S_{1T}$ under a class of alternative hypotheses that will lead to consistency of the specification tests based on $S_{1T}$ for a wide class of alternatives. We consider the following class of local alternatives to $H_0$:

$$H_{1T}: \; Y_t \mid \Omega_t \sim G_{T,t,\theta_0}(\cdot \mid \Omega_t) \ \text{ for some } \theta_0 \in \Theta,$$

where

$$G_{T,t,\theta_0}(y \mid \Omega_t) = \left(1 - \frac{\delta}{T^{1/2}}\right) F_{t,\theta_0}(y \mid \Omega_t) + \frac{\delta}{T^{1/2}} H_t(y \mid \Omega_t),$$

$0 \leq \delta < T^{1/2}$ and, for all $t$, $H_t(\cdot \mid \Omega_t) \in \mathcal{M}$. When $\delta = 0$, $H_{1T}$ nests $H_0$. Following Kheifets and Velasco (2013), for any discrete distributions $G$ and $F$ in $\mathcal{M}$, with probability functions $g$ and $f$, define

$$d(G, F, u) = G\left(F^-(u)\right) - F\left(F^-(u)\right) - \frac{F(F^-(u)) - u}{f(F^-(u))}\left[g\left(F^-(u)\right) - f\left(F^-(u)\right)\right].$$

Note that $d(G, F, u) = E_G[I_F(Y, u)] - E_F[I_F(Y, u)] = E_G[I_F(Y, u)] - u$, and $d(G, F, u) \equiv 0$ if $G \equiv F$. Under any $G_t(\cdot \mid \Omega_t) \in \mathcal{M}$,

$$\frac{1}{T^{1/2}} E[S_{1T}(u)] = \frac{1}{T}\sum_{t=1}^T E[d(G_t(\cdot \mid \Omega_t), F_{t,\theta}(\cdot \mid \Omega_t), u)].$$

The next assumption guarantees that a LLN can be applied to the empirical discrepancy between $H_t$ and $F_{t,\theta}$.

Assumption 2.
Under $H_{1T}$, there exists a finite function $D_1(u)$ such that, uniformly in $u \in [0,1]$, $\frac{1}{T}\sum_{t=1}^T d(H_t(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u) \to_p D_1(u)$.

Then the following lemma shows that the departure from $H_0$ in the direction of $H_{1T}$ introduces a drift in the asymptotic distribution of $S_{1T}$ that will render consistency of hypothesis tests based on functionals of $S_{1T}$.

Lemma 4.
Suppose Assumptions 1-2 hold. Under $H_{1T}$, $S_{1T} \Rightarrow S_{1\infty} + \delta D_1$, where $S_{1\infty}$ is as in Lemma 3.

In practice, tests based on $S_{1T}$ are unfeasible since $\theta_0$ is unknown and has to be estimated, by $\widehat{\theta}_T$ say. We assume that we have available an estimate $\widehat{\theta}_T$ such that under $H_{1T}$, $T^{1/2}(\widehat{\theta}_T - \theta_0) = O_p(1)$, and define

$$\widehat{S}_{1T}(u) := \frac{1}{T^{1/2}}\sum_{t=1}^T \left\{I_{t,\widehat{\theta}_T}(u) - u\right\}.$$

We next analyze the consequences of replacing $\theta_0$ by $\widehat{\theta}_T$ in $\widehat{S}_{1T}$. Let $\|\cdot\|$ be the Euclidean norm, i.e. for a matrix $A$, $\|A\| = \sqrt{\mathrm{tr}(AA')}$, where $A'$ is the transpose of $A$. For $\varepsilon > 0$, $B(a, \varepsilon)$ is an open ball in $\mathbb{R}^m$ with center at point $a$ and radius $\varepsilon$. For a cdf $F_\theta$ in $\mathcal{M}$ define

$$\nabla(F_\theta, u) := \dot{F}_\theta\left(F_\theta^-(u)\right) - \frac{F_\theta\left(F_\theta^-(u)\right) - u}{f_\theta\left(F_\theta^-(u)\right)}\, \dot{f}_\theta\left(F_\theta^-(u)\right),$$

where $\dot{F}_\theta := (\partial/\partial\theta) F_\theta$ and $\dot{f}_\theta := (\partial/\partial\theta) f_\theta$. We need the following assumptions to analyze the asymptotic properties of $\widehat{S}_{1T}$.

Assumption 3 (Parametric family). (A) The parameter space $\Theta$ is a compact set in a finite-dimensional Euclidean space, $\theta_0 \in \Theta \subset \mathbb{R}^m$. (B) There exists $\delta >$
$0$, such that $F_{t,\theta}(\cdot \mid \Omega_t) \in \mathcal{M}$ for all $t$, $\Omega_t$, $T$ and $\theta \in B(\theta_0, \delta)$. (C) $F_{t,\theta}(k \mid \Omega_t)$ is differentiable with respect to $\theta \in B(\theta_0, \delta)$ and, under $H_{1T}$,

$$\max_t E\left[\max_k \sup_{\theta \in B(\theta_0,\delta)} \left\|\dot{F}_{t,\theta}(k \mid \Omega_t)\right\|\right] \leq M_F < \infty.$$

(D) Under $H_{1T}$, there exists a finite $L_1(u) := \mathrm{plim}_{T\to\infty}\, T^{-1}\sum_{t=1}^T \nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u)$.

Conditions (A)-(C) about the parametric family of distributions are standard, see e.g. Bai (2003, Assumptions A1-A2). For dynamic ordered choice and Poisson models, the differentiability of the conditional distribution with respect to the parameter is equivalent to the differentiability of the link function. Part (D) guarantees a nice limit behaviour of the average generalized derivative of $I_{t,\theta}$. Conditions for no effect of information truncation can be provided in a similar way to Bai (2003, Assumption A4).

The following lemma provides an expansion of the empirical process with estimated parameters as the sum of the process with known parameters and a random drift describing the parameter estimation effect.

Lemma 5. Suppose Assumptions 1-3 hold and $T^{1/2}(\widehat{\theta}_T - \theta_0) = O_p(1)$. Under $H_{1T}$,

$$\widehat{S}_{1T}(u) = S_{1T}(u) + T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)' \frac{1}{T}\sum_{t=1}^T \nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u) + o_p(1), \tag{4}$$

uniformly in $u$.

Then continuous functionals of $\widehat{S}_{1T}$ no longer converge to those of $S_{1\infty} + \delta D_1$ under $H_{1T}$, and the estimation effect also has to be taken into account, using the following assumption. Let $Z(\Psi)$ be a normal vector with zero mean and covariance matrix $\Psi$.

Assumption 4 (Parameter estimation). Under $H_{1T}$, the estimator $\widehat{\theta}_T$ admits the asymptotic linear expansion

$$T^{1/2}\left(\widehat{\theta}_T - \theta_0\right) = \delta\xi + \frac{1}{T^{1/2}}\sum_{t=1}^T \ell_t(Y_t, \Omega_t) + o_p(1), \tag{5}$$

where $\xi$ is an $m \times 1$ vector and the $\ell_t$ constitute a martingale difference sequence with respect to $\Omega_t$, such that (A) $E[\ell_t(Y_t, \Omega_t) \mid \Omega_t] = 0$ and $T^{-1}\sum_{t=1}^T E\left[\ell_t(Y_t, \Omega_t)\ell_t(Y_t, \Omega_t)' \mid \Omega_t\right] \to_p \Psi$.
(B) The Lindeberg condition holds: $T^{-1}\sum_{t=1}^T E\left[\|\ell_t(Y_t,\Omega_t)\|^2\, 1\left\{T^{-1/2}\|\ell_t(Y_t,\Omega_t)\| > \varepsilon\right\} \mid \Omega_t\right] \to_p 0$ for all $\varepsilon > 0$. (C) There exists a finite function $W_1(u)$ such that $T^{-1}\sum_{t=1}^T E[I_{t,\theta_0}(u)\,\ell_t(Y_t,\Omega_t) \mid \Omega_t] \to_p W_1(u)$ uniformly in $u$.

In particular, under $H_0$, $\delta\xi = 0$, the estimate $\widehat{\theta}_T$ is centered and $T^{1/2}(\widehat{\theta}_T - \theta_0)$ converges in distribution to $Z(\Psi)$. Assumptions 4(A) and 4(B) hold for the MLE of many popular discrete models, including dynamic probit and logit and general discrete choice models. As an example, consider estimates $\widehat{\theta}_T$ which are asymptotically equivalent to the (conditional) maximum likelihood estimates, i.e.

$$T^{1/2}\left(\widehat{\theta}_T - \theta_0\right) = -B^{-1}\frac{1}{T^{1/2}}\sum_{t=1}^T s_t(Y_t, \Omega_t) + o_p(1),$$

where $s_t(k, \Omega_t) := \dot{f}_{t,\theta_0}(k \mid \Omega_t)/f_{t,\theta_0}(k \mid \Omega_t)$ is the score function and $B$ is a symmetric $m \times m$ positive definite matrix given by the limit of the Hessian,

$$B := \mathrm{plim}_{T\to\infty}\frac{1}{T}\sum_{t=1}^T\sum_{k=1}^K s_t(k, \Omega_t)\,\dot{f}_{t,\theta_0}(k \mid \Omega_t)'.$$

Under $H_{1T}$, $E[s_t(Y_t, \Omega_t) \mid \Omega_t] = \delta T^{-1/2}\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$. Then equation (5) holds with $\xi = -\mathrm{plim}_{T\to\infty} B^{-1} T^{-1}\sum_{t=1}^T\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$ and $\ell_t(Y_t, \Omega_t) = -B^{-1} s_t(Y_t, \Omega_t) + \delta T^{-1/2} B^{-1}\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$, so that the $\ell_t$ are centered conditional on $\Omega_t$.

We can derive the covariance matrix between the process $S_{1T}(u)$ and $T^{1/2}(\widehat{\theta}_T - \theta_0)$ and obtain joint convergence results, so under $H_{1T}$,

$$\left(S_{1T},\, T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)\right) \Rightarrow \left(S_{1\infty} + \delta D_1,\, Z(\Psi) + \delta\xi\right), \tag{6}$$

where the covariance function between $S_{1\infty}$ and $Z(\Psi)$ is $W_1(u)$. We can now state the result on the asymptotic distribution of the empirical process $\widehat{S}_{1T}$ under local alternatives, whose drift is different with respect to the case without estimated parameters.

Theorem 1.
Suppose Assumptions 1-4 hold. Under $H_{1T}$,

$$\widehat{S}_{1T} \Rightarrow \widehat{S}_{1\infty} + \delta\{D_1 + \xi' L_1\},$$

where $\widehat{S}_{1\infty} := S_{1\infty} + Z(\Psi)' L_1$ is a Gaussian process with zero mean and covariance function

$$V_{1\infty}(u,v) + L_1(u)'\Psi L_1(v) + W_1(u)' L_1(v) + W_1(v)' L_1(u).$$

4 EMPIRICAL PROCESSES FOR DYNAMIC SPECIFICATION
Test statistics based on $S_{1T}$, $R_{1T}$ and $R_{1T,M}$ verify that the conditional distribution of $Y_t$ is right on average across all possible $\Omega_t$, so these tests might not capture all sources of misspecification. This issue is raised in Corradi and Swanson (2006), Delgado and Stute (2008) and Kheifets (2015) in relation to testing continuous distributions. However, it is not possible to develop specification tests conditioned on infinite-dimensional values of $\Omega_t$. Instead of truncating $\Omega_t$ or restricting the class of models, we consider $S_{2T}$, a biparameter analog of $S_{1T}$, to control for possible dynamic misspecification. From Lemma 1, since under $H_0$, $I_{t,\theta}(u) - u$ is a MDS, $I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) - u_1 u_2$ is centered around zero, and moreover

$$E[I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) \mid \Omega_{t-1}] = u_1 u_2, \quad \text{a.s.}$$

This motivates us to develop tests based on $S_{2T}$ defined in (2). This process also has zero mean under the null and identifies not only departures from the null derived from deviations of the unconditional expectation of $I_{t,\theta}(u_1)$ from $u_1$, but also departures from a possible failure of the martingale property, whereby $I_{t,\theta}(u_1)$ and $I_{t-1,\theta}(u_2)$ would become correlated. This idea is similar to that exploited in Kheifets (2015) in the context of conditional distribution testing for continuous distributions, where different methods of checking the independence property of the PIT are proposed. Alternative statistics exploiting the lack of correlation at other lags could be proposed, but we expect that low lags are typically more useful for detecting general forms of misspecification. One could also consider a biparameter analog of $R_{1T,M}$, i.e. for some $M = 1, 2, \ldots$,

$$R_{2T,M}(\mathbf{u}) := \frac{1}{(T-1)^{1/2} M}\sum_{t=2}^T\sum_{m=1}^M\left(1\left\{U_{t,m}^r(\theta) \leq u_1\right\} 1\left\{U_{t-1,m}^r(\theta) \leq u_2\right\} - u_1 u_2\right),$$

where $\mathbf{u} = (u_1, u_2) \in [0,1]^2$. In particular, a bivariate analog of $R_{1T}$, $R_{2T}(\mathbf{u}) := R_{2T,1}(\mathbf{u})$, is introduced in Kheifets and Velasco (2013).
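Given a matrix of transform values $I_{t,\theta}(u_g)$ on a grid of $u$ values, both empirical processes and their Kolmogorov-Smirnov norms are a few lines of numpy. The sketch below is a generic implementation under the stated definitions, with the grid choice left to the user:

```python
import numpy as np

def ks_statistics(I, u_grid):
    """Kolmogorov-Smirnov norms of S_1T and of S_2T from equation (2),
    the latter evaluated on the grid x grid square.
    I[t, g] = I_{t,theta}(u_grid[g]); returns (sup|S_1T|, sup|S_2T|)."""
    T = I.shape[0]
    S1 = (I - u_grid).sum(axis=0) / np.sqrt(T)
    # sum over t of I_t(u1) * I_{t-1}(u2), arranged as a (grid x grid) matrix
    cross = I[1:].T @ I[:-1] / np.sqrt(T - 1.0)
    S2 = cross - np.sqrt(T - 1.0) * np.outer(u_grid, u_grid)
    return np.abs(S1).max(), np.abs(S2).max()
```

A quick sanity check: if every $I_{t,\theta}(u_g)$ equals $u_g$ exactly (a degenerate perfectly calibrated case), both statistics are identically zero, since each summand in $S_{1T}$ and $S_{2T}$ vanishes.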
Tests based on $R_{2T}$ and $R_{2T,M}$ involve randomized transforms and therefore suffer from power loss compared to tests based on the nonrandomized transform.

Note that the summands $I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) - u_1 u_2$ of $S_{2T}(\mathbf{u})$ form a martingale difference sequence. This observation will allow us to derive weak convergence of $S_{2T}$ by employing limit theorems for MDS. Properties of $R_{2T}$ were established in Kheifets and Velasco (2013) and could be extended to $R_{2T,M}$. Here we discuss the properties of $S_{2T}$ when we estimate $\theta_0$. In practice we use the process

$$\widehat{S}_{2T}(\mathbf{u}) := \frac{1}{(T-1)^{1/2}}\sum_{t=2}^T\left\{I_{t,\widehat{\theta}_T}(u_1) I_{t-1,\widehat{\theta}_T}(u_2) - u_1 u_2\right\},$$

where we can write, under $H_{1T}$,

$$\widehat{S}_{2T}(\mathbf{u}) = S_{2T}(\mathbf{u}) + T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)'\frac{1}{T}\sum_{t=2}^T \nabla_{2,t}(\mathbf{u}) + o_p(1), \tag{7}$$

uniformly in $\mathbf{u}$, where $\nabla_{2,t}(\mathbf{u}) := I_{t-1,\theta_0}(u_2)\nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u_1) + u_1\nabla(F_{t-1,\theta_0}(\cdot \mid \Omega_{t-1}), u_2)$, and the asymptotic covariance function is $W_2(\mathbf{u}) := \mathrm{ACov}\left(S_{2T}(\mathbf{u}),\, T^{1/2}(\widehat{\theta}_T - \theta_0)\right)$. To study the asymptotic properties of the biparameter process we introduce the next assumption, which extends Assumption 2.

Assumption 5.
Under $H_{1T}$, there exist finite functions $D_2(\mathbf{u})$ and $L_2(\mathbf{u})$ such that, uniformly in $\mathbf{u}$, (A) $T^{-1}\sum_{t=2}^T\{I_{t-1,\theta_0}(u_2)\, d(H_t(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_1) + u_1\, d(H_{t-1}(\cdot \mid \Omega_{t-1}), F_{t-1,\theta_0}(\cdot \mid \Omega_{t-1}), u_2)\} \to_p D_2(\mathbf{u})$; (B) $T^{-1}\sum_{t=2}^T \nabla_{2,t}(\mathbf{u}) \to_p L_2(\mathbf{u})$.

Note that the second terms in the definitions of $D_2$ and $L_2$ correspond to $u_1 D_1(u_2)$ and $u_1 L_1(u_2)$ respectively, the equivalents for the single-parameter process $S_{1T}$, but the first ones are new. To state the next result, we need to assume the existence of probabilistic limits of several random functions. For the sake of presentation, we defer the precise statements to the Appendix, see Assumption A.

Theorem 2. Suppose that, in addition to the conditions of Theorem 1, Assumption 5 and Assumption A from the Appendix hold. Under $H_{1T}$, $S_{2T} \Rightarrow S_{2\infty} + \delta D_2$, where $S_{2\infty}$ is a Gaussian process on $[0,1]^2$ with mean zero and covariance function $V_{2\infty}(\mathbf{u}, \mathbf{v})$ defined in the Appendix. Under $H_{1T}$, if parameters are estimated,

$$\widehat{S}_{2T} \Rightarrow \widehat{S}_{2\infty} + \delta\{D_2 + \xi' L_2\},$$

where $\widehat{S}_{2\infty} := S_{2\infty} + Z(\Psi)' L_2$ is a Gaussian process with zero mean and covariance function $V_{2\infty}(\mathbf{u},\mathbf{v}) + L_2(\mathbf{u})'\Psi L_2(\mathbf{v}) + W_2(\mathbf{u})' L_2(\mathbf{v}) + W_2(\mathbf{v})' L_2(\mathbf{u})$.

When $G_t(\cdot \mid \Omega_t)$ is different from $F_{t,\theta}(\cdot \mid \Omega_t)$ such that $D_2$ is non-zero, the test based on $\widehat{S}_{2T}$ has nontrivial power in the direction of $H_{1T}$. In contrast to the univariate case with $S_{1T}$, the first term in the definition of $D_2$ contains correlation with past information and can therefore capture dynamic misspecification when it induces such a correlation, even if the unconditional expectation of $d$, which appears in the second term $u_1 D_1(u_2)$, is zero. This fact is crucial if misspecification occurs in the dynamics and not only in the link function or other static aspects of the model.

To test $H_0$ we consider Cramér-von Mises, Kolmogorov-Smirnov or any other continuous functionals $\eta\left(\widehat{S}_{jT}\right)$ of $\widehat{S}_{jT}$, $j = 1, 2$.
Then consistency properties of specification tests based on $\widehat{S}_{jT}$ can be derived using the discussion in the previous sections by applying the continuous mapping theorem, so we omit the proof of the following result.

Theorem 3.
Suppose that the conditions of Theorem 2 hold. Under H_{1T}, η(Ŝ_{jT}) →_d η(Ŝ_{j∞}), j = 1, 2.

Since the asymptotic distributions of S_{jT}(u) are model dependent, and those of Ŝ_{jT}(u) further depend on the estimation effect, we need to resort to bootstrap methods to implement our tests in practice. In the literature there are several resampling methods suitable for dependent data, but since under H_0 the parametric conditional distribution is fully specified, we apply a conditional parametric bootstrap algorithm that only requires drawing from F_{t,θ̂}(·|Ω_t) to mimic the null distribution of the test statistics. For a discussion of the parametric bootstrap see Stute et al. (1993) and Andrews (1997), which can be adapted to the complications with information truncation and initialization arising in the dynamic case using the discussion in Bai (2003).

To estimate the true 1 − α quantiles c_j(θ_0) of the null asymptotic distribution of the test statistics, given by some continuous functional η applied to Ŝ_{j∞} with δ = 0, we implement the following steps.

1. Estimate the model with data (Y_t, X′_t), t = 1, 2, ..., T, obtain the parameter estimator θ̂_T and compute the test statistics η(Ŝ_{jT}).

2. Simulate Y*_t from F_{θ̂_T}(·|Ω*_t) recursively for t = 1, 2, ..., T, where the bootstrap information set is Ω*_t = (X_t, Y*_{t−1}, X_{t−1}, Y*_{t−2}, X_{t−2}, ...).

3. Estimate the model with the simulated data Y*_t, obtain θ̂*_T using the same method as for θ̂_T, and compute the bootstrapped test statistics η(Ŝ*_{jT}).

4. Repeat steps 2-3 B times and compute the percentiles of the empirical distribution of the B bootstrapped test statistics.

5.
Reject H_0 if η(Ŝ_{jT}) is greater than the (1 − α)th percentile of the empirical distribution of the B bootstrapped test statistics, denoted ĉ*_{jB}(θ̂_T).

To analyze the properties of our parametric bootstrap, we need to assume that the same conditions on the estimation method hold for both the original and the resampled data. More formally, we have

Assumption 6. (A)
The conditional distribution of Y_t given Ω_t coincides with the conditional distribution of Y_t given Ω_t ∪ {X′_k}_{k=t+1}^T.

(B) Suppose that the sample is generated by F_{θ_T}, for some nonrandom sequence θ_T converging to θ_0, i.e. we have a triangular array of random variables {Y_{Tt} : t = 1, 2, ..., T} with (T, t) element generated by F_{θ_T}(·|Ω_{Tt}), where Ω_{Tt} = {X_t, Y_{Tt−1}, X_{t−1}, Y_{Tt−2}, X_{t−2}, ...}. Then the estimator θ̂_T of θ_T admits an asymptotic linear expansion as in Assumption 4. Moreover, assume that under the alternative H_1 there exists some θ_1 ∈ Θ such that θ_1 = plim_{T→∞} θ̂_T.

This assumption ensures that by simulating from the conditional distribution F_{θ_T} we obtain the correct joint distribution of S_{jT} and T^{1/2}(θ̂_T − θ_T), in parallel to those required in Theorems 1-2. Assumption 6(A) says that Y_t and future X_t are independent conditionally on past information, i.e. that there is no direct feedback effect. For example, in a latent-variable form of the ordered probit model, this assumption translates to strict exogeneity, i.e. innovations are independent of future X_t. Dependence between Y_t and future X_t is still allowed through serial dependence in X_t and Y_t. Assumption 6(B) is similar to Condition (5.5) in Burke et al. (1979), Assumption (A1) in Stute et al. (1993) and Assumption E2 in Andrews (1997), and introduces a triangular-array version of the expansion and central limit theorem for parameter estimates; see also the discussion in Section 4.1 in Andrews (1997).

We obtain the following result.

Theorem 4.
Suppose that in addition to the conditions of Theorem 2, Assumption 6 holds. Under H_{1T}, as B, T → ∞,

η(Ŝ*_{jT}) →_d η(Ŝ_{j∞}), j = 1, 2, in probability,

so ĉ*_{jB}(θ̂_T) →_p c_j(θ_0), and therefore, under H_0, Pr(η(Ŝ_{jT}) > ĉ*_{jB}(θ̂_T)) → α. Suppose also that the conditions of Theorem 2 hold for any θ_1 ∈ Θ. Under H_1, as B, T → ∞, ĉ*_{jB}(θ̂_T) = O_p(1).

This theorem shows that the bootstrap test statistic has the same limit distribution as the original one under local alternatives, so that under the null we obtain the correct asymptotic size using bootstrap-estimated critical values, and under local alternatives we obtain nontrivial power when the drifts of the stochastic processes Ŝ_{1T} and Ŝ_{2T} are non-negligible. Similarly, under fixed alternatives we obtain a consistent bootstrap test whenever the asymptotic test itself is consistent, i.e. lim_{T→∞} Pr(η(Ŝ_{jT}) > ĉ*_{jB}(θ̂_T)) = 1 if η(Ŝ_{jT}) diverges asymptotically.

In this section we use a Monte Carlo simulation exercise to investigate the finite-sample properties of the tests proposed in this paper. We take as reference the dynamic ordered discrete choice models investigated in Basu and de Jong (2007) for the modeling of the monetary policy conducted by the Federal Reserve (FED). The dependent variable uses the following codification of the changes in the reference interest rate in the US, the federal funds rate i_t:

Y_t = 1 if ∆i_t < −0.25; 2 if −0.25 ≤ ∆i_t < 0; 3 if 0 ≤ ∆i_t < 0.25; 4 if ∆i_t ≥ 0.25.

Data is monthly and spans January 1990 to December 2006, leading to T = 204 complete observations. The explanatory variables that Basu and de Jong (2007) used to explain the decisions of the FED on ∆i_t are the current value and 4 lags of inflation (inf), the current value and a lag of four different measures of output gap (out) and a series of dummies that describe the decision of the FED in the previous period, dum1_t = 1(∆i_{t−1} < 0), dum2_t = 1(∆i_{t−1} > 0), dum3_t = 1(∆i_{t−1} < −0.25), dum4_t = 1(∆i_{t−1} > 0.25). Instead of these four dummies, we implement an AR(1), 'dynamic' version with one lag of the discrete Y_t as explanatory variable (and a version without lags that we refer to as 'static' to serve as a benchmark to the inclusion of lagged endogenous variables in Ω_t). We consider both the Logit and Probit versions of the models. We fit four versions of the basic model based on different definitions of the output gap and, conditional on the series of inflation and output gap and on the parameter estimates obtained, we simulate series Y_t and conduct our tests on these (see the Monte Carlo scenarios in Table 1).

Table 1: Scenarios for Monte Carlo simulations.
Scenario    Null and Alternative
Size 1      H_0: static probit
Size 2      H_0: static logit
Power 1     H_0: static probit vs H_1: static logit
Power 2     H_0: static probit vs H_1: dynamic probit
Power 3     H_0: static probit vs H_1: dynamic logit

The four choices of output gap lead to Models I-IV. The output gap is the percentage deviation of the actual from the potential output, which is interpolated to obtain a series of monthly frequency by replicating the GDP observation for any quarter to all the months in that quarter. Then two different measures of potential output are used: the potential output series provided by the Congressional Budget Office and a potential output series constructed in a real-time setting using the HP filter, leading to Models I and II.
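The recursive simulation of the discrete response used in step 2 of the bootstrap algorithm, and in the Monte Carlo designs here, can be sketched as follows. This is a minimal illustration assuming a dynamic ordered-probit specification; beta, rho and the cutpoints tau are hypothetical placeholder values, not the estimates reported below:

```python
import numpy as np

def simulate_ordered_response(x, beta, rho, tau, rng):
    """Recursively simulate an ordered discrete response Y_t in {1, ..., K}.

    A latent index y*_t = x_t'beta + rho * Y_{t-1} + eps_t with eps_t ~ N(0, 1)
    (the probit case) is classified by the cutpoints tau_1 < ... < tau_{K-1}:
    Y_t = 1 + #{j : y*_t > tau_j}.
    """
    T = x.shape[0]
    y = np.empty(T, dtype=int)
    y_lag = 0  # initialization of the lagged endogenous variable
    for t in range(T):
        latent = x[t] @ beta + rho * y_lag + rng.standard_normal()
        y[t] = 1 + int(np.sum(latent > tau))
        y_lag = y[t]
    return y

rng = np.random.default_rng(0)
T = 204
x = rng.standard_normal((T, 2))    # stand-in covariates (e.g. inflation, output gap)
beta = np.array([0.5, -0.3])       # hypothetical coefficients
tau = np.array([-1.0, 0.0, 1.0])   # hypothetical cutpoints, so K = 4 categories
y = simulate_ordered_response(x, beta, rho=0.2, tau=tau, rng=rng)
```

In the bootstrap, beta, rho and tau would be replaced by the estimates in θ̂_T, the simulated series re-estimated, and the test statistic recomputed B times.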
Apart from the output gap, other measures of economic activity are used, such as the unemployment rate and capacity utilization, leading to Models III and IV. Data sources are described in Basu and de Jong (2007).

We compare the performance of our tests with alternative tests that are also omnibus and do not require smoothing (and the choice of smoothing parameters). Two general approaches can be adapted to our setup: the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the Conditional Kolmogorov test of Andrews (1997), as discussed in Mora and Moro-Egido (2007). The first is a test based on a marked empirical process for testing the null H′_0 : E[Y | X̃ = x] = m_{β̃_2}(x′β̃_1), where m_{β̃_2}(·) is a parametric link function and β̃_1, β̃_2 are finite-dimensional parameters. In the case where Y takes only two values {0, 1}, the conditional mean coincides with the conditional probability of Y = 1, and the null is similar to our H_0 if we were considering an i.i.d. setup. To test Y_t | X̃_t ∼ P_{β̃_2}(· | X̃′_t β̃_1), define the process

Z_T(y) := T^{−1/2} Σ_{t=1}^T 1{X̃′_t β̃_1 ≤ y} [Y_t − P_{β̃_2}(Y_t = 1 | X̃′_t β̃_1)], y ∈ R.

The second test, by Andrews, is obtained by substituting 1{X̃_t ≤ x̃} (where x̃ is a real vector of the dimension of X̃_t) for 1{X̃′_t β̃_1 ≤ y} in Z_T, but since it always underperforms in the simulations of Mora and Moro-Egido (2007), it is not considered here. If Y takes values {0, ..., K}, Mora and Moro-Egido (2007) substitute testing H_0 by K tests of the hypotheses Y_{jt} | X̃_t ∼ P_{j,β̃_2}(Y_t | X̃′_t β̃_1), with corresponding processes Z_{j,T}, where Y_{jt} = 1{Y_t = j} and j = 1, 2, ..., K; the resulting pooled test statistics are

η_{CvM,Z} = T^{−1} Σ_{j=1}^K Σ_{t=1}^T Z_{j,T}(X̃′_t β̃_1)² and η_{KS,Z} = max_{j=1,...,K} max_{1≤t≤T} |Z_{j,T}(X̃′_t β̃_1)|,

which we call the CvM and KS tests respectively.
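A minimal sketch of the marked empirical process Z_T and the two functionals in the binary case, with data simulated under an assumed probit null and an assumed coefficient value (beta here is a placeholder, not an estimate):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
T = 200
x = rng.standard_normal(T)
beta = 1.0                                   # assumed single-index coefficient
p = np.array([Phi(v) for v in beta * x])     # P(Y = 1 | x'beta) under the probit link
y = (rng.uniform(size=T) < p).astype(float)  # binary response generated under the null

index = beta * x
marks = y - p                                # residual "marks" of the process

def Z_T(val):
    """Z_T(y) = T^{-1/2} sum_t 1{x_t'beta <= y} (Y_t - p_t)."""
    return np.sum((index <= val) * marks) / np.sqrt(T)

z_vals = np.array([Z_T(v) for v in index])   # evaluated at the sample index points
eta_cvm = np.mean(z_vals ** 2)               # Cramer-von Mises-type statistic
eta_ks = np.max(np.abs(z_vals))              # Kolmogorov-Smirnov-type statistic
```

In the multinomial case the same construction is repeated for each indicator Y_{jt} = 1{Y_t = j} and the resulting processes are pooled as in the displayed statistics.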
To apply these tests to our model, let X̃_t = (X′_t, Y_{t−1})′ and β̃_1 = (β′, ρ, −τ)′, and take the corresponding link functions.

We analyze tests based on S_{1T}, R_{1T,M}, R_{1T} and S_{2T}, R_{2T,M}, R_{2T} and Z_T. In all cases we use Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) measures. We only consider feasible bootstrap versions of the tests based on Ŝ_{1T}, R̂_{1T,M}, etc., where we replace θ_0 by root-T consistent estimates θ̂_T, the ML estimator in our case. We are not aware of any theoretical results for bootstrap-assisted tests based on Ẑ_T in our setup, although Mora and Moro-Egido (2007) provide some simulations.

Parameter estimates for the real data are reported in Tables 2 and 3. The main question is whether the static Probit or Logit models are appropriate for changes in the interest rates, and we check this with our tests. The p-values in Tables 4 and 5 show that all these models are rejected even at the 1% significance level by the biparameter nonrandomized-transform-based tests. Note that single-parameter static tests (e.g. R̂_{1T}, Ŝ_{1T}) cannot reject any proposed model, with the sole exception of Ŝ_{1T}, which rejects Model II at 5% with the Cramér-von Mises test statistic.

Table 2: ML estimates and standard errors of Models I-IV with static and dynamic specifications and Probit link function applied to the real US data, T = 204.
[The numeric entries of Table 2 are not recoverable from this extraction.]

Table 3 (caption garbled in extraction): ML estimates and standard errors of Models I-IV with static and dynamic specifications, presumably with the Logit link function, applied to the real US data, T = 204. [The numeric entries are not recoverable from this extraction.]
[Table 4 (caption garbled in extraction): p-values for static Probit and Logit link functions applied to the real US data, T = 204, presumably the Cramér-von Mises counterpart of Table 5. The numeric entries are not recoverable.]

Table 5: P-values of Kolmogorov-Smirnov tests for static Probit and Logit link function applied to the real US data, T = 204. [The numeric entries are not recoverable from this extraction.]

[Table 6 (caption garbled in extraction): simulated size/power rates for the nominal 5% level, T = 100, presumably the Cramér-von Mises counterpart of Table 7. The numeric entries are not recoverable.]

Table 7: Simulated size/power rates for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, T = 100. [The numeric entries are not recoverable from this extraction.]

[Table 8 (caption garbled in extraction): simulated size/power rates for the nominal 5% level, T = 200, presumably the Cramér-von Mises counterpart of Table 9. The numeric entries are not recoverable.]

Table 9: Simulated size/power rates for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, T = 200. [The numeric entries are not recoverable from this extraction.]
To study the reliability of these results we conduct a Monte Carlo experiment using the models estimated on the real data as data generating processes, and obtain the simulations for the discrete response conditional on the covariate time series. In Tables 6 and 7 we provide the empirical size and power results of our tests across simulations for sample size T = 100, static Probit and Logit specifications and the four output-gap choices (Models I to IV). To speed up the simulation procedure, we use the warp bootstrap algorithm of Giacomini, Politis and White (2013). We see that all bootstrap tests provide reasonable size accuracy, with tests based on single-parameter empirical processes underrejecting slightly, while those based on bivariate processes tend to overreject moderately. Kolmogorov-Smirnov and Cramér-von Mises tests perform similarly in all cases, and the choice of the output-gap series does not make large differences either, nor does the introduction of lagged endogenous (discrete) variables in the information set.

The power of the tests for the static Probit model is analyzed against three different alternatives: static Logit, dynamic Probit and dynamic Logit. We see that the tests without randomization, Ŝ_{1T} and Ŝ_{2T}, always perform better than the randomized continuous processes R̂_{1T,M} and R̂_{2T,M}, which in turn dominate R̂_{1T} and R̂_{2T}, thus confirming our theoretical findings. When we compare Probit and Logit specifications while letting the dynamic aspect of the model be well specified (static in both cases), we observe that with this sample size and these specifications it is almost impossible to distinguish Probit from Logit models. The power against the dynamic Probit and Logit alternatives is very high. Since the nature of the misspecification is dynamic, once again bivariate processes should have more power than their single-parameter counterparts, as is confirmed in our simulation results.
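The warp-speed device of Giacomini, Politis and White (2013) replaces the B bootstrap replications inside each Monte Carlo sample with a single bootstrap draw per replication, pooling the draws across replications to form the critical value. A minimal sketch with placeholder ingredients (stat and the standard normal null are illustrative stand-ins, not the paper's test statistics):

```python
import numpy as np

rng = np.random.default_rng(2)

def stat(sample):
    """Placeholder test statistic: scaled absolute sample mean."""
    return abs(sample.mean()) * np.sqrt(len(sample))

def one_bootstrap_stat(sample):
    """One parametric bootstrap draw under the (standard normal) null and its statistic."""
    boot = rng.standard_normal(len(sample))
    return stat(boot)

R, T, alpha = 500, 100, 0.05
stats = np.empty(R)
boot_stats = np.empty(R)
for r in range(R):
    sample = rng.standard_normal(T)             # data generated under the null
    stats[r] = stat(sample)
    boot_stats[r] = one_bootstrap_stat(sample)  # a single draw instead of B of them

crit = np.quantile(boot_stats, 1 - alpha)       # pooled bootstrap critical value
rejection_rate = np.mean(stats > crit)          # close to alpha under the null
```

This cuts the cost of a size/power study from R × B model estimations to roughly 2R, at the price of using a pooled rather than sample-specific critical value.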
It can also be observed that for these alternatives the Cramér-von Mises criterion provides more power than the Kolmogorov-Smirnov tests. As for the alternative tests based on Ẑ_T, they have power comparable to Ŝ_{1T}, sometimes slightly better, and are always outperformed by any bivariate test. This is not surprising, since Ẑ_T has more structure, i.e. it assumes a single-index model for the covariates, but averages across points, thus suffering the same problems as the other single-parameter tests considered here. In Tables 8 and 9 we provide the empirical size and power results of our tests for the larger sample size T = 200. Here the size properties are similar, while power rejection rates are noticeably higher for the dynamic alternatives.

In this paper we have proposed new specification tests for the conditional distribution of discrete data with possibly infinite support. The new tests are functionals of empirical processes based on a nonrandomized transform that solves the implementation problem of the usual PIT for discrete distributions and achieves consistency against a wide class of alternatives. We show the validity of a bootstrap algorithm for approximating the null distribution of the test statistics, which are model and parameter dependent. In our simulation study, we show that our method compares favorably in many relevant situations with other methods available in the literature, and we illustrate the new method in a small application.
In this section we derive the basic properties of the nonrandomized transform, which are required prior to proving the weak convergence results for our empirical process. Without loss of generality, and in order to make the exposition more transparent, we omit the subscripts t, θ_0 and the conditioning set Ω_t, and use the shortcuts I_F(Y, u) = I_{t,θ_0}(Y_t, u) and I_{F,M}(Y, u) = I_{t,θ_0,M}(Y_t, u).

For F ∈ M, F(F⁻(u)) ≥ u > F(F⁻(u) − 1), and equality holds iff u = F(k) for some integer k. For a random variable Y ∼ G ∈ M we find Pr_G(F(Y) < u) = G(F⁻(u) − 1) and g(F⁻(u)) := Pr_G(Y = F⁻(u)) = G(F⁻(u)) − G(F⁻(u) − 1). When G = F, we have that Pr_F(F(Y) < u) = F(F⁻(u) − 1) < u, i.e. F(Y) is not uniform and the expectation of the indicator function 1(F(Y) < u) is never u, as it is for continuous F.

The nonrandomized transform can be written as

I_F(Y, u) = (1 − δ_F(u)) 1{Y = F⁻(u)} + 1{Y < F⁻(u)}, where δ_F(u) := (F(F⁻(u)) − u) / f(F⁻(u)).

Note that δ_F(u) ∈ [0, 1) and that I_F(Y, u) is a piecewise linear (continuous) function, increasing in u. Let

δ_F(u, v) := (δ_F(u ∨ v) − δ_F(u) δ_F(v)) f(F⁻(u ∧ v)) 1{F⁻(u) = F⁻(v)} ∈ [0, u ∧ v ∧ f(F⁻(u ∧ v))],

d(G, F, u, v) := d(G, F, u ∧ v) − (δ_F(u ∨ v) − δ_F(u) δ_F(v)) 1{F⁻(u) = F⁻(v)} (g(F⁻(u)) − f(F⁻(u))).

In Table 10 and Lemma A we list the properties of this transform.
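As a numerical sanity check of this definition (and of Lemma A(i) below with G = F), the following sketch evaluates δ_F and I_F for a Poisson distribution shifted to start at 1, used purely as an illustrative member of M, and verifies that E_F[I_F(Y, u)] = u:

```python
import math

# Shifted Poisson on {1, 2, ...} so that F(0) = 0, as required for members of M
lam, K = 3.0, 80
f = {k: math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1) for k in range(1, K + 1)}
F = {0: 0.0}
for k in range(1, K + 1):
    F[k] = F[k - 1] + f[k]

def F_inv(u):
    """Generalized inverse F^-(u): the smallest k with F(k) >= u."""
    return next(k for k in range(1, K + 1) if F[k] >= u)

def I(y, u):
    """Nonrandomized transform I_F(y, u) = (1 - delta_F(u)) 1{y = F^-(u)} + 1{y < F^-(u)}."""
    k = F_inv(u)
    delta = (F[k] - u) / f[k]
    return (1.0 - delta) * (y == k) + 1.0 * (y < k)

# Lemma A(i) with G = F: E_F[I_F(Y, u)] = u for every u in (0, 1)
for u in (0.05, 0.3, 0.5, 0.7, 0.95):
    mean = sum(I(y, u) * f[y] for y in range(1, K + 1))
    assert abs(mean - u) < 1e-9
```

The identity follows because (1 − δ_F(u)) f(F⁻(u)) = u − F(F⁻(u) − 1), so the two terms of the transform sum exactly to u in expectation.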
Lemma A. For 0 ≤ v ≤ u ≤ 1 and F, G, H ∈ M:

(i) E_G[I_F(Y, u)] = u + d(G, F, u), where E_G[·] = ∫(·) dG and d(G, F, u) ∈ [−u, 1 − u]. When G = F, the expectation is u.

(ii) I_F(Y, u) I_F(Y, v) = I_F(Y, u ∧ v) − (δ_F(u ∨ v) − δ_F(u) δ_F(v)) 1{Y = F⁻(u) = F⁻(v)}.

(iii) E_G[I_F(Y, u) I_F(Y, v)] = u ∧ v − δ_F(u, v) + d(G, F, u, v).

(iv) |I_F(Y, u) − I_H(Y, u)| ≤ 1 ∧ [(|F(Y) − H(Y)| ∨ |F(Y − 1) − H(Y − 1)|) / (f(Y) ∨ h(Y))]. Moreover, E_F[(I_F(Y, u) − I_H(Y, u))²] ≤ C sup_k |F(k) − H(k)| for an absolute constant C.

(v) |I_F(Y, u) − u − I_F(Y, v) + v| ≤ |u − v| ∨ (1 − f(Y)), and |I_F(Y, u) − u − I_F(Y, v) + v| = |u − v| if u, v ≤ F(Y − 1) or u, v ≥ F(Y). Moreover, E_F[sup_{u,v∈Ψ(ε)} (I_F(Y, u) − u − I_F(Y, v) + v)²] ≤ 2ε, for any interval Ψ(ε) ⊂ [0, 1] of length ε.

Table 10 presents the values of I(·, ·) for all possible values of Y relative to the inverted cdfs at the points u and v. For instance, I_F(Y, u) − I_F(Y, v) = 0 if
Y < F⁻(u) and Y < F⁻(v), while I_F(Y, u) − I_F(Y, v) = −δ_F(u) if Y = F⁻(u) and Y < F⁻(v).

[Table 10, which tabulates the values of I_F(Y, u) − I_F(Y, v), I_F(Y, u) I_F(Y, v), I_F(Y, u) − I_H(Y, u) and I_F(Y, u) I_H(Y, u) for all configurations of Y relative to F⁻(u), F⁻(v) and H⁻(u), is not recoverable from this extraction.]

(vi) E_{F_z}[1{F†(Y†) < u}] = I_F(Y, u).

(vii) E_{F_z}[I_{F,M}(Y, u) I_{F,M}(Y, v)] = M⁻¹ I_F(Y, u ∧ v) + (1 − M⁻¹) I_F(Y, u) I_F(Y, v).

In this section we present Lindeberg-Feller-type sufficient conditions for the functional weak convergence of discrete martingales. In general, to establish weak convergence one needs to check tightness and finite-dimensional convergence. In the case of martingales, both parts can be verified without imposing restrictive conditions. Here we state a result of Nishiyama (2000), which extends Theorem 2.11.9 of van der Vaart and Wellner (1996) to martingales; see also Theorem A.1 in Delgado and Escanciano (2007). Further details on notation and definitions can be found in the books of van der Vaart and Wellner (1996) for empirical processes and row-independent triangular arrays and of Jacod and Shiryaev (2003) for finite-dimensional semimartingales. For every T, let (Ω_T, F_T, {F_{Tt}}, P_T) be a discrete stochastic basis, where (Ω_T, F_T, P_T) is a probability space equipped with a filtration {F_{Tt}}. For a nonempty set Ψ, let {ξ_{Tt}}_{t=1,2,...} be an ℓ∞(Ψ)-valued martingale difference array with respect to the filtration F_{Tt}, i.e.
for every t, ξ_{Tt} maps Ω_T into ℓ∞(Ψ), the space of bounded R-valued functions on Ψ with sup-norm ‖·‖ = ‖·‖_∞, and for each u ∈ Ψ, ξ_{Tt}(u) is an R-valued martingale difference array: ξ_{Tt}(u) is F_{Tt}-measurable and E[ξ_{Tt}(u) | F_{T,t−1}] = 0. We are interested in studying the weak convergence of the discrete martingales Σ_{t=1}^T ξ_{Tt}. Denote a decreasing series of finite partitions (DFP) of Ψ by Π = {Π(ε)}_{ε∈(0,1)∩Q}, where Π(ε) = {Ψ(ε; k)}_{1≤k≤N_Π(ε)} is such that Ψ = ∪_{k=1}^{N_Π(ε)} Ψ(ε; k), N_Π(1) = 1 and lim_{ε→0} N_Π(ε) = ∞ monotonically in ε. The ε-entropy of the DFP Π is H_Π(ε) = √(log N_Π(ε)). The quadratic Π-modulus of {ξ_{Tt}} is the R_+ ∪ {∞}-valued process

‖ξ_T‖_{Π,T} = sup_{ε∈(0,1)∩Q} ε⁻¹ max_{1≤k≤N_Π(ε)} √( Σ_{t=1}^T E[ sup_{u,v∈Ψ(ε;k)} (ξ_{Tt}(u) − ξ_{Tt}(v))² | F_{T,t−1} ] ).   (8)

Theorem A. Let {ξ_{Tt}}_{t=1,2,...} be an ℓ∞(Ψ)-valued martingale difference array and suppose that:

(N1) (conditional variance convergence) Σ_{t=1}^T E[ξ_{Tt}(u) ξ_{Tt}(v) | F_{T,t−1}] →_{P_T} V(u, v) for every u, v ∈ Ψ;

(N2) (Lindeberg condition) Σ_{t=1}^T E[‖ξ_{Tt}‖² 1{‖ξ_{Tt}‖ > ε} | F_{T,t−1}] →_{P_T} 0 for every ε > 0;

(N3) (partitioning entropy condition) there exists a DFP Π of Ψ such that ‖ξ_T‖_{Π,T} = O_{P_T}(1) and ∫_0^1 H_Π(ε) dε < ∞.

Then Σ_{t=1}^T ξ_{Tt} ⇒ S, where S has normal marginals (S(v_1), S(v_2), ..., S(v_d)) ∼_d N(0, Σ) with covariance Σ = {V(v_i, v_j)}_{ij}.

To establish the asymptotic properties of the biparameter process S_{2T} we need the following assumption for the uniform convergence of different empirical quantities.

Assumption A.
Under H_{1T}, the following uniform limits to continuous functions exist:

1. plim_{T→∞} T⁻¹ Σ_{t=2}^T γ_{t−1,θ_0}(u_1, v_1) γ_{t,θ_0}(u_2, v_2),
2. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(v_1) γ_{t,θ_0}(u_2, v_2),
3. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) d(H_t(·|Ω_t), F_{t,θ_0}(·|Ω_t), u_2),
4. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) E[I_{t,θ_0}(u_2) ℓ_t(Y_t, Ω_t) | Ω_t],
5. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) ∇(F_{t,θ_0}(·|Ω_t), u_2).

As discussed in the text, these conditions restrict the dynamics of the data process so that some LLN holds, which is the case, e.g., for stationary and ergodic processes.

Proof of Lemma A. (i) By the definition of I_F(Y, u), E_G[I_F(Y, u)] = (1 − δ_F(u)) g(F⁻(u)) + G(F⁻(u)) − g(F⁻(u)) = d(G, F, u) − δ_F(u) f(F⁻(u)) + F(F⁻(u)) = d(G, F, u) + u. Similarly, by direct calculation we obtain (ii), (iii), (vi) and (vii). We now provide a detailed proof of (iv) and (v).

(iv) We prove a stronger result: for G ∈ M such that sup_k |F(k) − G(k)| ∨ |H(k) − G(k)| ≤ sup_k |F(k) − H(k)|, the expectation with respect to G is bounded, E_G[(I_F(Y, u) − I_H(Y, u))²] ≤ C sup_k |F(k) − H(k)|. The required bound is then obtained by setting G ≡ F. Since |I_F(Y, u) − I_H(Y, u)| never exceeds 1, we have E_G[(I_F(Y, u) − I_H(Y, u))²] ≤ E_G[|I_F(Y, u) − I_H(Y, u)|], so we bound the latter expectation.

Suppose that F⁻(u) = H⁻(u). Then I_F(Y, u) − I_H(Y, u) = δ_H(u) − δ_F(u) for Y = F⁻(u), i.e. with probability g(F⁻(u)), and is zero for other Y.
Therefore,

E_G[|I_F(Y, u) − I_H(Y, u)|] = |δ_H(u) − δ_F(u)| g(F⁻(u)) ≤ |F(F⁻(u)) − H(F⁻(u))| + |f(F⁻(u)) − g(F⁻(u))| δ_F(u) + |h(F⁻(u)) − g(F⁻(u))| δ_H(u) ≤ sup_k |F(k) − H(k)| + sup_k |f(k) − g(k)| + sup_k |h(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_F(u), δ_H(u) ∈ [0, 1), sup_k |f(k) − g(k)| ≤ 2 sup_k |F(k) − G(k)| and sup_k |h(k) − g(k)| ≤ 2 sup_k |H(k) − G(k)|.

Suppose that F⁻(u) < H⁻(u). Note that I_F(Y, u) − I_H(Y, u) = 0 for Y outside [F⁻(u), H⁻(u)]. We separately bound each term in

E_G[|I_F(Y, u) − I_H(Y, u)|] = E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = F⁻(u)}] + E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = H⁻(u)}] + E_G[|I_F(Y, u) − I_H(Y, u)| 1{F⁻(u) < Y < H⁻(u)}].

For Y = F⁻(u), I_F(Y, u) − I_H(Y, u) = −δ_F(u). Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = F⁻(u)}] = δ_F(u) g(F⁻(u)) = F(F⁻(u)) − u + δ_F(u) (g(F⁻(u)) − f(F⁻(u))) ≤ sup_k |F(k) − H(k)| + sup_k |f(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_F(u) ∈ [0, 1) and for u ∈ [H(F⁻(u)), F(F⁻(u))] we have that F(F⁻(u)) − u ≤ F(F⁻(u)) − H(F⁻(u)).

For Y = H⁻(u), I_F(Y, u) − I_H(Y, u) = −(1 − δ_H(u)). Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = H⁻(u)}] = (1 − δ_H(u)) g(H⁻(u)) = u − H(H⁻(u) − 1) + (1 − δ_H(u)) (g(H⁻(u)) − h(H⁻(u))) ≤ sup_k |F(k) − H(k)| + sup_k |h(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_H(u) ∈ [0, 1) and for u ∈ [H(H⁻(u) − 1), F(H⁻(u) − 1)] we have that u − H(H⁻(u) − 1) ≤ F(H⁻(u) − 1) − H(H⁻(u) − 1).

For F⁻(u) < Y < H⁻(u), I_F(Y, u) − I_H(Y, u) = −1. Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{F⁻(u) < Y < H⁻(u)}] = Σ_{k=F⁻(u)+1}^{H⁻(u)−1} g(k) = G(H⁻(u) − 1) − G(F⁻(u)) ≤ F(H⁻(u) − 1) − F(F⁻(u)) + 2 sup_k |G(k) − F(k)| ≤ F(H⁻(u) − 1) − H(H⁻(u) − 1) + 2 sup_k |G(k) − F(k)| ≤ C sup_k |F(k) − H(k)|,

since H(H⁻(u) − 1) < u ≤ F(F⁻(u)) ≤ F(H⁻(u) − 1). Summing up, E_G[|I_F(Y, u) − I_H(Y, u)|] ≤ C sup_k |F(k) − H(k)| for F⁻(u) < H⁻(u). This bound is symmetric in F and H; therefore it also holds for F⁻(u) > H⁻(u).

(v) Let [a, b] denote the interval Ψ(ε) of length ε, and let sup ξ denote the supremum of ξ over u, v ∈ [a, b], where ξ := I_F(Y, u) − u − I_F(Y, v) + v. Note that |ξ| ≤ 1; moreover, if [F(Y − 1), F(Y)] ∩ [a, b] = ∅, then sup |ξ| = ε, and if [a, b] ⊂ [F(Y − 1), F(Y)], then sup |ξ| = ((1 − f(Y)) / f(Y)) ε.

Suppose that F⁻(a) = F⁻(b), i.e. [a, b] ⊂ [F(F⁻(a) − 1), F(F⁻(a))]. Then

E_F[sup ξ²] ≤ E_F[sup |ξ|] = ε Σ_{k ≠ F⁻(a)} f(k) + ((1 − f(F⁻(a))) / f(F⁻(a))) ε f(F⁻(a)) = 2 (1 − f(F⁻(a))) ε ≤ 2ε.

Suppose that F⁻(a) < F⁻(b), i.e. [a, b] contains at least one point F(k), or even intervals [F(k − 1), F(k)] ⊂ [a, b]. On such intervals |ξ| goes up to 1 − f(k), but the probability of Y taking all such k is bounded by b − a. More precisely, E_F[sup ξ²] ≤ E_F[sup |ξ|] = ε Σ_k
Under $H_0$,
\[
\mathrm{E}\bigl[\mathbf{1}\{I_{t,\theta_0}(u) \le u\} \mid \Omega_t\bigr] = 1 - F_{\theta_0}\bigl(F_{\theta_0}^-(u \mid \Omega_t) \mid \Omega_t\bigr) + \mathbf{1}\bigl\{1 - \delta_{F_{\theta_0}(\cdot \mid \Omega_t)}(u) \le u\bigr\}\, f_{\theta_0}\bigl(F_{\theta_0}^-(u \mid \Omega_t) \mid \Omega_t\bigr),
\]
which depends on $\Omega_t$, and therefore $\mathrm{E}(\mathbf{1}\{I_{t,\theta_0}(u) \le u\} \mid \Omega_t) \ne \mathrm{E}(\mathbf{1}\{I_{t,\theta_0}(u) \le u\})$ with positive probability, and independence does not follow in general.

Proof of Lemma 2. Because the $U_t^r(\theta)$ are continuous, $\widehat F_\theta^r(u)$ is a (uniformly) consistent estimate of the cdf of $U_t^r(\theta)$. Then, by Lemma A(vi) and A(vii) and the ULLN, we get the uniform consistency of $\widehat F_{\theta,M}^r(u)$ and $\widetilde F_\theta^r(u)$. The efficiency gain comes from Lemma A(ii).

Proof of Lemma 3.
We need to verify Conditions N1-N3 of Theorem A. Fix $\varepsilon > 0$, take $\mathcal{H} = [0,1]$ with the usual norm and the equidistant partition $0 = u_0 < u_1 < \ldots < u_{N_\Pi(\varepsilon)} = 1$, i.e. a partition of $[0,1]$ into $N_\Pi(\varepsilon) = [\varepsilon^{-1}] + 1$ equal intervals of length $\varepsilon$ (the last interval may be even smaller), $\Psi(\varepsilon; k) = [u_{k-1}, u_k]$ and $\xi_{Tt} = (I_F(Y_t, u) - u)/\sqrt{T}$, which is a square integrable martingale difference by Lemma 1. Then Condition N1 follows from Lemma 1 and Assumption 1. Condition N2 is satisfied because for $T > [\varepsilon^{-2}]$ the indicator $\mathbf{1}\{\sup_{u \in [0,1]} |I_F(Y_t, u) - u|/\sqrt{T} > \varepsilon\} = 0$. Condition N3 follows from the bound in Lemma A(v). Indeed, $\int_0^1 H_\Pi(\varepsilon)^{1/2}\, d\varepsilon < \infty$ and $\|\xi_{Tt}\|_{\Pi,k} \le \sqrt{6\varepsilon/T}$ uniformly in $\varepsilon \in (0,1] \cap \mathbb{Q}$ and $1 \le k \le N_\Pi(\varepsilon)$.

Proof of Lemma 4.
Apply the weak convergence result from Lemma 3 under $G_{T,\theta_0}(\cdot \mid \Omega_t)$ with
\[
\xi_{Tt} := \Bigl( I_{F_{\theta_0}(\cdot \mid \Omega_t)}(Y_t, u) - u - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) \Bigr)/\sqrt{T},
\]
which is a square integrable martingale difference because of Lemma A(i) with $G = G_{T,\theta_0}(\cdot \mid \Omega_t)$ and $F = F_{\theta_0}(\cdot \mid \Omega_t)$. Then Condition N1 follows from Lemma A(iii) and the fact that the $d(G, F, u, v)$ are bounded in absolute value by $\delta T^{-1/2}$ a.s. Condition N2 is satisfied because for $T > [\varepsilon^{-2}]$ the indicator is 0. Condition N3 follows from the bound in Lemma A(v) and the fact that $(\mathrm{E}_G[\cdot] - \mathrm{E}_F[\cdot])$ applied to a.s. bounded random variables is bounded in absolute value by $\delta T^{-1/2}$ a.s. We obtain that $\sum_{t=1}^T \xi_{Tt} \Rightarrow S$, the same limit as in Lemma 3. Finally, use the additivity of $d(\cdot,\cdot,\cdot)$ in the first argument and apply the ULLN to
\[
S_T - \sum_{t=1}^T \xi_{Tt} = \sum_{t=1}^T d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr)/\sqrt{T} = \delta \sum_{t=1}^T d\bigl(H(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr)/T.
\]

Proof of Lemma 5.
Under $H_T$, i.e. under $G_{T,\theta_0}$, Equation (4) can be established using standard methods, applying the Doob and Rosenthal inequalities for MDS (Hall and Heyde, 1980). Let
\[
\sqrt{T}\, \xi_{Tt} := I_{F_{\widehat\theta_T}(\cdot \mid \Omega_t)}(Y_t, u) - I_{F_{\theta_0}(\cdot \mid \Omega_t)}(Y_t, u) - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u\bigr) + d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr).
\]
Define $z_T := \sum_{t=1}^T \xi_{Tt}$. When necessary, we write the arguments explicitly: $z_T(u, \widehat\theta_T)$. We show that $\sup_u |z_T| = o_p(1)$. Since $\sqrt{T}(\widehat\theta_T - \theta_0) = O_P(1)$, it is sufficient to establish that for some $\gamma < 1/2$,
\[
\sup_{u,\, \|\eta - \theta_0\| \le T^{-\gamma}} |z_T(u, \eta)| = o_p(1).
\]
Note that for $T > \delta^2/\nu^2$, by Assumption 3C,
\[
\Pr\Bigl( \sup_{\eta, t} \max_y \bigl|G_{T,t,\theta_0}(y \mid \Omega_t) - F_{t,\eta}(y \mid \Omega_t)\bigr| > \nu \Bigr) \le M_F T^{-\gamma}/\nu. \tag{9}
\]
First, we show that for fixed $\eta$ and $u$, $|z_T| = o_p(1)$. Since the $\xi_{Tt}$ are bounded in absolute value by $2/\sqrt{T}$ and form a martingale difference sequence with respect to $\Omega_t$, by the Doob inequality, for all $p \ge 2$ and $\varepsilon > 0$,
\[
P\Bigl( \max_{t=1,\ldots,T} |z_t| > \varepsilon \Bigr) \le \mathrm{E}|z_T|^p/\varepsilon^p,
\]
and by the Rosenthal inequality, for all $p \ge 2$ there exists $C$ such that
\[
\mathrm{E}|z_T|^p \le C\biggl[ \mathrm{E}\Bigl\{ \sum_t \mathrm{E}\bigl( \xi_{Tt}^2 \mid \Omega_t \bigr) \Bigr\}^{p/2} + \sum_t \mathrm{E}|\xi_{Tt}|^p \biggr].
\]
Take $p = 4$. The first term is small because of the bounds in Lemma A(iv) and (9). Because $|\xi_{Tt}| \le 2/\sqrt{T}$, $\sum_t \mathrm{E}|\xi_{Tt}|^p \le 2^p T^{1-p/2}$. Therefore we have a pointwise bound. Uniformity in $(u, \eta)$ can be established using the monotonicity of $I_{F_\theta(\cdot \mid \Omega_t)}(Y_t, u)$ in $u$ and the continuity of $d(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u)$, employing the bounds in Lemma A(iv) and (9). Finally, use that, uniformly in $u$,
\[
\frac{1}{\sqrt{T}} \sum_t \Bigl( d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u\bigr) - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) \Bigr) = \sqrt{T}\bigl(\widehat\theta_T - \theta_0\bigr)' \frac{1}{T} \sum_t \nabla\bigl(F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) + o_p(1).
\]

Proof of Theorem 1.
The joint weak convergence (6) follows from finite-dimensional convergence by the CLT for MDS, while tightness was established in the proof of Lemma 4.

Proof of Theorem 2.
Note that
\[
S_{2T} = \sum_{t=2}^{T-1} \xi_{Tt} + T^{-1/2}\bigl\{ (I_{T,\theta_0}(u_1) - u_1)\, I_{T-1,\theta_0}(u_2) + u_1 (I_{1,\theta_0}(u_2) - u_2) \bigr\},
\]
where
\[
\xi_{Tt} := T^{-1/2}\bigl\{ (I_{t,\theta_0}(u_1) - u_1)\, I_{t-1,\theta_0}(u_2) + u_1 (I_{t,\theta_0}(u_2) - u_2) \bigr\}
\]
is a square integrable martingale difference by Lemma 1. The rest is similar to the proof of Theorem 1. To obtain $S_{2T}(u) \Rightarrow S_{2\infty}(u)$ under $H_0$, verify Conditions N1-N3 of Theorem A for $\xi_{Tt}$ as is done in the proof of Lemma 3. The covariance function of $S_{2\infty}(u)$ is
\[
\begin{aligned}
V(u, v) :={} & (u_1 \wedge v_1)(u_2 \wedge v_2) - u_1 v_1 u_2 v_2 + (u_1 \wedge v_1)\, \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t-1,\theta_0}(u_2, v_2) \\
& - \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_1, v_1)\bigl( I_{t-1,\theta_0}(u_2 \wedge v_2) - \gamma_{t-1,\theta_0}(u_2, v_2) \bigr) \\
& + (u_2 \wedge v_1) u_1 v_2 - u_1 \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_2, v_1)\, I_{t-1,\theta_0}(v_2) \\
& + (u_1 \wedge v_2) u_2 v_1 - v_1 \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_1, v_2)\, I_{t-1,\theta_0}(u_2).
\end{aligned}
\]
Under $H_T$, apply the same weak convergence result under $G_{T,t,\theta_0}(\cdot \mid \Omega_t)$ with
\[
\zeta_{Tt} := \xi_{Tt} - I_{t-1,\theta_0}(u_2)\, d\bigl(G_{T,t,\theta_0}(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_1\bigr)/\sqrt{T} - u_1\, d\bigl(G_{T,t,\theta_0}(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_2\bigr)/\sqrt{T},
\]
which is a square integrable martingale difference because of Lemma A(i) with $G = G_{T,t,\theta_0}(\cdot \mid \Omega_t)$ and $F = F_{t,\theta_0}(\cdot \mid \Omega_t)$. Then proceed as in the proof of Lemma 4. In order to establish (7), repeat the steps of the proof of Lemma 5 for $\widetilde\zeta_{Tt} := \zeta_{Tt} - \widehat\zeta_{Tt}$, where $\widehat\zeta_{Tt}$ is $\zeta_{Tt}$ with $F_{t,\widehat\theta_T}$ in place of $F_{t,\theta_0}$.

Proof of Theorem 4.
Repeat the arguments of the proofs of Theorems 1 and 2 for a sample generated by $F_{\theta_T}$, defined in Assumption 6, to obtain conditional convergence. Then proceed as in the proof of Corollary 1 in Andrews (1997).

Checking assumptions for the Poisson model

Here we write $Y_t$ for $Y_t^\star$. For the Poisson model $Y_t \mid \Omega_t \sim \mathrm{Poisson}(\lambda_t)$ the probability distribution is
\[
\Pr(Y_t = k \mid \Omega_t) = P_{\lambda_t}(k) = \frac{\lambda_t^k \exp(-\lambda_t)}{k!}
\]
and the cumulative distribution function is
\[
F_{t,\theta}(k \mid \Omega_t) = \sum_{j=0}^k \Pr(Y_t = j \mid \Omega_t) = \sum_{j=0}^k \frac{\lambda_t^j \exp(-\lambda_t)}{j!} = Q(k+1, \lambda_t),
\]
where $Q(\cdot,\cdot)$ is the regularized gamma function, and $\lambda_t = \lambda_t(\beta) = \exp(X_t'\beta)$, $t = 1, 2, \ldots$. If the covariates $X_t$ are iid or stationary and ergodic, and $\Omega_t$ omits lags of the dependent variable $Y_t$, then the LLN applies both under the null and under local alternatives (like, e.g., the local alternative considered in Eq. (2.12) in Cameron and Trivedi, 1990) to justify Assumptions 2-6 and Assumption A, which involve functions of $\Omega_t$ that are uniformly continuous in $u$. However, it can also be interesting to allow the intensity to depend on lags of the dependent variable. For simplicity we consider $AR(1)$ dynamics; $AR(p)$ can be treated similarly but is lengthier. The parameters enter through
\[
\lambda_t = \lambda_t(\theta) = \alpha_0 + \alpha_1 \lambda_{t-1} + \rho Y_{t-1}, \quad t = 1, 2, \ldots,
\]
and are gathered in $\theta = (\alpha_0, \alpha_1, \rho)'$. We assume that $\alpha_0, \alpha_1, \rho$ are positive, $\lambda_0$ and $Y_0$ are fixed and $\alpha_1 + \rho < 1$. Under these conditions, there exists a unique stationary and ergodic solution to this model (Fokianos et al., 2009). Such data generating processes allow us to use results on (generic, uniform) LLN, which facilitates the checking of the assumptions in the paper. Conditions for stationarity and ergodicity for nonlinear $\lambda_t(\theta)$ can be found in Neumann (2011) and are directly applicable to the analysis under the null hypothesis. However, we are not aware of LLN results for these models under local alternatives, although Fokianos and Neumann (2013, Proposition 2.3(ii)) use related arguments.

Let $\lambda_{t,0} = \lambda_t(\theta_0)$; the null hypothesis is $Y_t \mid \Omega_t \sim \mathrm{Poisson}(\lambda_{t,0})$ for some $\theta_0 \in \Theta$. Then $U_t = Q(Y_t + 1, \lambda_{t,0})$ and $U_t^- = Q(Y_t, \lambda_{t,0})$, and the nonrandomized transform $I_{t,\theta_0}(u)$ for $u \in [0,1]$ is
\[
I_{t,\theta_0}(u) = \begin{cases} 0, & u \le Q(Y_t, \lambda_{t,0}); \\[4pt] \dfrac{u - Q(Y_t, \lambda_{t,0})}{\lambda_{t,0}^{Y_t} \exp(-\lambda_{t,0})/Y_t!}, & Q(Y_t, \lambda_{t,0}) \le u \le Q(Y_t + 1, \lambda_{t,0}); \\[4pt] 1, & Q(Y_t + 1, \lambda_{t,0}) \le u, \end{cases}
\]
from where one obtains the empirical processes and the test statistics defined in Sections 1-2.

Now consider Assumption 1. For the Poisson model,
\[
\gamma_{t,\theta_0}(u, v) = \frac{\bigl( Q(k+1, \lambda_{t,0}) - u \vee v \bigr)\bigl( u \wedge v - Q(k, \lambda_{t,0}) \bigr)}{\lambda_{t,0}^k \exp(-\lambda_{t,0})/k!}\, \mathbf{1}\{k(u) = k(v)\},
\]
where $k = k(u) = \min\{y : Q(y+1, \lambda_{t,0}) \ge u\}$. For the Poisson DGP described above, $Y_t$ is stationary and ergodic, and $\gamma_\infty(u,v) := \mathrm{E}[\gamma_{1,\theta_0}(u,v)]$ satisfies Assumption 1. By the same argument, Assumptions 2, 3D, 4C and 5 are fulfilled. Assumptions 3A and 3B are trivial. For Assumption 3C note that
\[
\dot F_{t,\theta}(k \mid \Omega_t) = \Biggl( \sum_{j=0}^{k-1} \frac{\lambda_t^j}{j!} - \sum_{j=0}^{k} \frac{\lambda_t^j}{j!} \Biggr) \exp(-\lambda_t)\, \dot\lambda_t = -\frac{\lambda_t^k}{k!} \exp(-\lambda_t)\, \dot\lambda_t,
\]
where
\[
\dot\lambda_t = \Bigl( 1 + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \alpha_0},\; \lambda_{t-1} + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \alpha_1},\; Y_{t-1} + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \rho} \Bigr)'.
\]
The last expression can be iterated back to $t = 1$, and because $\alpha_1 < 1$ the resulting geometric weights are summable, so the required bound follows.
We thank Juan Mora for useful comments. Support from the Ministerio de Economía y Competitividad (Spain), grants ECO2012-31748, ECO2014-57007-P and MDM 2014-0431, from the Comunidad de Madrid, MadEco-CM (S2015/HUM-3444), and from the Fundación Ramón Areces is gratefully acknowledged.
References

[1] Andrews, D.W.K. (1997). A conditional Kolmogorov test. Econometrica, 65, 1097-1128.

[2] Bai, J. (2003). Testing parametric conditional distributions of dynamic models. Review of Economics and Statistics, 85, 531-549.

[3] Basu, D. and R. de Jong (2007). Dynamic multinomial ordered choice with an application to the estimation of monetary policy rules. Studies in Nonlinear Dynamics and Econometrics, 11.

[4] Burke, M.D., Csörgő, M., Csörgő, S. and P. Révész (1978). Approximation of the empirical process when parameters are estimated. Annals of Probability, 7, 790-810.

[5] Cameron, A.C. and P.K. Trivedi (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46, 347-364.

[6] Corradi, V. and R. Swanson (2006). Bootstrap conditional distribution tests in the presence of dynamic misspecification. Journal of Econometrics, 133, 779-806.

[7] Czado, C., Gneiting, T. and L. Held (2009). Predictive model assessment for count data. Biometrics, 65, 1254-1261.

[8] Davis, R.A., W.T.M. Dunsmuir and S.B. Streett (2003). Observation-driven models for Poisson counts. Biometrika, 90, 777-790.

[9] Delgado, M. and J.C. Escanciano (2007). Nonparametric tests for conditional symmetry in dynamic models. Journal of Econometrics, 141, 652-682.

[10] Journal of Econometrics.

[11] Dolado, J.J. and R. María-Dolores (2002). Evaluating changes in the Bank of Spain's interest rate target: an alternative approach using marked point processes. Oxford Bulletin of Economics and Statistics, 64, 159-182.

[12] Doukhan, P., K. Fokianos and D. Tjøstheim (2012). On weak dependence conditions for Poisson autoregressions. Statistics and Probability Letters, 82, 942-948.

[13] Fokianos, K. and M. Neumann (2013). A goodness-of-fit test for Poisson count processes. Electronic Journal of Statistics, 7, 793-819.

[14] Fokianos, K., A. Rahbek and D. Tjøstheim (2009). Poisson autoregression. Journal of the American Statistical Association, 104, 1430-1439.

[15] Bernoulli, 20, 1344-1371.

[16] Giacomini, R., Politis, D.N. and H. White (2013). A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory, 29, 567-589.

[17] Greene, W.H. and D.A. Hensher (2010). Modeling Ordered Choices: A Primer. Cambridge University Press.

[18] Hamilton, J.D. and O. Jordá (2002). A model of the Federal Funds rate target. Journal of Political Economy, 110, 1135-1167.

[19] Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and Its Application. Academic Press, New York.

[20] Jacod, J. and A.N. Shiryaev (2003). Limit Theorems for Stochastic Processes. 2nd ed. Springer, Berlin.

[21] Jung, R.C., M. Kukuk and R. Liesenfeld (2006). Time series of count data: modeling, estimation and diagnostics. Computational Statistics and Data Analysis, 51, 2350-2364.

[22] Kauppi, H. and P. Saikkonen (2008). Predicting U.S. recessions with dynamic binary response models. Review of Economics and Statistics, 90, 777-791.

[23] Kedem, B. and K. Fokianos (2002). Regression Models for Time Series Analysis. Wiley, New Jersey.

[24] Kheifets, I. (2015). Specification tests for nonlinear time series models. Econometrics Journal, 18, 67-94.

[25] Kheifets, I. and C. Velasco (2013). Model adequacy checks for discrete choice dynamic models. In X. Chen and N.R. Swanson (eds.), Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr., 363-382.

[26] Lee, S. (2014). Goodness of fit test for discrete random variables. Computational Statistics and Data Analysis, 69, 92-100.

[27] Machado, J.A.F. and J.M.C. Santos Silva (2005). Quantiles for counts. Journal of the American Statistical Association, 100, 1226-1237.

[28] Journal of Econometrics.

[29] Neumann, M.H. (2011). Absolute regularity and ergodicity of Poisson count processes. Bernoulli, 17, 1268-1284.

[30] Nishiyama, Y. (2000). Weak convergence of some classes of martingales with jumps. Annals of Probability, 28, 685-712.

[31] Rydberg, T.H. and N. Shephard (2003). Dynamics of trade-by-trade price movements: decomposition and models. Journal of Financial Econometrics, 1, 2-25.

[32] Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics, 23, 470-472.

[33] Startz, R. (2008). Binomial autoregressive moving average models with an application to U.S. recessions. Journal of Business and Economic Statistics, 26, 1-8.

[34] Stute, W., González Manteiga, W. and M. Presedo Quindimil (1993). Bootstrap based goodness-of-fit tests. Metrika, 40, 243-256.

[35] Stute, W. and L.-X. Zhu (2002). Model checks for generalized linear models. Scandinavian Journal of Statistics, 29, 535-545.

[36] van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes. Springer, New York.