Asymptotic Distribution of the Score Test for Detecting Marks in Hawkes Processes
Simon Clinet, William T.M. Dunsmuir, Gareth W. Peters, Kylie-Anne Richards
May 1, 2019
Abstract
The asymptotic distribution of the score test of the null hypothesis that marks do not impact the intensity of a Hawkes marked self-exciting point process is shown to be chi-squared. For local asymptotic power, the distribution against local alternatives is also established as non-central chi-squared. These asymptotic results are derived using existing asymptotic results for likelihood estimates of the unmarked Hawkes process model together with mild additional conditions on the moments and ergodicity of the marks process and an additional uniform boundedness assumption, shown to be true for the exponential decay Hawkes process.
Keywords:
Marked Hawkes point process; Ergodicity; Quasi-likelihood; Score test; Inferential statistics; Local power.

∗ Faculty of Economics, Keio University. 2-15-45 Mita, Minato-ku, Tokyo, 108-8345, Japan. Phone: +81-3-5427-1506. E-mail: [email protected], website: http://user.keio.ac.jp/~clinet/
† Corresponding Author, School of Mathematics and Statistics, University of New South Wales, Sydney, NSW 2052, Australia. E-mail: [email protected], website: https://research.unsw.edu.au/people/professor-william-t-m-dunsmuir.

1 Introduction
Since their introduction over fifty years ago, Hawkes self-exciting process models (Hawkes, 1971) have been used to model point processes in many fields of application including seismology (Ogata, 1988), sociology (Crane and Sornette, 2008), modelling of neuronal systems and, increasingly in recent years, modelling of high frequency financial trading (for a general review, see Bacry et al. (2015); Hawkes (2018)). Extensions of the Hawkes process in which parameters are time-varying and replicate the non-stationarity of intraday financial data have also been considered, for example in Chen and Hall (2013) and Clinet and Potiron (2018). The theoretical properties of such models are quite well advanced, as are the estimation methodology and its associated statistical theory.

Increasingly, marked Hawkes processes, in which marks attached to past event times influence future intensities, are being considered for a range of applications. For example, Richards et al. (2018) consider the use of marked Hawkes processes for modelling millisecond recordings of activity in the limit order book for a range of assets traded on international futures markets. In these applications there are numerous potential marks recorded at each event, and a method is required to efficiently screen out those that are not influential on future event arrival intensities before joint models for the event times and associated marks are estimated.

Assessment of influential marks could be done by simultaneously estimating the parameters of the marked Hawkes process and then assessing them for statistical significance. Even if only a single scalar-valued mark is included in the model, estimation using, say, maximum likelihood methods poses computational challenges. When numerous marks are jointly included in the model these challenges are substantial. The use of likelihood for the marked Hawkes process model, if feasible, would allow use of standard inferential techniques such as the Wald or likelihood ratio test for testing the significance of the marks' impact. However, the relevant statistical inference methods and theory for marked Hawkes processes are not well developed at this time.

An alternative approach for assessing the impact of marks on intensity is the score test as proposed in Richards et al. (2018), where details of computational implementation, simulation and application to limit order book event series are presented. As is well known, this test, also known as the Lagrange Multiplier test (Breusch and Pagan, 1980), is computed using the score of the likelihood evaluated under the null hypothesis, which, in this application, is that marks do not impact the intensity, so that the event times are those of an unmarked Hawkes process.
The score statistic can then be constructed easily based on a single fitted intensity for an unmarked process. Because of this, the score test leads to substantial computational advantages, particularly when relevant and significant marks need to be selected from a possibly large catalogue before undertaking the effort of jointly fitting the marked process model. Apart from the obvious computational advantage afforded by using a single Hawkes model fit, the score test's large sample distribution theory can be derived using existing asymptotic theory for unmarked Hawkes processes, as is explained below. In this paper the large sample distribution of the score test under the null hypothesis that the mark or marks under test do not boost the intensity of events is shown to be the standard chi-squared distribution with appropriate degrees of freedom. We also show that the power against local alternatives is non-central chi-squared.

In the literature on the theory and inferential methods for marked Hawkes processes, the focus has been on the situation where marks are unpredictable (in a sense to be detailed below), such as when they are independent and identically distributed. Our extensive experience with applications in high frequency financial data suggests that influential marks display serial dependence in addition to cross dependence between the marks. Accordingly, we derive the asymptotic distribution of the score test for serially dependent multivariate-valued marks. This requires us to show that stable marked Hawkes processes exist when the marks are stationary and serially dependent, something that is not currently available in the literature.

From now on we consider a univariate marked Hawkes self-exciting point process (SEPP) $N_g$ on $\mathbb{R}_+ \times \mathcal{X}$, observed over the interval $t \in [0,T]$ and which takes the value 0 at $t = 0$. There are $N_T$ events observed in the interval $[0,T]$ at times $0 < t_1 < t_2 < \ldots < t_{N_T} \le T$, and a vector of $d$ marks $X_i \in \mathcal{X} \subset \mathbb{R}^d$ is associated with the $i$th event. The observed points of this process are $\{(t_i, x_i),\ i = 1,\ldots,N_T\}$. In Richards et al. (2018) relevant marks constitute a vector of correlated marks which are also serially dependent. In order to accommodate such examples we explain how to define a marked Hawkes process with serially dependent marks, give conditions for stationarity of the point process, and define the relevant quasi-likelihood. Following Liniger (2009) and Embrechts et al.
(2011), with modifications to notation as used in Clinet and Yoshida (2017), let the marked Hawkes SEPP have intensity process given by

\[
\lambda_g(t;\theta,\phi,\psi) = \eta + \vartheta \int_{[0,t)\times\mathcal{X}} w(t-s;\alpha)\, g(x;\phi,\psi)\, N_g(ds\times dx), \tag{1}
\]

where $w$ is a non-negative decay function satisfying

\[
\int_0^\infty w(s;\alpha)\, ds = 1, \qquad \int_0^\infty s\, w(s;\alpha)\, ds < \infty. \tag{2}
\]

The immigration rate is $\eta$, the branching coefficient is $\vartheta$, and the parameter $\alpha$, not necessarily scalar, specifies the decay function $w$. The marks $X$ have density $f(x;\phi)$ (w.r.t. Lebesgue measure) and impact the intensity through the scalar-valued boost function $g(X;\phi,\psi)$, where $g(\cdot;\phi,\psi): \mathbb{R}^d \to \mathbb{R}_+$ and $\psi$ is a vector parameter of length $r$ specifying the way in which marks enter the boost function. In addition, $g$ depends on the parameters of the marks density, $\phi$, because the normalization $E_\phi[g(X;\phi,\psi)] = 1$ is required to obtain a stationary solution to (1), as in Embrechts et al. (2011). In Section 2 we show, for the case where the marks are i.i.d. as considered in Embrechts et al. (2011), that under this normalization a stationary solution exists and, for serially dependent marks under a stronger condition on the conditional expectation of $g(x;\phi,\psi)$ given the past, this is also true.

Henceforth, let $\theta = (\eta, \vartheta, \alpha) \in \Theta$, $\phi \in \Phi$ and $\psi \in \Psi$ for some parameter spaces $\Theta$, $\Phi$ and $\Psi$. Let $\nu = (\theta, \phi, \psi) \in \Theta\times\Phi\times\Psi$ be the collection of all parameters for the marked process with intensity function (1). We denote the true value of the parameters as $(\theta^*, \phi^*, \psi^*)$. When the null hypothesis holds, the true parameter vector is denoted $\nu^* = (\theta^*, \phi^*, 0)$ and it is assumed that $g(x;\phi,0) \equiv 1$. Here $\Theta \subset \mathbb{R}^K$, $K >$
1. The parameter space $\Phi$ for the marks density will typically be the natural space of parameters for the specified density, and the boost parameter space $\Psi$ is chosen as appropriate for the form of $g$. Quite general normalized boost functions $g(X;\phi,\psi)$ can be constructed by starting with a function $h(X;\psi)$ and defining the boost function

\[
g(X;\phi,\psi) = \frac{h(X;\psi)}{E_\phi[h(X;\psi)]}. \tag{3}
\]

It is no loss of generality to require $h(X;0) \equiv$
1. Many examples of boost functions, including the polynomial, exponential and power function forms for $h$ as presented in Liniger (2009), as well as additive and multiplicative combinations of individual elements of $X$ used in Richards et al. (2018), can be formulated in this way. The null hypothesis being assessed with the score test is $H_0: \psi = 0$, which is equivalent to $g(x;\phi,$
0) = 1, so that marks do not boost intensity. Under $H_0$ the observed event times are those of an unmarked Hawkes SEPP, $N$, with intensity denoted by

\[
\lambda(t;\theta) = \eta + \vartheta \int_{[0,t)} w(t-s;\alpha)\, N(ds). \tag{4}
\]

We assume that the initial value of the intensity is $\lambda(0) = C$ for some specified value of $C$; for example $C = E[\lambda(t)] = \eta/(1-\vartheta)$, the theoretical long run average for a stationary Hawkes process (Laub et al., 2015). Note also that this intensity function is not defined using events prior to the observation period, that is for $t <$
0, because in practice (1) is used for computation of the likelihood. The intensity process defined in Embrechts et al. (2011) is the stationary version with infinite, but unobserved, event history included. In Brémaud and Massoulié (1996) the authors show that a suitable probability space exists on which a stationary version on $\mathbb{R}$, $N^\infty$, can be defined and to which the non-stationary version in (1) converges. We assume that $N^\infty$ is ergodic; this is proven in Clinet and Yoshida (2017) for the exponential decay case. Ogata (1978) considers both the stationary version of the intensity process and the non-stationary version as in (4), along with the associated likelihoods.

The remainder is organized as follows. Section 2 proves (see Proposition 1), via a thinning construction, that a stationary marked Hawkes process can be constructed when the marks are observed from a continuous time stationary process, and gives several examples of processes for which the conditions are met. Section 3 extends the definition of the joint likelihood of event times and marks beyond the i.i.d. case currently available in the literature to marks which are serially dependent. Section 4 defines the score test in detail. Section 5 states the main result (see Theorem 1) that the score statistic is asymptotically chi-squared distributed under the null hypothesis that marks do not impact the intensity function. For this result, in addition to the conditions of Clinet and Yoshida (2017) for the consistency and asymptotic normality of the unmarked process, conditions are required on the existence of moments and ergodicity of the mark process itself, together with an additional condition (Condition 3) which links the marks and the unmarked intensity process. Lemma 1 shows that Condition 3 is satisfied for the case of the exponential decay function $w$.
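For the exponential decay kernel, the unmarked intensity (4) can be evaluated directly from the observed event times. The following minimal sketch is our illustration, not code from the paper; the function name and the parameter values are our own choices.

```python
import numpy as np

def intensity_unmarked(t, event_times, eta, vartheta, alpha):
    """Evaluate eq. (4) with exponential decay w(s; alpha) = alpha * exp(-alpha * s):
    lambda(t; theta) = eta + vartheta * sum_{t_i < t} alpha * exp(-alpha * (t - t_i))."""
    past = event_times[event_times < t]
    return eta + vartheta * np.sum(alpha * np.exp(-alpha * (t - past)))

# With no past events the intensity equals the immigration rate eta; the
# stationary mean eta / (1 - vartheta) is finite only when vartheta < 1.
times = np.array([0.5, 1.2, 1.3])
lam = intensity_unmarked(2.0, times, eta=0.5, vartheta=0.4, alpha=1.3)
```

Each past event adds a positive, exponentially decaying contribution, so the value always exceeds the baseline $\eta$ once events have occurred.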
Section 6 proves that the score statistic is asymptotically non-central chi-squared distributed under local alternatives of the form $\psi^*_T = \gamma^*/\sqrt{T}$, where $T \to \infty$ and $T$ is the length of the interval over which the point process is observed. Section 7 discusses possible extensions of the main results. The Appendices contain proofs.

2 Existence of a stationary marked Hawkes process

Assume the existence of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ bearing a continuous time process $(y_t)_{t\in\mathbb{R}}$ taking values in the Borel space $(\mathcal{X}, \mathcal{X})$ (meaning that there exists a bijection $h$ between $\mathcal{X}$ and $[0,$
1] such that $h$ and $h^{-1}$ are measurable; see (Kallenberg, 2006, p. 7)). We define $(\mathcal{F}^y_t)_{t\in\mathbb{R}}$, the canonical filtration of $y$. We now give a proof of the existence of the marked Hawkes process using a thinning method similar to (Liniger, 2009, Chapter 6) and Brémaud and Massoulié (1996). To that end, we assume the existence on $(\Omega, \mathcal{F}, \mathbb{P})$ of a Poisson process $N$ with intensity 1 on $\mathbb{R}^2$, with points denoted $(t_i, u_i)_{i\in\mathbb{Z}}$ and independent of $y$. Then we consider the canonical process $(t_i, u_i, y_{t_i})_{i\in\mathbb{Z}}$ and see this process as a random measure $N_g$ on $\mathbb{R}^2 \times \mathcal{X}$. We associate to $N$ the filtration generated by the $\sigma$-algebras $\mathcal{F}^N_t = \sigma\{N((-\infty,s]\times A),\ s\in(-\infty,t],\ A\in\mathcal{B}(\mathbb{R})\}$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-field of $\mathbb{R}$. Similarly, we associate to $N_g$ the filtration generated by the $\sigma$-algebras $\mathcal{F}^{N_g}_t = \sigma\{N_g((-\infty,s]\times A\times B),\ s\in(-\infty,t],\ A\in\mathcal{B}(\mathbb{R}),\ B\in\mathcal{X}\}$. Note that $N(dt\times du) = N_g(dt\times du\times\mathcal{X})$. Consider the filtration generated by $\mathcal{F}_t = \mathcal{F}^y_t \vee \mathcal{F}^N_t$. In Proposition 1 below, we show the existence of a marked point process $N_g$ adapted to $\mathcal{F}_t$, satisfying (1), and which is constructed as an integral over the canonical measure $N_g$.

Before we state our result, we recall that a marked point process of the form $(\tau_i, y_{\tau_i})_{i\in\mathbb{Z}}$ is stationary if for any $t\in\mathbb{R}$, it has the same distribution (seen as a random measure) as $(\tau_i + t, y_{\tau_i + t})_{i\in\mathbb{Z}}$.

Proposition 1.
Assume that for any $t\in\mathbb{R}$, $E[g(y_t;\phi,\psi)\,|\,\mathcal{F}^y_{t-}] \le C$ where $C < 1/\vartheta$, along with $\int_0^{+\infty} w(s;\alpha)\,ds = 1$. Then there exists a marked point process of the form $(\tau_i, x_i)_{i\in\mathbb{Z}} := (\tau_i, y_{\tau_i})_{i\in\mathbb{Z}}$, also represented by the random measure $N_g$ on $\mathbb{R}\times\mathcal{X}$, such that:

1. The counting process associated to $(\tau_i)_{i\in\mathbb{Z}}$ is adapted to $(\mathcal{F}_t)_{t\in\mathbb{R}}$, and admits the stochastic intensity (with respect to $\mathcal{F}_t$)

\[
\lambda_g(t) = \eta + \vartheta \int_{(-\infty,t)\times\mathcal{X}} w(t-s;\alpha)\, g(x;\phi,\psi)\, N_g(ds\times dx).
\]
2. The random measure $N_g$ admits $\pi(ds\times dx) = \lambda_g(s)\,ds \times F_s(dx)$ as predictable compensator, where $F_s(dx)$ is the conditional distribution of $y_s$ given $\mathcal{F}^y_{s-}$. We recall that, by definition, for any non-negative measurable predictable process $W$ (see (Jacod and Shiryaev, 2013, Theorem II.1.8)), the process $\int_{(-\infty,t]\times\mathcal{X}} W(s,x)\,\pi(ds\times dx)$ is predictable, and moreover for any $u \le t$,

\[
E\left[\int_{(u,t]\times\mathcal{X}} W(s,x)\, N_g(ds\times dx)\,\Big|\,\mathcal{F}_u\right] = E\left[\int_{(u,t]\times\mathcal{X}} W(s,x)\, \pi(ds\times dx)\,\Big|\,\mathcal{F}_u\right].
\]
3. If the process $(y_t)_{t\in\mathbb{R}}$ is stationary, then so is $N_g$.

Proof of Proposition 1. We construct the marked point process $N_g$ by thinning and a fixed point argument. Define $\lambda_{g,0}(t) = \eta$. By induction, we then define the sequences of processes $(N_{g,n})_{n\in\mathbb{N}}$ and $(\lambda_{g,n})_{n\in\mathbb{N}}$ as follows. For any $t\in\mathbb{R}$ and $A\in\mathcal{X}$,

\[
N_{g,n}((-\infty,t]\times A) = \int_{(-\infty,t]\times\mathbb{R}\times A} \mathbf{1}_{\{0 \le u \le \lambda_{g,n}(s)\}}\, N_g(ds\times du\times dx), \tag{5a}
\]
\[
\lambda_{g,n+1}(t) = \eta + \vartheta \int_{(-\infty,t)\times\mathcal{X}} w(t-s;\alpha)\, g(x;\phi,\psi)\, N_{g,n}(ds\times dx). \tag{5b}
\]

By taking conditional expectations throughout (5a) it is immediate to see that $t \to \lambda_{g,n}(t)$ is the stochastic intensity of $t \to N_{g,n}((-\infty,t]\times\mathcal{X})$. Moreover, by positivity of $w(t-s;\alpha)\,g(x;\phi,\psi)$, we immediately deduce that the point process $N_{g,n}((-\infty,t]\times A)$ and the stochastic intensity $\lambda_{g,n}(t)$ are pointwise increasing with $n$, so that we may define $N_g$ and $\lambda_g$ as their limit processes. Moreover, by the monotone convergence theorem, taking the limit $n \to +\infty$ in the above equations yields that $t \to \lambda_g(t)$ is the stochastic intensity of $t \to N_g((-\infty,t]\times\mathcal{X})$ and has the desired shape. All we have to check to get the first claim of the proposition is the finiteness of the two limit processes. Let $\rho_n(t) = E[\lambda_{g,n}(t) - \lambda_{g,n-1}(t)]$. We have

\begin{align*}
\rho_n(t) &= \vartheta\, E\left[\int_{(-\infty,t)\times\mathcal{X}} w(t-s;\alpha)\, g(x;\phi,\psi)\, \{N_{g,n-1} - N_{g,n-2}\}(ds\times dx)\right] \\
&= \vartheta\, E\left[\int_{(-\infty,t)\times\mathcal{X}} w(t-s;\alpha)\, g(x;\phi,\psi)\, \{\lambda_{g,n-1}(s) - \lambda_{g,n-2}(s)\}\, ds\, F_s(dx)\right] \\
&= \vartheta\, E\left[\int_{(-\infty,t)} w(t-s;\alpha)\, E[g(y_s;\phi,\psi)\,|\,\mathcal{F}^y_{s-}]\, \{\lambda_{g,n-1}(s) - \lambda_{g,n-2}(s)\}\, ds\right] \\
&\le C\vartheta \int_{(-\infty,t)} w(t-s;\alpha)\, \rho_{n-1}(s)\, ds,
\end{align*}

where we used the $\mathcal{F}_{s-}$ measurability of the stochastic intensities and that $E[g(y_s;\phi,\psi)\,|\,\mathcal{F}_{s-}] = E[g(y_s;\phi,\psi)\,|\,\mathcal{F}^y_{s-}] \le C$ (by independence of $y$ and $N$). From here, we deduce that $\sup_{s\in(-\infty,t)} \rho_n(s) \le C\vartheta\, \sup_{s\in(-\infty,t)} \rho_{n-1}(s)$ since $\int_0^{+\infty} w(s;\alpha)\, ds \le$
1. By a similar calculation, we also have $\rho_1(t) \le C\vartheta\eta$, so that by an immediate induction $\sup_{s\in(-\infty,t)} \rho_n(s) \le (C\vartheta)^n \eta$. Therefore, $E[\lambda_g(t)] = \eta + \sum_{k=1}^{+\infty} \rho_k(t) \le \eta/(1 - C\vartheta) < +\infty$, which implies the almost sure finiteness of both $N_g$ (on any set of the form $[t_0,t_1]\times\mathcal{X}$, $-\infty < t_0 \le t_1 < +\infty$) and $\lambda_g(t)$. This proves the first claim.

Now we prove the second point. By a monotone class argument, it is sufficient to take $W(s,x) = \mathbf{1}_A(x)$ and prove the martingale property for any $A\in\mathcal{X}$. We show that for any $n\in\mathbb{N}$, $\pi_n(ds\times dx) = \lambda_{g,n}(s)\,ds\times F_s(dx)$ is the compensator of $N_{g,n}$. For $n = 0$, we have

\[
E\left[\int_{(u,t]\times\mathcal{X}} W(s,x)\, N_{g,0}(ds\times dx)\,\Big|\,\mathcal{F}_u\right] = E\Big[\sum_{u < t_i \le t} \ldots \Big|\,\mathcal{F}_u\Big] = \ldots
\]

Remark 1. The above construction also works for a marked point process starting from $0$ instead of $-\infty$ (just replace $-\infty$ by $0$ in all the integrals). In that case, the resulting process $N_g$ is obviously not stationary, but one can prove that $N_g$ converges to the stationary version starting from $-\infty$ by a straightforward adaptation of the proof of Theorem 1 in Brémaud and Massoulié (1996).

Remark 2. If the marks $y_s$ are i.i.d., the conditional expectation $E[g(y_s)\,|\,\mathcal{F}^y_{s-}]$ reduces to the usual expectation $E[g(y_s)]$, which is equal to unity due to normalization of the boost function $g$. Hence the condition that $E[g(y_s)\,|\,\mathcal{F}^y_{s-}] \le C < 1/\vartheta$ is obviously satisfied.

Remark 3. When $y$ is a left-continuous process, or more generally a predictable process, then $E[g(y_s)\,|\,\mathcal{F}^y_{s-}] = g(y_s)$, which leads to the condition $g(y_s) \le C < 1/\vartheta$. This may be very restrictive in practice. For example, for a parametric linear boost in a single mark this would require the mark process to be bounded above by a constant which depends on $\phi$, $\psi$ and $\vartheta$.
It is also interesting to note that for mark processes with continuous sample paths, mixing conditions (which specify the rate at which dependence fades away with increasing time separation) will not lead to a weakening of the aforementioned stringent condition. The difficulty stems from the dependence of $g(y_t)$ and $g(y_s)$ when $t$ and $s$ are close together. If at some time $t_0$, $g(y_{t_0}) > 1/\vartheta$, by 'continuity' it will stay above that level for some time interval $[t_0, t_0+\epsilon]$. On this interval the process becomes explosive, and regardless of the number of jumps, all the marks are highly correlated since $g(y_s) \approx g(y_{t_0}) > 1/\vartheta$ for $s\in[t_0, t_0+\epsilon]$.

Remark 4. In view of the last remark, marks which arise from a stochastic process with continuous sample paths are probably not practical for use in marked Hawkes self-exciting processes. On the other hand, marks based on a stochastic process which contains some degree of independence can more easily satisfy the condition of Proposition 1. For example, let $y_t = U_t + V_t$ where $U_t$ has continuous sample paths and $V_t$ is a pure noise process independent of $U_t$. More generally, a conditionally independent specification in which $U_t$ is as before and $y_t\,|\,\{U_t\} \overset{\text{i.i.d.}}{\sim} f(\cdot\,|\,U_t)$ would also more easily satisfy the condition. For instance, $U_t$ may specify some of the parameters needed for the density $f$.

Remark 5. The condition $E[g(y_t)\,|\,\mathcal{F}^y_{t-}] \le C < 1/\vartheta$ can be slightly relaxed, as we next explain. Let $\rho_n(t) = E[\lambda_{g,n}(t) - \lambda_{g,n-1}(t)]$. Following the proof of Proposition 1, we know that a sufficient condition for non-explosion is $\sum_{n=1}^{+\infty} \rho_n(t) < \infty$. A straightforward induction shows that

\[
\rho_n(t) = \vartheta^n \eta\, E\left[\int_{-\infty}^{t} \cdots \int_{-\infty}^{t_{n-1}} w(t - t_1)\cdots w(t_{n-1} - t_n)\, \psi_{t_1}\cdots\psi_{t_n}\, dt_1\cdots dt_n\right],
\]

where $\psi_t = E[g(y_t)\,|\,\mathcal{F}^y_{t-}]$.
Therefore, if we replace the above condition by $\int_{-\infty}^{t} w(t-s)\, E[g(y_s)\,|\,\mathcal{F}^y_{s-}]\, ds \le C < 1/\vartheta$ for any $t\in\mathbb{R}$, then $\sum_{n=1}^{+\infty} \rho_n(t) \le \eta/(1 - C\vartheta) < +\infty$, and the process is stable.

Remark 6. If the marks are considered to be observations on a discrete time process, then it is not obvious that the thinning method used above can be used to construct a marked Hawkes process. Moreover, while it is possible to construct a non-stationary marked Hawkes process with discrete marks by iterating the intensity function from some initial time $t_0$, it is not clear whether there exists a stationary version of this process on $\mathbb{R}$.

3 Quasi-likelihood for marked Hawkes processes

The log-likelihood and associated statistical properties for the unmarked Hawkes SEPP have a long history; see Ozaki (1979), Ogata (1978) or Andersen et al. (1996), for example. In deriving the likelihood, (Embrechts et al., 2011, Definition 3) assume that the marks are unpredictable as defined in (Daley and Vere-Jones, 2002, Definition 6.4.III(b)), so that the distribution of $X_i$, the mark at time $t_i$, is independent of previous event times and marks, i.e. of $\{(t_j, X_j)\}$ for $t_j < t_i$. An example of unpredictable marks is where the marks are conditionally i.i.d. given the past of the process, but the marks may impact the future of the intensity $\lambda_g$ as in (1). The simplest example of this is where the marks are actually i.i.d. unconditionally, as considered in Embrechts et al. (2011). In our empirical analysis we have frequently observed that $\{X_i\}$ is a time series of serially dependent marks.
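The start-from-0 construction of Remark 1 can be mimicked numerically by Ogata-style thinning. The sketch below is our illustration, not code from the paper: marks are drawn sequentially at event times (the discrete-time setting of Example 3 and Remark 6, where the formal stationarity question is open) from an AR(1) Gaussian sequence, and the boost $g(x) = 1 + 0.8\tanh(x)$ is bounded by $1.8 < 1/\vartheta = 2.5$, so the intensity is dominated by a stable unmarked Hawkes process and the simulation cannot explode.

```python
import numpy as np

rng = np.random.default_rng(1)
eta, vartheta, alpha = 0.5, 0.4, 1.3   # immigration rate, branching coefficient, decay

def lam_g(t, times, boosts):
    """Marked intensity (1) with exponential decay, started from time 0."""
    times, boosts = np.asarray(times), np.asarray(boosts)
    mask = times < t
    return eta + vartheta * np.sum(alpha * np.exp(-alpha * (t - times[mask])) * boosts[mask])

def simulate(T, rho=0.6):
    """Ogata thinning: between events the exponential-kernel intensity decays,
    so its current value is a valid dominating rate."""
    times, marks, boosts = [], [], []
    t, x = 0.0, 0.0
    while True:
        M = lam_g(t + 1e-12, times, boosts)        # upper bound until the next event
        t += rng.exponential(1.0 / M)
        if t > T:
            break
        if rng.uniform() * M <= lam_g(t, times, boosts):
            x = rho * x + rng.normal()             # serially dependent AR(1) mark
            times.append(t)
            marks.append(x)
            boosts.append(1.0 + 0.8 * np.tanh(x))  # bounded boost, E[g] = 1 by symmetry
    return np.array(times), np.array(marks)

times, marks = simulate(200.0)
```

The acceptance step thins the dominating rate down to $\lambda_g$, exactly as the indicator in (5a) thins the unit-rate Poisson measure.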
In this case the unpredictability property does not hold. As far as we can determine, in the literature on likelihood inference for marked Hawkes processes there is no existing treatment of the serially dependent marks case.

In general, it is possible to represent the log-likelihood $\bar{l}_g$ when the marks are not i.i.d. as follows. Recall that the integer-valued measure $N_g(dt\times dx)$ admits a predictable compensator $\pi(dt\times dx)$ by Proposition 1 (ii), of the form $\pi(ds\times dx) = \lambda_g(s;\nu)\,ds\times F_s(dx,\phi)$. Assuming that for any $s\in\mathbb{R}_+$ the conditional distributions $F_s(dx,\phi)$ are dominated by some measure $c(dx)$ (i.e. $F_s(dx,\phi) = f_s(x;\phi)\,c(dx)$), using (Jacod and Shiryaev, 2013, Theorem III.5.19) we can generalize the log-likelihood (11) for the pure point process with

\[
\bar{l}_g(\nu) = \int_{[0,T]\times\mathcal{X}} \ln[\lambda_g(t;\nu) f_t(x;\phi)]\, N_g(dt\times dx) - \int_{[0,T]} \underbrace{\int_{\mathcal{X}} f_t(x;\phi)\, c(dx)}_{=1}\, \lambda_g(t;\nu)\, dt.
\]

Expanding the logarithm, we get

\[
\bar{l}_g(\nu) = \int_{[0,T]\times\mathcal{X}} \ln \lambda_g(t;\nu)\, N_g(dt\times dx) - \Lambda_g(T;\nu) + \int_{[0,T]\times\mathcal{X}} \ln f_t(x;\phi)\, N_g(dt\times dx), \tag{6}
\]

where the compensator at $T$ is

\[
\Lambda_g(T;\nu) = \int_{[0,T]} \lambda_g(t;\nu)\, dt.
\]

However, because of the third term, computing (6) requires that one observes the whole trajectory of the joint process $(N_t, y_t)_{t\in[0,T]}$. When assuming that we only have discrete observations of the form $(t_i, x_i)_{1\le i\le N_T} = (t_i, y_{t_i})_{1\le i\le N_T}$, the last term in (6) should be changed to $\sum_{i=1}^{N_T} \ln f(y_{t_i};\phi\,|\,(t_j, y_{t_j})_{j < i})$, yielding the quasi-log-likelihood (7).

Example 1.
The marks are i.i.d. with density $f$ w.r.t. some measure $c$, as in Embrechts et al. (2011), and the last term in (7) becomes $\int_{[0,T]\times\mathcal{X}} \ln f(x;\phi)\, N_g(dt\times dx)$, which evaluates to $\sum_{i=1}^{N_T} \ln f(x_i;\phi)$, and the log-likelihood is

\[
l_g(\nu) = \int_{[0,T]\times\mathcal{X}} \ln \lambda_g(t;\nu)\, N_g(dt\times dx) - \Lambda_g(T;\nu) + \sum_{i=1}^{N_T} \ln f(x_i;\phi). \tag{8}
\]

Example 2. More generally, the marks are observations $x_i = y_{t_i}$ on a stationary process $(y_t)_{t\in\mathbb{R}_+}$ in continuous time. Then the last term in (6) is the sum of the log conditional densities of $y_{t_i}\,|\,y_{t_{i-1}},\ldots,y_{t_1}$ evaluated at the $x_i$. We can write this as $\ln f(x_1,\ldots,x_{N_T}\,|\,t_1,\ldots,t_{N_T};\phi)$, giving the quasi-likelihood in the form

\[
l_g(\nu) = \int_{[0,T]\times\mathcal{X}} \ln \lambda_g(t;\nu)\, N_g(dt\times dx) - \Lambda_g(T;\nu) + \ln f(x_1,\ldots,x_{N_T}\,|\,t_1,\ldots,t_{N_T};\phi), \tag{9}
\]

where $\phi$ represents all the parameters of the joint conditional distribution, including any parameters needed to model serial dependence. Note that this density depends on the event times, since the specification of joint distributions for the continuous time process requires these. A simple example is when $(y_t)_{t\in\mathbb{R}_+}$ is a stationary Gaussian process with covariance between $y_t$ and $y_s$ given by a function $\Gamma(s-t;\phi)$ depending on parameters $\phi$.

Example 3. In applications to the limit order book, Richards et al. (2018) modelled the marks $x_i$ at event time $t_i$ as observations on a discrete time stationary process indexed by event index $i$. The third term in (9) is replaced by $\ln f(x_1,\ldots,x_{N_T};\phi)$, where $f$ now denotes the joint density for the discrete time stationary time series $\{x_i\}$ in which actual event times are ignored and only the indices, $i$, of event times are needed to model the serial dependence structure. It is not clear that this can be written as an integral with respect to $N_g(dt\times dx)$ corresponding to the third term in (6).
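For the exponential kernel, the likelihood (8) can be evaluated in $O(N_T)$ time: the first term uses the Markov recursion of the kernel, and the compensator has a closed form. The sketch below is ours, not the authors' code; it assumes Gaussian i.i.d. $N(\mu,1)$ marks and, for simplicity, the null boost $g \equiv 1$.

```python
import numpy as np

def loglik_iid_marks(times, marks, eta, vartheta, alpha, mu, T):
    """Evaluate eq. (8) for w(s; alpha) = alpha*exp(-alpha*s), i.i.d. N(mu, 1)
    marks, and (for this sketch) the null boost g(x) = 1, i.e. psi = 0."""
    g = np.ones_like(marks)                   # g(x_i; phi, 0) = 1 under H0
    # First term: sum_i log lambda_g(t_i), using the exponential-kernel recursion
    # A_i = exp(-alpha * dt) * (A_{i-1} + g_{i-1}).
    A, ll, prev_t = 0.0, 0.0, 0.0
    for i, t in enumerate(times):
        A *= np.exp(-alpha * (t - prev_t))
        ll += np.log(eta + vartheta * alpha * A)
        A += g[i]
        prev_t = t
    # Second term: compensator Lambda_g(T) in closed form for the exponential kernel.
    Lambda = eta * T + vartheta * np.sum(g * (1.0 - np.exp(-alpha * (T - times))))
    # Third term: i.i.d. Gaussian log density of the marks.
    ll += -0.5 * np.sum((marks - mu) ** 2) - 0.5 * len(marks) * np.log(2 * np.pi)
    return ll - Lambda

ll = loglik_iid_marks(np.array([1.0, 2.0]), np.zeros(2),
                      eta=0.5, vartheta=0.4, alpha=1.0, mu=0.0, T=3.0)
```

Replacing the Gaussian product density by a conditional density for serially dependent marks gives the quasi-likelihood forms (9) and (10).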
Note that this leads to the objective function

\[
l_g(\nu) = \int_{[0,T]\times\mathcal{X}} \ln \lambda_g(t;\nu)\, N_g(dt\times dx) - \Lambda_g(T;\nu) + \ln f(x_1,\ldots,x_{N_T};\phi) \tag{10}
\]

to be maximised over the parameters. However this is not a formal likelihood, nor does it seem possible to define a stationary Hawkes process, as we did in Section 2, for the case where marks are drawn from a stationary discrete time process. Of course, in the absence of serial dependence both (9) and (10) lead to the i.i.d. version (8) considered in the literature to date.

When $\psi = 0$ the boost is the identity, so that marks do not impact the intensity. But the marks process and the event process may not be independent, because the conditional distribution of marks is not free of the event times. Nonetheless, the log-likelihood in (7) becomes a sum of two terms,

\[
l(\theta,\phi) = l(\theta) + \sum_{i=1}^{N_T} \ln f(y_{t_i};\phi\,|\,(t_j, y_{t_j})_{j < i}),
\]

where $l(\theta)$ is the log-likelihood of the unmarked process,

\[
l(\theta) = \int_{[0,T]} \ln \lambda(t;\theta)\, N(dt) - \int_{[0,T]} \lambda(t;\theta)\, dt. \tag{11}
\]

4 The score test

Let $\nu^* = (\theta^*, \phi^*,$
0) denote the true value of the combined parameters under $H_0$. Let $\hat{\nu}_T = (\hat{\theta}_T, \hat{\phi}_T, 0)$, where $\hat{\theta}_T$ is the asymptotic quasi maximum likelihood estimate, as in (Clinet and Yoshida, 2017, page 1804), of the intensity process parameters based on the likelihood (11) under $H_0$, and $\hat{\phi}_T$ is the MLE for the parameters of the marks density. Denote the derivatives of the log-likelihood with respect to $\nu$ as $\partial_\nu l_g(\nu)$ at the parameter value $\nu$, so that $\partial_\theta l_g(\nu^*)$ and $\partial_\nu l_g(\hat{\nu}_T)$ are evaluated at $\nu^*$ and $\hat{\nu}_T$ respectively. The score (or Lagrange multiplier) test statistic (Breusch and Pagan, 1980) is defined as

\[
\hat{Q}_T = \partial_\nu l_g(\hat{\nu}_T)^T\, I(\hat{\nu}_T)^{-1}\, \partial_\nu l_g(\hat{\nu}_T), \tag{12}
\]

where $I(\nu^*) = E_{\nu^*}[\partial_\nu l_g(\nu^*)\, \partial_\nu l_g(\nu^*)^T]$ and $I(\hat{\nu}_T)$ evaluates this at the parameters, $\hat{\nu}_T$, estimated under $H_0$. Also (Breusch and Pagan, 1980), the information matrix can be replaced by any matrix with the same limit in probability, for example the negative of the matrix of second derivatives of the log-likelihood, and the large sample properties of the score statistic will be the same. Under Condition 1, stated below, on the functions $h$ defining the boost functions $g$ via (3), the information matrix is shown in Richards et al. (2018) to be block diagonal which, together with $\partial_\nu l_g(\hat{\nu}_T) = (0, 0, \partial_\psi l_g(\hat{\nu}_T))^T$, allows simplification of (12) to

\[
\hat{Q}_T = \partial_\psi l_g(\hat{\nu}_T)^T\, I_\psi(\hat{\nu}_T)^{-1}\, \partial_\psi l_g(\hat{\nu}_T), \tag{13}
\]

where $I_\psi(\hat{\nu}_T)$ is the $r\times r$ diagonal block of $I(\hat{\nu}_T)$ corresponding to $\psi$. Because the third term in the log-likelihood (6) (and all the variants (7), (8), (9), (10)) does not depend on $\psi$, it follows that

\[
\partial_\psi l_g(\nu) = \int_{[0,T]\times\mathcal{X}} \lambda_g(t;\nu)^{-1}\, \partial_\psi \lambda_g(t;\nu)\, N_g(dt\times dx) - \int_{[0,T]} \partial_\psi \lambda_g(t;\nu)\, dt, \tag{14}
\]

where

\[
\partial_\psi \lambda_g(t;\nu) = \vartheta \int_{[0,t)\times\mathcal{X}} w(t-s;\alpha)\, \partial_\psi g(x;\phi,\psi)\, N_g(ds\times dx)
\]

and the vector of derivatives of $g$ with respect to $\psi$ is

\[
\partial_\psi g(X;\phi,\psi) = \frac{1}{E_\phi[h(X;\psi)]}\left[\partial_\psi h(X;\psi) - g(X;\phi,\psi)\, E_\phi[\partial_\psi h(X;\psi)]\right].
\]
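The normalization (3) and the form of $\partial_\psi g$ at $\psi = 0$ can be checked numerically. In the sketch below (our illustration, not from the paper) we take $h(x;\psi) = e^{\psi x}$, so that $H(x) = \partial_\psi h(x;0) = x$, and the derivative of the normalized boost at $\psi = 0$ should be the centered mark $x - E_\phi[X]$; the expectation is approximated by a sample average.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(1.0, 2.0, size=200_000)   # marks; density parameters phi = (1, 2)

def g(x, psi, sample=X):
    """Normalized boost (3) with h(x; psi) = exp(psi*x); the normalizer
    E_phi[h(X; psi)] is replaced by a Monte Carlo average over `sample`."""
    return np.exp(psi * x) / np.mean(np.exp(psi * sample))

# At psi = 0 the boost is identically 1 (the null hypothesis H0), and
# d/dpsi g(x; phi, 0) = H(x) - E_phi[H(X)], the centered mark.
eps = 1e-5
deriv = (g(2.0, eps) - g(2.0, -eps)) / (2 * eps)   # close to 2.0 - mean(X)
```

The same construction with a vector $\psi$ yields $G(X;\phi) = H(X) - E_\phi[H(X)]$, the centered vector entering the score (15).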
Condition 1. Conditions on boost function specification. Throughout we assume $h$, used to define the boost function $g$ in (3), and its first and second derivatives with respect to $\psi$, denoted $\partial_\psi h$ and $\partial_{\psi\psi} h$, satisfy the following properties:

(i) $h(X;0) \equiv 1$;
(ii) $E_\phi[h(X;\psi)]$ and $E_\phi[\partial_\psi h(X;\psi)]$ exist for all $\psi\in\Psi$, $\phi\in\Phi$;
(iii) $\partial_\phi E_\phi(h(X;\psi))|_{\nu^*} = 0$;
(iv) $\partial_\phi E_\phi[\partial_\psi h(X;\psi)]|_{\nu^*}$ exists for all $\phi\in\Phi$;
(v) $\mathrm{cov}_\phi(H(X)) = \Omega_G(\phi)$, where $\Omega_G(\phi)$ is a finite positive definite matrix for any $\phi\in\Phi$, and where $H(X) := \partial_\psi h(X;0)$.

Under these conditions on $h$, $E_\phi[g(X;\phi,\psi)] = 1$ for all $\psi\in\Psi$, $g(X;\phi,0) \equiv 1$, and, letting $G(X;\phi) = \partial_\psi g(X;\phi,0)$, $E_\phi[G(X;\phi)] = 0$. With the above specification, the null hypothesis of marks not impacting intensity is achieved by setting $\psi = 0$. Note that $G(X;\phi) = H(X) - E_\phi[H(X)]$ is a vector of dimension $r$ comprised of functions of the components of the vector mark centered at their expectations. The requirements that $E_\phi[h(X;\psi)]$, $E_\phi[\partial_\psi h(X;\psi)]$ and $\partial_\phi E_\phi[\partial_\psi h(X;\psi)]|_{\nu^*}$ exist impose obvious conditions on the marginal distribution of $X$. For example, if $h(X;\psi)$ is a polynomial of degree $p$ in $X$ then $E_\phi[X^p]$ needs to exist. Condition 1 parts (iii) and (iv) are required in order that the information matrix for all parameters in the full model likelihood be block diagonal, allowing simplification of the score statistic defined below. For the definition of the score statistic we require the existence and positive definiteness of the covariance matrix of $G(X;\phi)$, $\Omega_G(\phi) = \mathrm{cov}_\phi(H(X))$, as stated in (v).

Under $H_0$, the derivative of (8) with respect to $\psi$ at any values of $\theta$, $\phi$ is

\[
\partial_\psi l_g(\theta,\phi,0) = \int_{[0,T]} \lambda(t;\theta)^{-1}\, \partial_\psi \lambda_g(t;\theta,\phi,0)\, N(dt) - \int_{[0,T]} \partial_\psi \lambda_g(t;\theta,\phi,0)\, dt, \tag{15}
\]

with

\[
\partial_\psi \lambda_g(t;\theta,\phi,0) = \vartheta \int_{[0,t)\times\mathcal{X}} w(t-s;\alpha)\, G(x;\phi)\, N_g(ds\times dx).
\]
When evaluated at the estimates under the null hypothesis,

\[
\partial_\psi l_g(\hat{\nu}_T) = \int_{[0,T]} \lambda(t;\hat{\theta}_T)^{-1}\, \partial_\psi \lambda_g(t;\hat{\theta}_T,\hat{\phi}_T,0)\, N(dt) - \int_{[0,T]} \partial_\psi \lambda_g(t;\hat{\theta}_T,\hat{\phi}_T,0)\, dt
\]

and

\[
\partial_\psi \lambda_g(t;\hat{\theta}_T,\hat{\phi}_T,0) = \hat{\vartheta}_T \int_{[0,t)\times\mathcal{X}} w(t-s;\hat{\alpha}_T)\, G(x;\hat{\phi}_T)\, N_g(ds\times dx).
\]

When evaluated at the true parameter vector $\nu^* = (\theta^*,\phi^*,0)$ under $H_0$, the score (15) can be written as

\[
\partial_\psi l_g(\nu^*) = \int_{[0,T]} \lambda(t;\theta^*)^{-1}\, \partial_\psi \lambda_g(t;\nu^*)\, \tilde{N}(dt), \tag{16}
\]

where $\tilde{N}(dt) = N(dt) - \lambda(t;\theta^*)\,dt$, and

\[
I_\psi(\nu^*) = E\left[\int_{[0,T]} \lambda(t;\theta^*)^{-2}\, (\partial_\psi \lambda_g(t;\nu^*))^{\otimes 2}\, N(dt)\right], \tag{17}
\]

where $x^{\otimes 2} = x x^T$. Noting that the expectation required to evaluate (17) is not computable in closed form, we suggest empirical evaluation, replacing the expectation by the time average over events and using the estimate $\hat{\nu}_T$ to get

\[
\hat{I}_\psi = \int_{[0,T]} \lambda(t;\hat{\theta}_T)^{-2}\, (\partial_\psi \lambda_g(t;\hat{\nu}_T))^{\otimes 2}\, N(dt). \tag{18}
\]

We show that this empirical estimate has the same asymptotic limit as (17) when scaled by $T$. Using these estimates in the definition (13), the score statistic can be implemented in practice as

\[
\hat{Q}_T = \partial_\psi l_g(\hat{\nu}_T)^T\, \hat{I}_\psi^{-1}\, \partial_\psi l_g(\hat{\nu}_T), \tag{19}
\]

where $\partial_\psi l_g(\hat{\nu}_T)$ is defined above and $\hat{I}_\psi$ is given by (18).

5 Asymptotic distribution of the score statistic under the null hypothesis

To prove that the score statistic $\hat{Q}_T$ has a large sample chi-squared distribution under the null hypothesis, conditions are required on the intensity process for the unboosted process. The extra conditions are those required for convergence of the quasi MLE for Hawkes processes under $H_0$, for which the intensity does not depend on marks. Because we adapt the proofs of (Clinet and Yoshida, 2017, Theorems 3.9 and 3.11) to the score statistic, we re-state their conditions [A1], [A2], [A3] and [A4] here.
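Equations (15), (18) and (19) give a concrete recipe for the test. The sketch below is ours, not the authors' implementation: it assumes the exponential kernel, a single scalar mark with $H(x) = x$ so that $\hat{G}(x_i) = x_i - \bar{x}$, and takes the fitted null-model estimates $(\hat\eta_T, \hat\vartheta_T, \hat\alpha_T)$ as given inputs.

```python
import numpy as np

def score_statistic(times, marks, eta, vartheta, alpha, T):
    """Q_T of eq. (19) for w(s; alpha) = alpha*exp(-alpha*s) and one scalar mark,
    with G_hat(x_i) = x_i - xbar; (eta, vartheta, alpha) are the null-model fits."""
    G = marks - marks.mean()
    n = len(times)
    lam = np.empty(n)    # lambda(t_i; theta_hat) just before each event
    dpsi = np.empty(n)   # d/dpsi lambda_g(t_i; nu_hat), the display after (15)
    A = B = 0.0          # Markov recursions of the exponential kernel
    prev = 0.0
    for i, t in enumerate(times):
        d = np.exp(-alpha * (t - prev))
        A *= d
        B *= d
        lam[i] = eta + vartheta * alpha * A
        dpsi[i] = vartheta * alpha * B
        A += 1.0
        B += G[i]
        prev = t
    # eq. (15): sum over events of dpsi/lam minus the integral of dpsi on [0, T]
    integral = vartheta * np.sum(G * (1.0 - np.exp(-alpha * (T - times))))
    U = np.sum(dpsi / lam) - integral
    # eq. (18): empirical information, a scalar here since r = 1
    I_hat = np.sum((dpsi / lam) ** 2)
    return U * U / I_hat   # eq. (19); compare with chi-squared(1) quantiles

rng = np.random.default_rng(7)
t_ex = np.sort(rng.uniform(0.0, 100.0, size=60))
Q = score_statistic(t_ex, rng.normal(size=60), eta=0.6, vartheta=0.3, alpha=1.0, T=100.0)
```

Only a single unmarked fit is needed; screening many candidate marks amounts to recomputing $G$, `dpsi` and $\hat{Q}_T$ for each candidate, which is the computational advantage emphasised above.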
These generalize Conditions A, B and C of Ogata (1978) applied to the intensity process defined in (4) for the unmarked process. Ogata (1978) provided the first consistency and asymptotic normality results for the unmarked Hawkes process and verified that his conditions apply to the exponential decay function $w(t;\alpha)$. Clinet and Yoshida (2017) give conditions for the convergence of moments of the quasi MLE and verify them for the exponential decay function case. As far as we are aware, there has been no published verification of the conditions of Ogata (1978) or Clinet and Yoshida (2017) for the power law decay function.

Condition 2. Conditions on the intensity process under $H_0: \psi = 0$. For clarity, these are restated from Clinet and Yoshida (2017) using the notation of this paper and as relevant to the Hawkes process. These conditions refer to the intensity process defined in (4). Recall that $\theta^*$ refers to the true parameter defining the intensity process under $H_0$.

[A1] The mapping $\lambda: \Omega\times\mathbb{R}_+\times\Theta \to \mathbb{R}_+$ is $\mathcal{F}\otimes\mathcal{B}(\mathbb{R}_+)\otimes\mathcal{B}(\Theta)$-measurable. Moreover, almost surely:
(i) for any $\theta\in\Theta$, $s \to \lambda(s,\theta)$ is left continuous;
(ii) for any $s\in\mathbb{R}_+$, $\theta \to \lambda(s,\theta)$ is in $C^3(\Theta)$ and admits a continuous extension to $\bar{\Theta}$.

[A2] The intensity process $\lambda$ and its derivatives satisfy, for any $p > 1$,

\[
\sup_{t\in\mathbb{R}_+} \sum_{i=0}^{3} \left\| \sup_{\theta\in\Theta} |\partial_\theta^i \lambda(t,\theta)| \right\|_p < \infty.
\]

[A3] For a Borel space $(E,\mathcal{B}(E))$, let $C_b(E,\mathbb{R})$ be the set of continuous, bounded functions from $E$ to $\mathbb{R}$. For any $\theta\in\Theta$ the triplet $(\lambda(\cdot,\theta^*), \lambda(\cdot,\theta), \partial_\theta\lambda(\cdot,\theta))$ is ergodic, in the sense that there exists a mapping $\pi: C_b(E,\mathbb{R})\times\Theta \to \mathbb{R}$ such that for any $(\xi,\theta)\in C_b(E,\mathbb{R})\times\Theta$,

\[
\frac{1}{T}\int_0^T \xi(\lambda(s,\theta^*), \lambda(s,\theta), \partial_\theta\lambda(s,\theta))\, ds \overset{P}{\to} \pi(\xi,\theta).
\]

[A4] Define $\mathbb{Y}_T(\theta) = \frac{1}{T}(l_T(\theta) - l_T(\theta^*))$, which is shown in (Clinet and Yoshida, 2017, Lemma 3.10) to satisfy $\sup_{\theta\in\Theta} |\mathbb{Y}_T(\theta) - \mathbb{Y}(\theta)| \overset{P}{\to} 0$, where $\mathbb{Y}(\theta)$ is the ergodic limit of $\mathbb{Y}_T(\theta)$ as defined in (Clinet and Yoshida, 2017, p. 1807).
Assume, for asymptotic identifiability, that $Y(\theta) \neq 0$ for any $\theta \in \bar\Theta - \{\theta^*\}$.

Under Condition 2: [A1] to [A4], Clinet and Yoshida (2017) show (Theorem 3.9) that any asymptotic QMLE $\hat\theta_T$ is consistent, $\hat\theta_T \to^P \theta^*$, and (Theorem 3.11) asymptotically normal, $\sqrt T(\hat\theta_T - \theta^*) \to^d \Gamma^{-1/2}\zeta$, where $\zeta$ has a standard multivariate normal distribution and $\Gamma$ is the asymptotic information matrix, assumed to be positive definite. Additionally, they prove that $\Gamma$ satisfies
$$\sup_{\theta\in V_T}\big|T^{-1}\partial_\theta^2 l_T(\theta) + \Gamma\big| \to^P 0,$$
where $V_T$ is a ball shrinking to $\theta^*$. As noted above, these conditions are met for the (multivariate) exponential decay Hawkes process without marks, as shown in (Clinet and Yoshida, 2017, Section 4), assuming each element of $\theta = (\eta,\vartheta,\alpha)$ belongs to a finite closed interval of $\mathbb R$. For example, for the exponential decay function $w(s;\alpha) = \alpha\exp(-\alpha s)$, $K = 3$ and we assume that $0 < \underline\eta \le \eta \le \bar\eta < \infty$, $0 < \underline\vartheta \le \vartheta \le \bar\vartheta < \infty$ and $0 < \underline\alpha \le \alpha \le \bar\alpha < \infty$, so that $\Theta$ is a finite dimensional relatively compact open subset of $\mathbb R^3$.

In order to establish the asymptotic distribution of the score vector with respect to $\psi$, Condition 2 [A2] needs to be extended to accommodate the contribution to the score vector from the marks, as follows.

Condition 3. For $p = (\dim(\Theta) + 1) \vee 4$, where $x \vee y = \max(x,y)$, under $H_0$ and with $\phi$ fixed at $\phi^*$, assume
$$\sup_{t\in\mathbb R_+}\sum_{i=0}^{2}\Big\|\sup_{\theta\in\Theta}\big|\partial_\theta^i\,\partial_\psi\lambda_g(t;\theta,\phi,\psi)\big|_{(\theta,\phi^*,0)}\Big\|_p < \infty. \quad (20)$$

Lemma 1. Condition 3 is satisfied for the exponential decay function model (for which $\dim(\Theta) = 3$) and stationary ergodic marks for which $E_{\phi^*}[|G(X)|^4] < \infty$.

Throughout, $\nu^* = (\theta^*,\phi^*,0)$ and the maximum likelihood estimates are $\hat\nu_T = (\hat\theta_T,\hat\phi_T,0)$. Let $\mu_H(\phi) = E_\phi[H(X)]$; if evaluated at $\phi^*$ put $\mu_H = \mu_H(\phi^*)$, and if evaluated at $\hat\phi_T$ put $\hat\mu_H$. We also use the same notation for any consistent estimate of $\mu_H$, such as $\hat\mu_H = \bar H(X)$, the vector of sample means of the components.
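The quasi-likelihood $l_T(\theta)$ for the unmarked exponential-decay Hawkes process, whose maximizer $\hat\theta_T$ is used throughout, has a closed-form compensator and can be evaluated in a single pass over the event times. The sketch below is illustrative only; the function name and parameter layout are ours.

```python
import numpy as np

def hawkes_loglik(times, eta, theta_b, alpha, T):
    """Unmarked Hawkes log-likelihood l_T(theta) under H0 with
    exponential kernel w(t; alpha) = alpha*exp(-alpha*t) (sketch).
    theta = (eta, theta_b, alpha): baseline, branching ratio, decay rate."""
    A = 0.0          # running sum of alpha*exp(-alpha*(t_i - t_j)) over t_j < t_i
    prev = 0.0
    loglam = 0.0
    for t in times:
        A *= np.exp(-alpha * (t - prev))   # decay accumulated excitation
        loglam += np.log(eta + theta_b * A)
        A += alpha                         # add the new event's contribution
        prev = t
    # compensator integral over [0, T] is available in closed form
    compensator = eta * T + theta_b * np.sum(1.0 - np.exp(-alpha * (T - times)))
    return loglam - compensator
```

With $\vartheta = 0$ this reduces to the homogeneous Poisson log-likelihood $n\log\eta - \eta T$, which provides a simple correctness check.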
We let $G(X) = H(X) - \mu_H$ at the true value and $\hat G(X) = H(X) - \hat\mu_H$.

Condition 4. The marks are from a stationary ergodic process with $E_{\phi^*}[|G(X)|^4] < \infty$, and $\hat\mu_H \to^P \mu_H$ as $T\to\infty$.

Note that $\hat\mu_H \to^P \mu_H$ holds for the sample mean estimate (using ergodicity of $X$), for the parametric form $\hat\mu_H = \mu_H(\hat\phi_T)$ (using consistency of the maximum likelihood estimate $\hat\phi_T$ under appropriate regularity conditions on $f_t(x;\phi)$), or for any other consistent estimate of $\phi$, such as one obtained by the method of moments.

We now state the main result.

Theorem 1. Assume Conditions 1, 2, 3 and 4. Under $H_0$, the score statistic defined in (13), with information matrix $I_\psi(\hat\nu_T)$ estimated by $\hat I_\psi$ defined in (18), satisfies
$$\hat Q_T \to^d \chi^2(r) \quad \text{as } T\to\infty, \quad r = \dim(\psi). \quad (21)$$
The proof is given in Appendix B.

We now investigate what happens to the distribution of the score statistic when $H_0$ fails, that is, when the mark process impacts the distribution of the jump times of the point process. We adopt the local power approach, which consists in considering the sequence of local alternatives $H_T: \psi^*_T = \gamma^*/\sqrt T$ for some unknown $\gamma^*$. We therefore assume that the marks weakly impact the distribution of the jump times (with a magnitude of order $1/\sqrt T$), so that for a given $T > 0$ the observations are generated under $H_T$. Following Proposition 1, we thus assume that we observe a sequence of marked Hawkes processes $N^T_g$, all defined on (and adapted to) the same probability space $(\Omega, \mathcal F, P)$. Note that we adopt the notation $N^T_g$ because, in contrast with the null hypothesis, the point process now depends on $T$.
Moreover, we assume that all the marked Hawkes processes indexed by $T$ are generated by the random measure $\bar N_g$ on $\mathbb R\times\mathcal X$, such that the normalized boost function of $N^T_g$ is $g(\cdot;\phi^*,\psi^*_T)$; that is, for any $t\in\mathbb R_+$, $N^T_g$ admits the following stochastic intensity:
$$\lambda^T_g(t;\theta^*,\phi^*,\psi^*_T) = \eta^* + \vartheta^*\int_{(-\infty,t)\times\mathcal X} w(t-s;\alpha^*)\,g(x;\phi^*,\psi^*_T)\,N^T_g(ds\times dx),$$
with $\nu^*_T = (\theta^*,\phi^*,\psi^*_T)$. The expression of the score statistic is naturally adapted to
$$\hat Q_T = \partial_\psi l^T_g(\hat\nu_T)^T\,I_\psi(\hat\nu_T)^{-1}\,\partial_\psi l^T_g(\hat\nu_T), \quad (22)$$
where $l^T_g$ admits the same expression as in (11), replacing the pure Hawkes process $N(dt)$ by the counting process $N^T(dt) = N^T_g(dt,\mathcal X)$. Similarly, in (22), $\hat\nu_T = (\hat\theta_T,\hat\phi_T,0)$, where $\hat\theta_T$ is one maximizer of $l^T_g$ in the interior of $\Theta$ and $\hat\phi_T$ is a consistent estimator of $\phi^*$. As stated in Theorem 2 below, it turns out that under $H_T$, $\hat Q_T$ tends to a non-central chi-squared distribution, whose non-centrality parameter depends on $\gamma^*$ and on the limiting normalized Fisher information matrix (at point $\psi = 0$), $\Omega$. In order to ensure the convergence of $\hat Q_T$, we make the following assumptions.

Condition 5. For $p = (\dim(\Theta)+1)\vee 4$, we assume the existence of $\epsilon > 0$ such that, defining $\mathcal U = \Theta\times\{\phi^*\}\times B(0,\epsilon)$, where $B(0,\epsilon)$ is the open ball of radius $\epsilon$,
$$\sup_{T\in\mathbb R_+}\sup_{t\in[0,T]}\sum_{i=0}^{3} E\Big[\sup_{\nu\in\mathcal U}\big|\partial_\theta^i\lambda^T_g(t;\nu)\big|^p\Big] < +\infty.$$
Moreover,
$$\sup_{T\in\mathbb R_+}\sup_{t\in[0,T]}\sum_{i=0}^{2} E\Big[\sup_{\nu\in\mathcal U}\big|\partial_\theta^i\partial_\psi\lambda^T_g(t;\nu)\big|^p\Big] < +\infty.$$
Moreover, assume that there exists $\epsilon > 0$ such that
$$E\Big[\sup_{\psi\in B(0,\epsilon)}\big|\partial_\psi g(x;\phi^*,\psi)\big|^p\Big] < +\infty. \quad (23)$$
Finally, for $q\in\{1,2\}$, defining $\mathcal A = \{\alpha \mid \exists(\eta,\vartheta) \text{ s.t. } (\eta,\vartheta,\alpha)\in\Theta\}$, we assume the existence of $\bar w$ such that for any $\alpha\in\mathcal A$ and any $t\ge 0$, $w(t;\alpha) \le \bar w(t)$, and
$$\int_0^{+\infty} \bar w(t)^q\,dt < \infty. \quad (24)$$
Condition (24) is satisfied for the exponential decay function under the conditions stated above for $\alpha$.
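The local-alternative model $\lambda^T_g$ above can be simulated by Ogata's thinning algorithm, since for the exponential kernel the intensity decreases between events and so bounds itself from above. The sketch below is illustrative only: it assumes i.i.d. standard normal marks and the particular normalized boost $g(x;\psi) = e^{\psi x}/E[e^{\psi X}]$, neither of which is prescribed by the paper, and the function name is ours.

```python
import numpy as np

def simulate_marked_hawkes(eta, theta_b, alpha, gamma, T, rng):
    """Ogata thinning for a marked Hawkes process under the local alternative
    psi_T = gamma / sqrt(T), with kernel w(t) = alpha*exp(-alpha*t),
    marks X ~ N(0,1) and boost g(x) = exp(psi*x)/E[exp(psi*X)]. Sketch."""
    psi = gamma / np.sqrt(T)
    norm = np.exp(psi**2 / 2.0)     # E[exp(psi*X)] for X ~ N(0,1), so E[g(X)] = 1
    times, marks = [], []
    t, S = 0.0, 0.0                 # S = sum of g(x_j)*alpha*exp(-alpha*(t - t_j))
    while True:
        lam_bar = eta + theta_b * S          # upper bound: excitation only decays
        w = rng.exponential(1.0 / lam_bar)   # candidate waiting time
        t += w
        if t >= T:
            break
        S *= np.exp(-alpha * w)              # decay excitation to the candidate time
        lam = eta + theta_b * S
        if rng.uniform() <= lam / lam_bar:   # accept with probability lambda/lam_bar
            x = rng.normal()
            times.append(t)
            marks.append(x)
            S += alpha * np.exp(psi * x) / norm  # event adds g(x)*w(0; alpha)
    return np.array(times), np.array(marks)
```

Since $E[g(X)] = 1$ and $\vartheta^* < 1$, the simulated process remains stable, with mean event rate approximately $\eta^*/(1-\vartheta^*)$.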
For a suitable choice of compact parameter space for the power law decay function, a two-parameter family of decay functions, the condition is also satisfied without placing undue restrictions on the parameter space.

Lemma 2. Condition 5 is satisfied for the exponential kernel case ($\dim(\Theta) = 3$) and for stationary marks satisfying (23).

Proof. The proof follows exactly the same path as that of Lemma 1, replacing the fourth order moment condition on $G(x)$ by the local uniform condition (23). $\square$

Before we state the main result of this section, we define
$$\Omega = P\text{-}\lim_{T\to+\infty} T^{-1} I_\psi(\nu^*),$$
where we recall that $I_\psi(\nu^*)$ was defined in (17) and corresponds to the Fisher information matrix associated to $\psi$, at point $\psi = 0$, under the null hypothesis. We prove that such a limit exists in Appendix B (Lemma 4). We can now state the following theorem.

Theorem 2. Assume Conditions 1, 2, 3, 4 and 5. Under $H_T: \psi^*_T = \gamma^*/\sqrt T$, we have
$$\hat Q_T \to^d \chi^2(\Omega^{1/2}\gamma^*),$$
where $\chi^2(\Omega^{1/2}\gamma^*) \sim \|Z\|^2$ with $Z \sim N(\Omega^{1/2}\gamma^*, I_r)$.

The proof is in Appendix C.

In this paper we have derived the asymptotic distribution of the score test proposed for determining whether marks have an impact on the intensity of a single Hawkes process. Quite general boost functions can be formulated in this setting. We prove that the asymptotic distribution, under the null hypothesis that the proposed marks have no impact on the intensity process, is the usual chi-squared distribution with degrees of freedom equal to the number of parameters specified for the marks boost function. These asymptotic results rely heavily on the large sample results for quasi-likelihood estimation of multivariate unmarked Hawkes processes established in Clinet and Yoshida (2017).
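As an illustration of Theorem 2, the local power of the test at a given level can be approximated by simulating the limiting distribution $\|Z\|^2$ with $Z \sim N(\Omega^{1/2}\gamma^*, I_r)$, i.e. a non-central chi-squared with non-centrality $\gamma^{*T}\Omega\gamma^*$. The sketch below treats the scalar case $r = 1$ and is ours, not the paper's; the critical value is the 95% quantile of $\chi^2(1)$.

```python
import numpy as np

def local_power_1d(gamma, omega, crit=3.841459, n_mc=200_000, seed=0):
    """Monte Carlo approximation of the local power for r = 1 (sketch):
    by Theorem 2, Q_T -> ||Z||^2 with Z ~ N(omega^{1/2}*gamma, 1), a
    non-central chi-squared(1) with non-centrality gamma^2 * omega.
    crit defaults to the 95% quantile of chi-squared(1)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(np.sqrt(omega) * gamma, 1.0, size=n_mc)
    return float(np.mean(z * z > crit))
```

At $\gamma^* = 0$ this recovers the nominal 5% level, and the power increases monotonically in $|\gamma^*|\sqrt{\Omega}$, consistent with the non-central limit.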
In addition to their assumptions on the null hypothesis model specification and parameters, because the score test involves functions of the marks, one additional assumption (Condition 3) is required, and this is shown to hold in the exponential decay case (see Lemma 1).

The marks process can be quite general and includes marks obtained from observations on a continuous time vector valued process in which there is serial dependence as well as dependence between the components of the mark vector. The main requirement is that the marks have finite fourth moment.

For local power computations, we have also derived the non-central chi-squared limiting distribution for the score test statistic under a sequence of local alternatives with the boost parameter converging to the null hypothesis value at rate $T^{-1/2}$.

Establishing consistency of the score test requires a proof that the power tends to unity for any value of $\psi \neq 0$. However, establishing this rigorously requires proving the ergodicity of the point process, along with substantial extensions to existing asymptotic theory for likelihood estimation in marked Hawkes processes. The main technical challenge is showing that the asymptotic score with respect to $\phi$ is non-degenerate. Here a major difficulty arises because the existence of multiple stationary values in the limiting likelihood function of $(\theta,\phi)$ when $\psi \neq 0$ cannot be ruled out easily.

Crucial to establishing the conditions required for the results of Clinet and Yoshida (2017), as well as our additional Condition 3, is the Markovian nature of the Hawkes intensity process with an exponential decay function.
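The Markov property referred to here can be illustrated numerically: for the exponential kernel, the excess intensity $\lambda(t) - \eta$ is a one-dimensional state that decays exponentially between events and jumps by $\vartheta\alpha$ at each event, so the full history never needs to be revisited. The sketch below (function names ours) checks the recursion against direct summation over the history.

```python
import numpy as np

def intensity_markov(times, eta, theta_b, alpha, t_query):
    """Hawkes intensity at t_query via the one-dimensional Markov state:
    between events lambda - eta decays by exp(-alpha*dt); each event
    adds theta_b * alpha. Illustrative sketch."""
    lam, u = eta, 0.0
    for t in times:
        if t >= t_query:
            break
        lam = eta + (lam - eta) * np.exp(-alpha * (t - u))  # decay to event time
        lam += theta_b * alpha                              # self-excitation jump
        u = t
    return eta + (lam - eta) * np.exp(-alpha * (t_query - u))

def intensity_direct(times, eta, theta_b, alpha, t_query):
    """Same quantity by direct summation over the full history."""
    past = times[times < t_query]
    return eta + theta_b * alpha * np.sum(np.exp(-alpha * (t_query - past)))
```

For a power law kernel no finite-dimensional state reproduces the history's contribution exactly, which is precisely why the Markov-based arguments do not carry over.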
Extension to decay functions which are linear combinations of exponential kernels retains the Markov property, and so extension of the above results should be straightforward. For other kernels, such as the power law decay function, the Markov property does not hold, and hence extension of our results would require substantial and fundamental theory to extend known results in the literature, firstly for unmarked Hawkes processes and secondly for the marked case. Because Clinet and Yoshida (2017) also establish the required asymptotic theory of likelihood estimation for a multivariate unmarked Hawkes process, and because the score statistic for a marked multivariate Hawkes process has the same basic form as for the univariate Hawkes process, the results of this paper should readily extend to the multivariate case; this could be the topic of future research.

A Proof of Lemma 1

For any $c\in\mathbb R^r$ we denote the linear combination $G_c(X) = c^T G(X)$, and similarly for $\hat G_c(X)$. We use the notation $N_g$ for the point process generated under $H_0$. This point process has event intensity identical to that of $N$ defined in (4); marks are observed at the event times of this process but do not impact its intensity. For the exponential decay specification, since $\dim(\Theta) = 3$, we need to show Condition 3 for $p = 4$:
$$\sup_{t\in\mathbb R_+} E\Big[\sup_{\theta\in\Theta}\Big|\partial_\theta^i\Big\{\vartheta\int_{[0,t)\times\mathcal X} w(t-s;\alpha)\,G_c(x)\,N_g(ds\times dx)\Big\}\Big|^p\Big] < \infty$$
for $i = 0, 1, 2$. Notice that only the derivatives with respect to $\vartheta$ and $\alpha$ are required. These derivatives are linear combinations of terms of the form
$$\vartheta^k \int_{[0,t)\times\mathcal X} \partial_\alpha^i w(t-s;\alpha)\,G_c(x)\,N_g(ds\times dx)$$
for $i = 0, 1, 2$ and $k = 0, 1$, and with $w(t-s;\alpha) = \alpha e^{-\alpha(t-s)}$.
Since $\Theta$ is bounded, we consider the integrals, which are finite combinations of terms of the form
$$\Big|\int_{[0,t)\times\mathcal X}(t-s)^i e^{-\alpha(t-s)}\,G_c(x)\,N_g(ds\times dx)\Big| \le \int_{[0,t)\times\mathcal X}(t-s)^i e^{-\underline\alpha(t-s)}\,|G_c(x)|\,N_g(ds\times dx)$$
for $i = 0,1,2$, where $0 < \underline\alpha = \inf\{\alpha \mid \exists(\eta,\vartheta),\ (\eta,\vartheta,\alpha)\in\Theta\}$. Therefore, to conclude the proof we need to show that
$$\sup_{t\in\mathbb R_+} E\Big|\int_{[0,t)\times\mathcal X}(t-s)^i e^{-\underline\alpha(t-s)}\,G_c(x)\,N_g(ds\times dx)\Big|^p < \infty, \quad i = 0,1,2,$$
with $p = 4$. Let $f_{i,t}(s) = (t-s)^i\exp(-\underline\alpha(t-s))$. For any $t\ge 0$ we have
$$E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)\,G_c(x)\,N_g(ds\times dx)\Big)^4\Big] \le C\,E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)\,|G_c(x)|\,\tilde N_g(ds\times dx)\Big)^4\Big] + C\,E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)\,|G_c(x)|\,F_s(dx)\,\lambda(s;\theta^*)\,ds\Big)^4\Big]$$
for some finite constant $C$, where the compensator of $N_g(ds\times dx)$ is $\lambda(s;\theta^*)F_s(dx)\,ds$, with $F_s(dx)$ the conditional distribution of $y_s$ with respect to $\mathcal F^y_{s-}$. First define the probability measure $\mu(ds) = \big(\int_0^t f_{i,t}(u)\,du\big)^{-1} f_{i,t}(s)\,ds$ on $[0,t]$, and apply Jensen's inequality to the second term to get
$$E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)\,|G_c(x)|\,F_s(dx)\,\lambda(s;\theta^*)\,ds\Big)^4\Big] = \Big(\int_{[0,t)} f_{i,t}(s)\,ds\Big)^4\,E\Big[\Big(\int_{[0,t)\times\mathcal X}|G_c(x)|\,F_s(dx)\,\lambda(s;\theta^*)\,\mu(ds)\Big)^4\Big]$$
$$\le \Big(\int_{[0,t)} f_{i,t}(s)\,ds\Big)^3\,E\Big[\int_{[0,t)} f_{i,t}(s)\Big(\int_{\mathcal X}|G_c(x)|\,F_s(dx)\Big)^4\lambda(s;\theta^*)^4\,ds\Big]$$
$$\le \Big(\int_{[0,t)} f_{i,t}(s)\,ds\Big)^3\,E\Big[\int_{[0,t)} f_{i,t}(s)\,\lambda(s;\theta^*)^4\,E\big[|G_c(y_s)| \mid \mathcal F^y_{s-}\big]^4\,ds\Big] \le C,$$
where we have used the independence of $y$ and $N_g$, the fact that $E\big[E[|G_c(y_s)|\mid\mathcal F^y_{s-}]^4\big] \le E[|G_c(y_s)|^4] < K$ for some constant $K > 0$, and $\sup_{t\in\mathbb R_+} E\big[\int_{[0,t)} f_{i,t}(s)\,\lambda(s;\theta^*)^4\,ds\big] < \infty$ by (Clinet and Yoshida, 2017, Lemma A.5). Consider now the first expected value.
Using the Burkholder–Davis–Gundy inequality and arguing similarly to (Clinet and Yoshida, 2017, Lemma A.2), for some constant $C < \infty$, not necessarily the same as above, we have
$$E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)\,|G_c(x)|\,\tilde N_g(ds\times dx)\Big)^4\Big] \le C\,E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)^2\,|G_c(x)|^2\,N_g(ds\times dx)\Big)^2\Big]$$
$$\le 2C\,E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)^2\,|G_c(x)|^2\,\tilde N_g(ds\times dx)\Big)^2\Big] + 2C\,E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)^2\,|G_c(x)|^2\,\lambda(s;\theta^*)\,F_s(dx)\,ds\Big)^2\Big].$$
Similarly to the previous argument, the second term is uniformly bounded because
$$\sup_{t\in\mathbb R_+} E\Big[\Big(\int_{[0,t)} f_{i,t}(s)^2\,\lambda(s;\theta^*)\,ds\Big)^2\Big] < \infty$$
by (Clinet and Yoshida, 2017, Lemma A.5) and $E\big[E[|G_c(y_s)|^2\mid\mathcal F^y_{s-}]^2\big] \le E[|G_c(y_s)|^4] < \infty$. For the first term we have
$$E\Big[\Big(\int_{[0,t)\times\mathcal X} f_{i,t}(s)^2\,|G_c(x)|^2\,\tilde N_g(ds\times dx)\Big)^2\Big] = E\Big[\int_{[0,t)\times\mathcal X} f_{i,t}(s)^4\,|G_c(x)|^4\,\lambda(s;\theta^*)\,F_s(dx)\,ds\Big] = E\Big[\int_0^t f_{i,t}(s)^4\,\lambda(s;\theta^*)\,E[|G_c(y_s)|^4]\,ds\Big],$$
where we have used the independence of $y$ and $N_g$. Now, $E[|G_c(y_s)|^4] < \infty$ and, once more by (Clinet and Yoshida, 2017, Lemma A.5),
$$\sup_{t\in\mathbb R_+} E\Big[\int_{[0,t)} f_{i,t}(s)^4\,\lambda(s;\theta^*)\,ds\Big] < \infty,$$
which completes the proof. $\square$

B Proof of Theorem 1

Define, for any fixed $c$,
$$U(t;\theta,\phi) = c^T\partial_\psi\lambda_g(t;\theta,\phi,0) = \vartheta\int_{[0,t)\times\mathcal X} w(t-s;\alpha)\,c^T G(x;\phi)\,N_g(ds\times dx). \quad (25)$$
This notation is used repeatedly in the proof of the theorem as well as in the supporting lemmas. The proof follows somewhat closely that of Clinet and Yoshida (2017). We first consider the normalized process corresponding to (16): for any non-zero vector of constants $c\in\mathbb R^r$, define the process, for $u\in[0,1]$,
$$S^T_u = \frac{1}{\sqrt T}\int_{[0,uT]}\lambda(t;\theta^*)^{-1}\,c^T\partial_\psi\lambda_g(t;\nu^*)\,\tilde N(dt) = \frac{1}{\sqrt T}\int_{[0,uT]}\lambda(t;\theta^*)^{-1}\,U(t;\theta^*,\phi^*)\,\tilde N(dt). \quad (26)$$
Note that $S^T_1 = \frac{1}{\sqrt T}\,c^T\partial_\psi l_g(\nu^*)$. Similarly to Clinet and Yoshida (2017), we establish a functional CLT as $T\to\infty$. The proof of this theorem proceeds via several lemmas; convergence throughout is as $T\to\infty$.
The first lemma is concerned with the ergodic properties of $U(t;\theta,\phi)$ defined in (25) when $\phi = \phi^*$ is fixed at the true value, in which case we further abbreviate the notation to $U(t;\theta) = U(t;\theta,\phi^*)$.

Lemma 3. There exists a stationary Hawkes point process $N^\infty$ on the original probability space $(\Omega,\mathcal F,P)$, adapted to $\mathcal F_t$ and defined on $\mathbb R$, such that: (i) $N^\infty$ and $y$ are independent; (ii) the stochastic intensity of $N^\infty$ admits the representation $\lambda^\infty(t) = \eta^* + \vartheta^*\int_{(-\infty,t)} w(t-s;\alpha^*)\,N^\infty(ds)$. Moreover, let us define $N^\infty_g$ as the marked point process which jumps at points of the form $(t^\infty_i, y_{t^\infty_i})$, where the $t^\infty_i$ are the jump times of $N^\infty$. Accordingly, we define
$$U^\infty(t;\theta) = \vartheta\int_{(-\infty,t)\times\mathcal X} w(t-s;\alpha)\,G_c(x)\,N^\infty_g(ds\times dx).$$
Then the joint process $(\lambda^\infty, U^\infty(\cdot;\theta^*))$ is stationary ergodic. Finally, we have the convergence
$$E|\lambda(t,\theta^*) - \lambda^\infty(t)| + E|U(t;\theta^*) - U^\infty(t;\theta^*)| \to 0, \quad t\to+\infty. \quad (27)$$

Proof. The existence of $N^\infty$, along with properties (i) and (ii), is a direct consequence of the independence of $N$ and $y$, together with Proposition 4.4 (i) from Clinet and Yoshida (2017). Next, since $y$ is ergodic by assumption, the process of jumps $N^\infty$ is stationary ergodic, and since both processes are independent of each other, the joint process $(N^\infty, y)$ is stationary ergodic as well. Since for any $t\in\mathbb R$, $(\lambda^\infty(t), U^\infty(t,\theta^*))$ admits a stationary representation, and given the form of $(\lambda^\infty(t), U^\infty(t,\theta^*))_{t\in\mathbb R}$, we can deduce that they are also ergodic by Lemma 10.5 from Kallenberg (2006). Finally, we show (27). We first deal with the convergence of $f(t) := E|\lambda(t,\theta^*) - \lambda^\infty(t)|$ to 0. Defining $r(t) = E\int_{(-\infty,0)} w(t-s;\alpha^*)\,\lambda^\infty(s)\,ds$ and following the same reasoning as in the proof of Proposition 4.4 (iii) in Clinet and Yoshida (2017), some algebraic manipulations easily lead to the inequality $f(t) \le r(t) + \vartheta^*\,\big(w(\cdot;\alpha^*) * f\big)(t)$ for $t \ge 0$.
Here, for two functions $a$ and $b$ and $t\in\mathbb R_+$, $a*b(t) = \int_0^t a(t-s)\,b(s)\,ds$ whenever the integral is well-defined. Iterating the above inequality, we get, for any $n\in\mathbb N$,
$$f(t) \le \sum_{k=0}^{n} \vartheta^{*k}\,w(\cdot;\alpha^*)^{*k} * r(t) + \vartheta^{*(n+1)}\,w(\cdot;\alpha^*)^{*(n+1)} * f(t).$$
Using the facts that $\int w(\cdot;\alpha^*) = 1$ and $\vartheta^* < 1$, the last term vanishes as $n$ tends to infinity, so that $f$ is dominated by $R*r$, where $R := \sum_{k=0}^{+\infty}\vartheta^{*k}\,w(\cdot;\alpha^*)^{*k}$. Note that $R$ is finite and integrable, since $\int_0^{+\infty} R(s)\,ds \le 1/(1-\vartheta^*)$. We first prove that $r(t)\to 0$. To do so, note that $r(t) = E[\lambda^\infty(0)]\int_t^{+\infty} w(u;\alpha^*)\,du \to 0$ since $w(\cdot;\alpha^*)$ is integrable. Now, since $R*r(t) = \int_0^t R(s)\,r(t-s)\,ds$, and $R(s)r(t-s)$ is dominated by $\sup_{u\in\mathbb R_+} r(u)\,R(s)$, which is integrable, we conclude by the dominated convergence theorem that $f(t) \le R*r(t) \to 0$. Finally, we prove that $g(t) := E|U(t;\theta^*) - U^\infty(t;\theta^*)| \to 0$. Let $N_g(ds\times dx)$ denote the point process for $s\in\mathbb R$ under the null hypothesis. We have
$$g(t) \le E\Big|\int_{(0,t)\times\mathcal X} w(t-s;\alpha^*)\,G_c(x)\,(N_g - N^\infty_g)(ds\times dx)\Big| + E\Big|\int_{(-\infty,0]\times\mathcal X} w(t-s;\alpha^*)\,G_c(x)\,N^\infty_g(ds\times dx)\Big|$$
$$\le E\int_{(0,t)\times\mathcal X} w(t-s;\alpha^*)\,|G_c(x)|\,|N_g - N^\infty_g|(ds\times dx) + E\int_{(-\infty,0]\times\mathcal X} w(t-s;\alpha^*)\,|G_c(x)|\,N^\infty_g(ds\times dx)$$
$$\le E|G_c(y_0)|\int_{(0,t)} w(t-s;\alpha^*)\,f(s)\,ds + E|G_c(y_0)|\,E[\lambda^\infty(0)]\underbrace{\int_t^{+\infty} w(u;\alpha^*)\,du}_{\to 0}.$$
Since $\int_{(0,t)} w(t-s;\alpha^*)f(s)\,ds = \int_{(0,t)} w(s;\alpha^*)f(t-s)\,ds$ and $f(t)\to 0$, we have, again by application of the dominated convergence theorem, that $g(t)\to 0$. $\square$

Lemma 4. $S^T_u$ defined in (26) satisfies
$$(S^T_u)_{u\in[0,1]} \to^d (c^T\Omega c)^{1/2}\,(W_u)_{u\in[0,1]},$$
where $W$ is standard Brownian motion (and convergence is in the Skorokhod space $D([0,1])$) and $\Omega$ is a positive definite matrix.

Proof.
Similarly to (Clinet and Yoshida, 2017, proof of Lemma 3.13), we first show that
$$\langle S^T, S^T\rangle_u = \frac1T\int_{[0,uT]}\lambda(t;\theta^*)^{-1}\,U(t;\theta^*,\phi^*)^2\,dt$$
converges in probability to $u\,c^T\Omega c$. Introducing $\lambda^\infty, U^\infty$ as in Lemma 3, we need to show that
$$\frac1T\int_{[0,uT]}\big\{\lambda(t;\theta^*)^{-1}\,U(t;\theta^*,\phi^*)^2 - \lambda^\infty(t)^{-1}\,U^\infty(t;\theta^*)^2\big\}\,dt \to^P 0. \quad (28)$$
Using the boundedness of $\lambda(t;\theta^*)^{-1}$ and $\lambda^\infty(t)^{-1}$, we have the domination
$$A_t := \Big|\frac{U(t;\theta^*,\phi^*)^2}{\lambda(t;\theta^*)} - \frac{U^\infty(t;\theta^*)^2}{\lambda^\infty(t)}\Big| \le K\big|U(t;\theta^*,\phi^*)^2 - U^\infty(t;\theta^*)^2\big| + K\,U^\infty(t;\theta^*)^2\,\big|\lambda(t;\theta^*) - \lambda^\infty(t)\big|$$
for some constant $K > 0$. By Lemma 3, we thus have $A_t \to^P 0$. Moreover, since by Condition 3, $U(t;\theta^*,\phi^*)$ and $U^\infty(t;\theta^*)$ are $L^{2+\epsilon}$ bounded for some $\epsilon > 0$, and $\lambda(t;\theta^*)$ and $\lambda^\infty(t)$ are $L^p$ bounded for any $p > 1$, we deduce that $E|A_t| \to 0$, so $E|T^{-1}\int_0^{uT} A_t\,dt| \to 0$, and thus we get (28). By the ergodicity property of Lemma 3, we also have
$$\frac1T\int_{[0,uT]}\lambda^\infty(t)^{-1}\,U^\infty(t;\theta^*)^2\,dt \to^P u\,E\big[\lambda^\infty(0)^{-1}\,U^\infty(0;\theta^*)^2\big] = u\,c^T\Omega c,$$
where $\Omega = E\big[\lambda^\infty(0)^{-1}\,\partial_\psi\lambda^\infty_g(0)\,\partial_\psi\lambda^\infty_g(0)^T\big]$ with $c^T\partial_\psi\lambda^\infty_g(0) = U^\infty(0;\theta^*)$, which proves our claim. Next, for Lindeberg's condition, for any $a > 0$, similarly to Clinet and Yoshida (2017),
$$E\Big[\sum_{s\le u}(\Delta S^T_s)^2\,1_{\{|\Delta S^T_s| > a\}}\Big] \le \frac{1}{a^2}\,E\Big[\sum_{s\le u}(\Delta S^T_s)^4\Big] = \frac{1}{a^2}\,E\Big[\int_{[0,uT]}\Big|\frac{1}{\sqrt T}\lambda(t;\theta^*)^{-1}U(t;\theta^*,\phi^*)\Big|^4\,N(dt)\Big]$$
$$= \frac{1}{a^2T^2}\,E\Big[\int_{[0,uT]}\lambda(t;\theta^*)^{-3}\,|U(t;\theta^*,\phi^*)|^4\,dt\Big] \le \frac{uK}{a^2T}\sup_{t\in\mathbb R_+}E|U(t;\theta^*,\phi^*)|^4 \to 0,$$
where we have used Condition 3 along with the boundedness of $\lambda(t;\theta^*)^{-1}$. As in Clinet and Yoshida (2017), application of (Jacod and Shiryaev, 2013, Theorem 3.24, Chapter VIII) gives the required functional CLT. $\square$

Lemma 5. $\frac{1}{\sqrt T}\big(\partial_\psi l_g(\hat\nu_T) - \partial_\psi l_g(\nu^*)\big) \to^P 0$.

Proof.
Rewrite
$$\frac{1}{\sqrt T}c^T\big(\partial_\psi l_g(\hat\nu_T) - \partial_\psi l_g(\nu^*)\big) = \frac{1}{\sqrt T}c^T\big(\partial_\psi l_g(\hat\theta_T,\hat\phi_T,0) - \partial_\psi l_g(\hat\theta_T,\phi^*,0)\big) + \frac{1}{\sqrt T}c^T\big(\partial_\psi l_g(\hat\theta_T,\phi^*,0) - \partial_\psi l_g(\theta^*,\phi^*,0)\big). \quad (29)$$
Since $G_c(X) - \hat G_c(X) = c^T(\hat\mu_H - \mu_H)$, we have
$$U(t;\hat\theta_T,\hat\phi_T) - U(t;\hat\theta_T,\phi^*) = c^T\partial_\psi\lambda_g(t;\hat\theta_T,\hat\phi_T,0) - c^T\partial_\psi\lambda_g(t;\hat\theta_T,\phi^*,0) = \hat\vartheta_T\int_{[0,t)\times\mathcal X}w(t-s;\hat\alpha_T)\,\big[\hat G_c(x) - G_c(x)\big]\,N_g(ds\times dx) = \hat\vartheta_T\int_{[0,t)}w(t-s;\hat\alpha_T)\,N(ds)\;c^T(\mu_H - \hat\mu_H),$$
giving
$$\frac{1}{\sqrt T}c^T\big(\partial_\psi l_g(\hat\theta_T,\hat\phi_T,0) - \partial_\psi l_g(\hat\theta_T,\phi^*,0)\big) = \frac{1}{\sqrt T}\,\hat\vartheta_T\int_{[0,T]}\lambda(t;\hat\theta_T)^{-1}\Big(\int_{[0,t)}w(t-s;\hat\alpha_T)\,N(ds)\Big)\big\{N(dt) - \lambda(t;\hat\theta_T)\,dt\big\}\;c^T(\mu_H - \hat\mu_H).$$
Now by Condition 4, $\hat\mu_H - \mu_H \to^P 0$. Also, using the consistency of the quasi-likelihood estimates for the unmarked process, $\hat\vartheta_T \to^P \vartheta^*$. Finally,
$$\frac{1}{\sqrt T}\int_{[0,T]}\lambda(t;\hat\theta_T)^{-1}\Big(\int_{[0,t)}w(t-s;\hat\alpha_T)\,N(ds)\Big)\big\{N(dt) - \lambda(t;\hat\theta_T)\,dt\big\}$$
is precisely the derivative of the unboosted likelihood with respect to the branching ratio parameter $\vartheta$, and it converges in distribution to a normal random variable, directly from (Clinet and Yoshida, 2017, Proof of Theorem 3.11). Hence the first term in (29) converges to zero in probability.

Consider the second term in (29), which is written as
$$\frac{1}{\sqrt T}c^T\big[\partial_\psi l_g(\hat\theta_T,\phi^*,0) - \partial_\psi l_g(\theta^*,\phi^*,0)\big] = \frac{1}{\sqrt T}\Big\{\int_{[0,T]}\lambda(t;\hat\theta_T)^{-1}U(t;\hat\theta_T)\,N(dt) - \int_{[0,T]}U(t;\hat\theta_T)\,dt\Big\} - \frac{1}{\sqrt T}\Big\{\int_{[0,T]}\lambda(t;\theta^*)^{-1}U(t;\theta^*)\,N(dt) - \int_{[0,T]}U(t;\theta^*)\,dt\Big\}$$
$$= \frac1T\Big\{\int_{[0,T]}\partial_\theta\big(\lambda(t;\bar\theta_T)^{-1}U(t;\bar\theta_T)\big)\,N(dt) - \int_{[0,T]}\partial_\theta U(t;\bar\theta_T)\,dt\Big\}\;\sqrt T(\hat\theta_T - \theta^*),$$
using a first order Taylor series expansion, where $\bar\theta_T \in [\theta^*,\hat\theta_T]$. By the central limit theorem in Clinet and Yoshida (2017), $\sqrt T(\hat\theta_T - \theta^*)$ is asymptotically normal. We show that the term multiplying it converges to zero in probability, using an argument similar to that in (Clinet and Yoshida, 2017, Proof of Lemma 3.12).
Now, at any $\theta$ we have
$$\frac1T\Big\{\int_{[0,T]}\partial_\theta\big\{\lambda(t;\theta)^{-1}U(t;\theta)\big\}\,N(dt) - \int_{[0,T]}\partial_\theta U(t;\theta)\,dt\Big\}$$
$$= \frac1T\int_{[0,T]}\partial_\theta\big\{\lambda(t;\theta)^{-1}U(t;\theta)\big\}\,\tilde N(dt) - \frac1T\int_{[0,T]}\lambda(t;\theta)^{-2}\,\partial_\theta\lambda(t;\theta)\,U(t;\theta)\,\lambda(t;\theta^*)\,dt - \frac1T\int_{[0,T]}\partial_\theta U(t;\theta)\,\lambda(t;\theta)^{-1}\big[\lambda(t;\theta) - \lambda(t;\theta^*)\big]\,dt.$$
These three terms correspond to those in the decomposition of $\partial_\theta^2 l_T(\theta)$ in (Clinet and Yoshida, 2017, middle of p. 1809) and are listed in the same order.

The third term converges in probability to zero uniformly on a ball $V_T$, centred on $\theta^*$ and shrinking to $\{\theta^*\}$, using arguments similar to those in (Clinet and Yoshida, 2017, p. 1810) for their third term, together with Lemma 3.

The second term also converges to a limit uniformly on a ball $V_T$, centred on $\theta^*$ and shrinking to $\{\theta^*\}$, using the ergodicity from Lemma 3 and arguments similar to Clinet and Yoshida (2017); note, however, that the limit is a matrix of zeros, because its expectation is zero, corresponding to the block diagonal structure of the full information matrix.

Finally, consider the first, martingale, term
$$M_T(\theta) = \frac1T\int_{[0,T]}\partial_\theta\big\{\lambda(t;\theta)^{-1}U(t;\theta)\big\}\,\tilde N(dt),$$
which we will show converges to zero in probability uniformly in $\theta\in\Theta$ (uniformity allows us to deal with the evaluation at $\bar\theta_T$), using
$$E\big[|M_{a,T}(\bar\theta_T)|^p\big] \le E\big[\sup_{\theta\in\Theta}|M_{a,T}(\theta)|^p\big],$$
where $M_{a,T}$ is the $a$'th component. For $p = \dim(\Theta) + 1$,
$$E\big[\sup_{\theta\in\Theta}|M_{a,T}(\theta)|^p\big] \le K(\Theta,p)\Big\{\int_\Theta d\theta\,E\big[|M_T(\theta)|^p\big] + \int_\Theta d\theta\,E\big[|\partial_\theta M_T(\theta)|^p\big]\Big\},$$
where $K(\Theta,p) < \infty$, using Sobolev's inequality as in (Clinet and Yoshida, 2017, Proof of Lemma 3.10).
We next apply the Burkholder–Davis–Gundy inequality, followed by Jensen's inequality, to each of $E[|M_T(\theta)|^p]$ and $E[|\partial_\theta M_T(\theta)|^p]$. First,
$$E\big[|M_T(\theta)|^p\big] \le CT^{-p}\,E\Big[\Big(\int_{[0,T]}\big(\partial_\theta\{\lambda(t;\theta)^{-1}U(t;\theta)\}\big)^2\,\lambda(t;\theta^*)\,dt\Big)^{p/2}\Big]$$
$$\le CT^{-p+p/2-1}\int_{[0,T]}E\Big[\big|\partial_\theta\{\lambda(t;\theta)^{-1}U(t;\theta)\}\big|^p\,\lambda(t;\theta^*)^{p/2}\Big]\,dt \le CT^{-p/2}\sup_{t\in\mathbb R_+}E\Big[\sup_{\theta\in\Theta}\big|\partial_\theta\{\lambda(t;\theta)^{-1}U(t;\theta)\}\big|^p\,\lambda(t;\theta^*)^{p/2}\Big].$$
Similarly,
$$E\big[|\partial_\theta M_T(\theta)|^p\big] \le CT^{-p}\,E\Big[\Big(\int_{[0,T]}\big(\partial_\theta^2\{\lambda(t;\theta)^{-1}U(t;\theta)\}\big)^2\,\lambda(t;\theta^*)\,dt\Big)^{p/2}\Big] \le CT^{-p/2}\sup_{t\in\mathbb R_+}E\Big[\sup_{\theta\in\Theta}\big|\partial_\theta^2\{\lambda(t;\theta)^{-1}U(t;\theta)\}\big|^p\,\lambda(t;\theta^*)^{p/2}\Big].$$
Now, as in (Clinet and Yoshida, 2017, Proof of Lemma 3.12), the processes $|\partial_\theta\{\lambda(t;\theta)^{-1}U(t;\theta)\}|^p\lambda(t;\theta^*)^{p/2}$ and $|\partial_\theta^2\{\lambda(t;\theta)^{-1}U(t;\theta)\}|^p\lambda(t;\theta^*)^{p/2}$ are dominated by polynomials in $\lambda(t;\theta)^{-1}$, $\partial_\theta^i\lambda(t;\theta)$ and $\partial_\theta^i U(t;\theta)$ for $i\in\{0,1,2\}$. The first two families of terms are covered by Clinet and Yoshida (2017), condition [A2], shown by them to hold for the exponential decay Hawkes process. The terms $\partial_\theta^i U(t;\theta)$ are covered by Condition 3, which is shown to hold for the exponential decay model in Lemma 1. $\square$

Lemma 6. The estimated information matrix $\hat I_\psi$ defined in (18) satisfies $\frac1T\hat I_\psi \to^P \Omega$.

Proof. Recall from (18) that
$$\hat I_\psi = \int_{[0,T]}\lambda(t;\hat\theta_T)^{-2}\,\big(\partial_\psi\lambda_g(t;\hat\nu_T)\big)^{\otimes 2}\,N(dt),$$
and let
$$\hat I_\psi(\nu^*) = \int_{[0,T]}\lambda(t;\theta^*)^{-2}\,\big(\partial_\psi\lambda_g(t;\nu^*)\big)^{\otimes 2}\,N(dt).$$
Note that, by arguments similar to those of the proof of Lemma 3.12 in Clinet and Yoshida (2017), we have
$$T^{-1}\hat I_\psi(\nu^*) = T^{-1}\int_{[0,T]}\lambda(t;\theta^*)^{-1}\,\big(\partial_\psi\lambda_g(t;\nu^*)\big)^{\otimes 2}\,dt + M_T,$$
where $M_T$ is a martingale term of order $O_P(T^{-1/2})$. By ergodicity, we thus have that $T^{-1}\hat I_\psi(\nu^*) \to^P \Omega$, where $\Omega$ is the same positive definite matrix as in Lemma 4. Hence, to prove Lemma 6, it is sufficient to show that $\frac1T c^T\{\hat I_\psi - \hat I_\psi(\nu^*)\}c \to^P 0$ for any $c\in\mathbb R^r$. Let $R(t;\theta,\phi) = \lambda(t;\theta)^{-1}U(t;\theta,\phi)$.
Then
$$\frac1T c^T\{\hat I_\psi - \hat I_\psi(\nu^*)\}c = \frac1T\int_{[0,T]}\big\{R(t;\hat\theta_T,\hat\phi_T)^2 - R(t;\theta^*,\phi^*)^2\big\}\,N(dt) = \frac1T\int_{[0,T]}\big\{R(t;\hat\theta_T,\hat\phi_T)^2 - R(t;\hat\theta_T,\phi^*)^2\big\}\,N(dt) + \frac1T\int_{[0,T]}\big\{R(t;\hat\theta_T,\phi^*)^2 - R(t;\theta^*,\phi^*)^2\big\}\,N(dt).$$
Now, expanding the square in the first term,
$$\frac1T\int_{[0,T]}\big\{R(t;\hat\theta_T,\hat\phi_T)^2 - R(t;\hat\theta_T,\phi^*)^2\big\}\,N(dt) = 2\,c^T(\hat\mu_H - \mu_H)\,\frac1T\int_{[0,T]}\Big\{\lambda(t;\hat\theta_T)^{-1}\int_{[0,t)}\hat\vartheta_T\,w(t-s;\hat\alpha_T)\,N(ds)\Big\}\,R(t;\hat\theta_T,\phi^*)\,N(dt)$$
$$+ \big\{c^T(\hat\mu_H - \mu_H)\big\}^2\,\frac1T\int_{[0,T]}\Big\{\lambda(t;\hat\theta_T)^{-1}\int_{[0,t)}\hat\vartheta_T\,w(t-s;\hat\alpha_T)\,N(ds)\Big\}^2\,N(dt).$$
Now $c^T(\hat\mu_H - \mu_H) \to^P 0$, while the two normalized integrals are bounded in probability; the second term of the first display is handled analogously using the consistency of $\hat\theta_T$. Hence $\frac1T c^T\{\hat I_\psi - \hat I_\psi(\nu^*)\}c$ converges to zero in probability, completing the proof. $\square$

C Proof of Theorem 2

We have divided the proof of Theorem 2 into a series of lemmas. Before we derive the asymptotic distribution of the score statistic, we need some definitions. For the sake of simplicity, we will use the notation $\lambda^T(\cdot;\theta) := \lambda^T_g(\cdot;\theta,\phi,0)$ (which is independent of $\phi\in\Phi$). By Proposition 1, we may assume the existence of an unmarked Hawkes process $N^{(0)}$, generated by the same measure $\bar N_g$ on $\mathbb R\times\mathcal X$ as the sequence of processes $N^T_g$. $N^{(0)}$ is thus a marked Hawkes process with mark function $g(\cdot;\phi^*,0) = 1$. We call $\lambda^{(0)}$ its associated stochastic intensity; that is, for any $\theta\in\Theta$,
$$\lambda^{(0)}(t;\theta) = \eta + \vartheta\int_{(-\infty,t)\times\mathcal X}w(t-s;\alpha)\,g(x;\phi^*,0)\,N^{(0)}(ds\times dx) = \eta + \vartheta\int_{(-\infty,t)}w(t-s;\alpha)\,N^{(0)}(ds\times\mathcal X),$$
where $\lambda^{(0)}(t;\theta^*)$ is the actual stochastic intensity of $N^{(0)}$; that is, $\int_0^t\lambda^{(0)}(s;\theta^*)\,ds$ is the predictable compensator of $N^{(0)}([0,t]\times\mathcal X)$. Finally, we define, for $i\in\{1,2\}$, $\theta\in\Theta$, $\phi\in\Phi$,
$$\lambda^{(0),i}(t;\theta,\phi) = \vartheta\int_{[0,t)\times\mathcal X}w(t-s;\alpha)\,\partial_\psi^i g(x;\phi,0)\,N^{(0)}(ds\times dx).$$
We first show that, in the sense of (30) and (31) below, $N^T_g$ converges to $N^{(0)}$ as $T\to+\infty$.

Lemma 7.
Let $f$ be a predictable process, depending on $\theta\in\Theta$, such that $\sup_{t\in[0,T]}E\sup_{\theta\in\Theta}|f(t,\theta)|^p < +\infty$ for some $p\ge 2$. Then we have
$$E\sup_{\theta\in\Theta}\Big|\int_{[0,T)\times\mathcal X}f(t,\theta)\,\{N^T_g - N^{(0)}\}(dt\times dx)\Big| = O\big(T^{1/2}\big) \quad (30)$$
and, for any $i\in\{1,2\}$,
$$\sup_{t\in\mathbb R_+}E\sup_{\theta\in\Theta}\big|\partial_\psi^i\lambda^T_g(t;\theta,\phi^*,0) - \lambda^{(0),i}(t;\theta,\phi^*)\big| = O\big(T^{-1/2}\big). \quad (31)$$

Proof. We prove our claim in three steps.

Step 1. Letting $\delta_T(t) = E|\lambda^T_g(t;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(t;\theta^*)|$, we prove $\sup_{t\in[0,T]}\delta_T(t) = O(T^{-1/2})$. We have
$$\delta_T(t) \le \vartheta^*\,E\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,g(x;\phi^*,\psi^*_T)\,|N^T_g - N^{(0)}|(ds\times dx) + \vartheta^*\,E\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,\big|g(x;\phi^*,\psi^*_T) - 1\big|\,N^{(0)}(ds\times dx)$$
$$\le \vartheta^*\,E\int_{[0,t)}w(t-s;\alpha^*)\,E\big[g(y_s;\phi^*,\psi^*_T)\mid\mathcal F^y_{s-}\big]\,\big|\lambda^T_g(s;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(s;\theta^*)\big|\,ds + T^{-1/2}|\gamma^*|\,\vartheta^*\,E\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,\Big|\sup_{\psi\in[0,\psi^*_T]}\partial_\psi g(x;\phi^*,\psi)\Big|\,N^{(0)}(ds\times dx)$$
$$\le C\vartheta^*\int_{[0,t)}w(t-s;\alpha^*)\,\delta_T(s)\,ds + T^{-1/2}K \le C\vartheta^*\sup_{s\in[0,T]}\delta_T(s) + T^{-1/2}K,$$
for some constant $K > 0$, where we have used that $E[g(y_s;\phi^*,\psi^*_T)\mid\mathcal F^y_{s-}] \le C < 1/\vartheta^*$, that $\int_0^{+\infty}w(\cdot;\alpha^*) = 1$, and Condition 5. Moreover, for a vector $x$, we have used the notation $|x| = \sum_i|x_i|$. Taking the supremum over $[0,T]$ on the left hand side, we deduce $\sup_{s\in[0,T]}\delta_T(s) \le KT^{-1/2}/(1 - C\vartheta^*)$, and we are done.

Step 2. Letting $\epsilon_T(t) = E|\lambda^T_g(t;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(t;\theta^*)|^2$, we prove $\sup_{t\in[0,T]}\epsilon_T(t) = O(T^{-1})$.
We have, for some $c > 0$,
$$\epsilon_T(t) \le (1+c)\,\vartheta^{*2}\,E\Big|\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,g(x;\phi^*,\psi^*_T)\,(N^T_g - N^{(0)})(ds\times dx)\Big|^2 + (1+c^{-1})\,\vartheta^{*2}\,E\Big|\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,\big|g(x;\phi^*,\psi^*_T) - 1\big|\,N^{(0)}(ds\times dx)\Big|^2 = I + II,$$
where we have used the inequality $(x+y)^2 \le (1+c)x^2 + (1+c^{-1})y^2$ for any $c > 0$. First, we have
$$I \le (1+c)(1+c^{-1})\,\vartheta^{*2}\,E\Big|\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,g(x;\phi^*,\psi^*_T)\,(\tilde N^T_g - \tilde N^{(0)})(ds\times dx)\Big|^2 + (1+c)^2\,\vartheta^{*2}\,E\Big|\int_{[0,t)}w(t-s;\alpha^*)\,E\big[g(y_s;\phi^*,\psi^*_T)\mid\mathcal F^y_{s-}\big]\,\big(\lambda^T_g(s;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(s;\theta^*)\big)\,ds\Big|^2 = I_A + I_B.$$
Now, applying Jensen's inequality with respect to the probability measure $w(s;\alpha^*)\,ds/\int_0^t w(s;\alpha^*)\,ds$, and then using $\int_0^{+\infty}w(\cdot;\alpha^*) = 1$ and $E[g(y_s;\phi^*,\psi^*_T)\mid\mathcal F^y_{s-}] \le C < 1/\vartheta^*$, yields
$$I_B \le (1+c)^2\,\vartheta^{*2}C^2\,E\int_{[0,t)}w(t-s;\alpha^*)\,\epsilon_T(s)\,ds \le (1+c)^2\,\vartheta^{*2}C^2\sup_{s\in[0,T]}\epsilon_T(s).$$
Now, for $I_A$, we have
$$I_A \le K\,E\Big|\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,\big(g(x;\phi^*,\psi^*_T) - 1\big)\,(\tilde N^T_g - \tilde N^{(0)})(ds\times dx)\Big|^2 + K\,E\Big|\int_{[0,t)\times\mathcal X}w(t-s;\alpha^*)\,(\tilde N^T_g - \tilde N^{(0)})(ds\times dx)\Big|^2$$
$$\le K\,E\int_{[0,t)}w(t-s;\alpha^*)\,E\big[\big(g(y_s;\phi^*,\psi^*_T) - 1\big)^2\mid\mathcal F^y_{s-}\big]\,\big|\lambda^T_g(s;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(s;\theta^*)\big|\,ds + K\,E\int_{[0,t)}w(t-s;\alpha^*)\,\big|\lambda^T_g(s;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(s;\theta^*)\big|\,ds$$
$$\le \frac{K}{\sqrt T}\int_{[0,t)}w(t-s;\alpha^*)\,E\Big[\sup_{\psi\in[0,\psi^*_T]}\partial_\psi g(y_s;\phi^*,\psi)^2\Big]^{1/2}\epsilon_T(s)^{1/2}\,ds + K\sup_{s\in[0,T]}\delta_T(s)^2 \le \frac{K}{\sqrt T}\sup_{s\in[0,T]}\big(\epsilon_T(s)\vee T^{-1}\big)^{1/2} + K\sup_{s\in[0,T]}\delta_T(s)^2,$$
where we have used the Cauchy–Schwarz inequality along with (23) and (24). Moreover, following a similar path as for Step 1, we also have that $II \le KT^{-1}$ by (23). Thus, overall, using that $\sup_{s\in[0,T]}\delta_T(s) \le KT^{-1/2}$ by Step 1, we obtain, for some constant $K > 0$,
$$\epsilon_T(t) \le K\big(T^{-1} + T^{-3/2}\big) + \big((1+c)^2\vartheta^{*2}C^2 + KT^{-1/2}\big)\sup_{s\in[0,T]}\epsilon_T(s),$$
and taking the supremum over $[0,T]$ on the left hand side, then taking $c$ small and $T$ large enough so that $(1+c)^2\vartheta^{*2}C^2 + KT^{-1/2} < A$ for some constant $A < 1$, we get
$$\sup_{s\in[0,T]}\epsilon_T(s) \le K\big(T^{-1} + T^{-3/2}\big)/(1-A) \le \tilde K T^{-1}$$
for some $\tilde K > 0$.

Step 3. We prove (30) and (31). For (30), this is a direct consequence of the fact that the compensator of $|N^T_g - N^{(0)}|$ is $\int_0^T\big|\lambda^T_g(t;\theta^*,\phi^*,\psi^*_T) - \lambda^{(0)}(t;\theta^*)\big|\,dt$, the Cauchy–Schwarz inequality, and the uniform condition on $f$. For (31), let $i\in\{1,2\}$.
We have
\[
\begin{aligned}
\mathbb{E}\sup_{\theta\in\Theta}\big|\partial_\psi^i\lambda_g^T(t;\theta,\phi^*,0) - \lambda^{(0),i}(t;\theta,\phi^*)\big|
&\le K\,\mathbb{E}\sup_{\alpha\in\mathcal{A}}\left|\int_{[0,t)\times\mathcal{X}} w(t-s;\alpha)\,\partial_\psi^i g(x;\phi^*,0)\,\big(N_g^T - N^{(0)}\big)(ds\times dx)\right| \\
&\le K\,\mathbb{E}\left|\int_{[0,t)\times\mathcal{X}} \bar w(t-s)\,\big|\partial_\psi^i g(x;\phi^*,0)\big|\,\big|N_g^T - N^{(0)}\big|(ds\times dx)\right|.
\end{aligned}
\]
From here, using the Burkholder-Davis-Gundy inequality and Step 2 of this proof along with conditions (23) and (24), we deduce that the above term is dominated by $KT^{-1/2}$ for some $K > 0$, uniformly in $t \in \mathbb{R}_+$.

Lemma 8. (Consistency of $\hat\nu_T$ under the local alternatives) Under $H_T$, we have $\hat\nu_T \to^{\mathbb{P}} \nu^* := (\theta^*, \phi^*, 0)$.

Proof. The convergence of the third component is obvious, and the convergence of the second one is assumed. All we have to show is that $\hat\theta_T \to^{\mathbb{P}} \theta^*$. Let $l_T^{(0)}(\theta) = \int_{[0,T)\times\mathcal{X}} \log\lambda^{(0)}(t;\theta)\, N^{(0)}(dt\times dx) - \int_0^T \lambda^{(0)}(t;\theta)\,dt$, where we recall that $\lambda^{(0)}$ is the stochastic intensity of $N^{(0)}$. We need to show that, uniformly in $\theta\in\Theta$, $T^{-1}\big(l_T(\theta) - l_T^{(0)}(\theta)\big) \to^{\mathbb{P}} 0$. But note that $T^{-1}\big(l_T(\theta) - l_T^{(0)}(\theta)\big) = I + II$ with
\[
I = -T^{-1}\int_{[0,T)\times\mathcal{X}} \Big\{ \log\lambda^{(0)}(t;\theta)\, N^{(0)}(dt\times dx) - \log\lambda^T(t;\theta)\, N_g^T(dt\times dx) \Big\}
\]
and
\[
II = -T^{-1}\int_0^T \big\{\lambda^{(0)}(t;\theta) - \lambda^T(t;\theta)\big\}\,dt.
\]
By (31), we immediately have that $\mathbb{E}\sup_{\theta\in\Theta}|\lambda^{(0)}(t;\theta) - \lambda^T(t;\theta)| = O(T^{-1/2})$ uniformly in $t\in[0,T]$, so that $II \to^{\mathbb{P}} 0$ uniformly in $\theta\in\Theta$. Writing $I$ as the sum
\[
-T^{-1}\int_{[0,T)\times\mathcal{X}} \big\{\log\lambda^{(0)}(t;\theta) - \log\lambda^T(t;\theta)\big\}\, N^{(0)}(dt\times dx)
- T^{-1}\int_{[0,T)\times\mathcal{X}} \log\lambda^T(t;\theta)\,\big\{N^{(0)}(dt\times dx) - N_g^T(dt\times dx)\big\} = A + B,
\]
we need to show that both terms tend to 0. Since $|\log\lambda^{(0)}(t;\theta) - \log\lambda^T(t;\theta)| \le \eta^{-1}|\lambda^{(0)}(t;\theta) - \lambda^T(t;\theta)|$, we easily get by the Cauchy-Schwarz inequality and (31) that $\mathbb{E}\sup_{\theta\in\Theta}|A| \to 0$.
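As a concrete aside, the objects $l_T$ and $l_T^{(0)}$ compared in this proof are standard point-process quasi log-likelihoods of the form $\int \log\lambda\,dN - \int_0^T \lambda\,dt$. For the unmarked exponential-decay Hawkes model (the case for which the paper verifies its uniform boundedness assumption) this can be evaluated in $O(n)$ time with the classical recursion. A minimal illustrative sketch, not the authors' code; the parameter names `nu`, `vartheta`, `alpha` are ours:

```python
import math

def hawkes_loglik(times, T, nu, vartheta, alpha):
    """Log-likelihood of an unmarked Hawkes process on [0, T] with intensity
    lambda(t) = nu + vartheta * sum_{t_i < t} alpha * exp(-alpha * (t - t_i)),
    i.e. l_T = sum_i log lambda(t_i) - int_0^T lambda(t) dt.
    `times` must be sorted increasingly."""
    loglik = 0.0
    A = 0.0   # A_i = sum_{j < i} exp(-alpha * (t_i - t_j)), updated recursively
    prev = None
    for t in times:
        if prev is not None:
            A = math.exp(-alpha * (t - prev)) * (A + 1.0)
        loglik += math.log(nu + vartheta * alpha * A)
        prev = t
    # Compensator: baseline nu*T plus the integrated kernel of each event.
    compensator = nu * T + vartheta * sum(
        1.0 - math.exp(-alpha * (T - t)) for t in times)
    return loglik - compensator

# Toy example: three events on [0, 10].
print(hawkes_loglik([1.0, 2.0, 5.0], 10.0, nu=0.5, vartheta=0.4, alpha=1.3))
```

The recursion $A_i = e^{-\alpha(t_i - t_{i-1})}(1 + A_{i-1})$ replaces the naive quadratic double sum over event pairs, which is what makes likelihood-based estimation of $\hat\theta_T$ practical for long horizons $T$.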
Moreover, using $\log\lambda^T(t;\theta) \le \lambda^T(t;\theta) - 1$, by (30) and Condition 5 we have that $\mathbb{E}\sup_{\theta\in\Theta}|B| \to 0$.

Lemma 9. Under $H_T$, we have $T^{-1/2}\partial_\psi l_g(\hat\nu_T) \to^d \mathcal{N}(\Omega\gamma^*, \Omega)$.

Proof. First, note that by application of Lemma 7 and Lemma 8, and following the same path as for the proof of Lemma 5, we deduce
\[
T^{-1/2}\partial_\psi l_g(\hat\nu_T) - T^{-1/2}\partial_\psi l_g(\nu^*) \to^{\mathbb{P}} 0.
\]
Next, we have
\[
T^{-1/2}\partial_\psi l_g(\theta^*,\phi^*,0) = T^{-1/2}\int_{(0,T)} \partial_\psi\log\lambda_g^T(t;\theta^*,\phi^*,0)\,\tilde N_g^T(dt)
+ T^{-1/2}\int_{(0,T)} \partial_\psi\lambda_g^T(t;\theta^*,\phi^*,0)\left(\frac{\lambda_g^T(t;\theta^*,\phi^*,\psi_T^*)}{\lambda^T(t;\theta^*)} - 1\right) dt = I + II,
\]
where we have used the notation $\tilde N_g^T(dt) = N_g^T(dt,\mathcal{X}) - \lambda_g^T(t;\theta^*,\phi^*,\psi_T^*)\,dt$. We derive the limit of the first term following the same path as for the proof of Lemma 4. Letting $S_u^T = T^{-1/2}\int_{(0,uT)} \partial_\psi\log\lambda_g^T(t;\theta^*,\phi^*,0)\,\tilde N_g^T(dt)$, we directly have that
\[
\langle S^T, S^T\rangle_u = T^{-1}\int_0^{uT} \frac{\partial_\psi\lambda_g^T(t;\theta^*,\phi^*,0)\,\partial_\psi\lambda_g^T(t;\theta^*,\phi^*,0)^{\mathsf{T}}}{\lambda^T(t;\theta^*)^2}\,\lambda_g^T(t;\theta^*,\phi^*,\psi_T^*)\,dt.
\]
By (31), the boundedness of moments of $\lambda_g^T$ and its derivatives, and Hölder's inequality, we easily deduce that
\[
\langle S^T, S^T\rangle_u = T^{-1}\int_0^{uT} \frac{\lambda^{(0),1}(t;\theta^*,\phi^*)\,\lambda^{(0),1}(t;\theta^*,\phi^*)^{\mathsf{T}}}{\lambda^{(0)}(t;\theta^*)}\,dt + o_{\mathbb{P}}(1),
\]
which converges in probability to $u\,\Omega$ by Lemma 4. Similarly, Lindeberg's condition $\mathbb{E}\big[\sum_{s\le u}(\Delta S_s^T)^2\,\mathbf{1}_{\{|\Delta S_s^T| > a\}}\big] \to 0$ for any $a > 0$ is satisfied, and therefore $I = S_1^T \to^d \mathcal{N}(0, \Omega)$.

Now we derive the limit for $II$. We have, for some $\tilde\gamma_T \in [0, \psi_T^*]$,
\[
II = T^{-1}\int_{(0,T)} \lambda^T(t;\theta^*)^{-1}\,\partial_\psi\lambda_g^T(t;\theta^*,\phi^*,0)\,\partial_\psi\lambda_g^T(t;\theta^*,\phi^*,\tilde\gamma_T)^{\mathsf{T}}\,\gamma^*\,dt.
\]
Now, using Hölder's inequality, the uniform boundedness of moments of $\lambda_g^T$ in $\nu$, and (31), we deduce as previously that
\[
II = T^{-1}\int_{(0,T)} \lambda^{(0)}(t;\theta^*)^{-1}\,\lambda^{(0),1}(t;\theta^*,\phi^*)\,\lambda^{(0),1}(t;\theta^*,\phi^*)^{\mathsf{T}}\,\gamma^*\,dt + o_{\mathbb{P}}(1),
\]
which, by the proof of Lemma 6, tends in probability to the limit $\Omega\gamma^*$. By Slutsky's Lemma, we get the desired convergence in distribution for $T^{-1/2}\partial_\psi l_g(\hat\nu_T)$.

Lemma 10.
Under $H_T$, we have $T^{-1}\hat I_\psi \to^{\mathbb{P}} \Omega$.

Proof. First, as for Lemma 9, note that by application of Lemma 7 and Lemma 8, and following the same path as for the proof of Lemma 6, we have $T^{-1}\big(\hat I_\psi - \hat I_\psi(\nu^*)\big) \to^{\mathbb{P}} 0$. Now recall that
\[
T^{-1}\hat I_\psi(\nu^*) = T^{-1}\int_{[0,T]\times\mathcal{X}} \lambda^T(t;\theta^*)^{-2}\,\big(\partial_\psi\lambda_g^T(t;\nu^*)\big)^{\otimes 2}\, N_g^T(dt\times dx).
\]
By (30), (31), the boundedness of moments of $\lambda_g^T$ and its derivatives, and Hölder's inequality, we get
\[
T^{-1}\hat I_\psi(\nu^*) = T^{-1}\int_{[0,T]\times\mathcal{X}} \lambda^{(0)}(t;\theta^*)^{-2}\,\big(\lambda^{(0),1}(t;\nu^*)\big)^{\otimes 2}\, N^{(0)}(dt\times dx) + o_{\mathbb{P}}(1),
\]
and by Lemma 6, the right-hand side converges in probability to $\Omega$.

Acknowledgements

K.-A. Richards gratefully acknowledges PhD scholarship support by Boronia Capital Pty. Ltd., Sydney, Australia. The research of S. Clinet is supported by a special grant from Keio University. W.T.M. Dunsmuir was supported by travel funds from the Faculty of Sciences, University of New South Wales.

References

Andersen, P., O. Borgan, R. Gill, and N. Keiding (1996). Statistical Models Based on Counting Processes. Springer Series in Statistics. Springer New York.

Bacry, E., I. Mastromatteo, and J.-F. Muzy (2015). Hawkes processes in finance. Market Microstructure and Liquidity 1(01), 1550005.

Brémaud, P. and L. Massoulié (1996). Stability of nonlinear Hawkes processes. Ann. Probab. 24(3), 1563–1588.

Breusch, T. S. and A. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47(1), 239–253.

Chen, F. and P. Hall (2013). Inference for a nonstationary self-exciting point process with an application in ultra-high frequency financial data modeling. Journal of Applied Probability 50(4), 1006–1024.

Clinet, S. and Y. Potiron (2018). Statistical inference for the doubly stochastic self-exciting process. Bernoulli 24(4B), 3469–3493.

Clinet, S. and N. Yoshida (2017). Statistical inference for ergodic point processes and application to limit order book.
Stochastic Processes and their Applications 127(6), 1800–1839.

Crane, R. and D. Sornette (2008). Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences of the United States of America 105(41), 15649–15653.

Daley, D. and D. Vere-Jones (2002). An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Probability and Its Applications. Springer.

Embrechts, P., T. Liniger, and L. Lin (2011). Multivariate Hawkes processes: an application to financial data. J. Appl. Probab. 48A, 367–378.

Hawkes, A. (1971). Point spectra of some mutually exciting point processes. Journal of the Royal Statistical Society. Series B 33(3), 438–443.

Hawkes, A. G. (2018). Hawkes processes and their applications to finance: a review. Quantitative Finance 18(2), 193–198.

Jacod, J. and A. Shiryaev (2013). Limit theorems for stochastic processes, Volume 288. Springer Science & Business Media.

Kallenberg, O. (2006). Foundations of Modern Probability. Probability and Its Applications. Springer New York.

Laub, P. J., T. Taimre, and P. K. Pollett (2015). Hawkes processes. arXiv preprint arXiv:1507.02822.

Liniger, T. J. (2009). Multivariate Hawkes processes. Ph.D. thesis, ETH Zurich.

Ogata, Y. (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics 30(1), 243–261.

Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association 83(401), 9–27.

Ozaki, T. (1979). Maximum likelihood estimation of Hawkes' self-exciting point processes.