Marked point processes and intensity ratios for limit order book modeling

Abstract

This paper extends the analysis of Muni Toke and Yoshida (2020) to the case of marked point processes. We consider multiple marked point processes with intensities defined by three multiplicative components, namely a common baseline intensity, a state-dependent component specific to each process, and a state-dependent component specific to each mark within each process. We show that for specific mark distributions, this model is a combination of the ratio models defined in Muni Toke and Yoshida (2020). We prove convergence results for the quasi-maximum and quasi-Bayesian likelihood estimators of this model and provide numerical illustrations of the asymptotic variances. We use these ratio processes in order to model transactions occurring in a limit order book. Model flexibility allows us to investigate both state-dependency (emphasizing the role of imbalance and spread as significant signals) and clustering. Calibration, model selection and prediction results are reported for high-frequency trading data on multiple stocks traded on Euronext Paris. We show that the marked ratio model outperforms other intensity-based methods (such as "pure" Hawkes-based methods) in predicting the sign and aggressiveness of market orders on financial markets.



Ioane Muni Toke∗ and Nakahiro Yoshida†

Université Paris-Saclay, CentraleSupélec, Mathématiques et Informatique pour la Complexité et les Systèmes, France.
Graduate School of Mathematical Sciences, University of Tokyo, Japan.
CREST, Japan Science and Technology Agency, Japan.

January 24, 2020


Keywords: marked point processes; quasi-likelihood analysis; limit order book; high-frequency trading data; trade signature; trade aggressiveness

∗ Bâtiment Bouygues, 3 rue Joliot Curie, 91190 Gif-sur-Yvette, France.

1 Introduction

The limit order book is the central structure that aggregates the buy and sell intentions of all the market participants on a given exchange. This structure typically evolves at a very high frequency: on the Euronext Paris stock exchange, the limit order book of a common stock is modified several hundred thousand times per day. Among these changes, thousands or tens of thousands of events account for a transaction between two participants. The remaining events indicate either the intention to buy (resp. sell) at a limit price lower (resp. higher) than the prices currently available, or the cancellation of such intentions (Abergel et al., 2016).

Empirical observation of high-frequency events in a limit order book may reveal irregular interval times (durations), clustering, intraday seasonality, etc. (Chakraborti et al., 2011). Stochastic point processes are thus natural candidates for the modeling of such systems and their time series (Hautsch, 2011). In particular, Hawkes processes have been successfully proposed for the modeling of limit order book events (Bowsher, 2007; Large, 2007; Bacry et al., 2012, 2013; Muni Toke & Pomponio, 2012; Lallouache & Challet, 2016; Lu & Abergel, 2018).

One drawback of such models is the difficulty of accounting for high intraday variability. Another drawback is the lack of state-dependency: the observed state of the limit order book does not influence the dynamics of the events. One may try to include state-dependency by specifying a fully parametric model (Muni Toke & Yoshida, 2017), which is a cumbersome solution. Another solution is to extend the Hawkes framework with marks (Rambaldi et al., 2017) or with state-dependent kernels (Morariu-Patrichi & Pakkanen, 2018). Muni Toke & Yoshida (2020) have shown that state-dependency can be efficiently tackled by a multiplicative model with two components: a shared baseline intensity and a state-dependent, process-specific component. An intensity ratio model then allows for an efficient estimation of the state-dependency.
Several microstructure examples are worked out in that paper, including a ratio model for the prediction of the next trade sign¹.

In this work, we extend the framework of Muni Toke & Yoshida (2020) to some cases of marked point processes, by adding a third term to the multiplicative definition of the intensity, which accounts for a mark distribution. We use this extension to deepen our investigation of limit order book data. In financial microstructure, one of the characteristics of an order sent to a financial exchange is its aggressiveness (Biais et al., 1995; Harris & Hasbrouck, 1996). We will say here that an order is aggressive if it moves the price. A ratio model with marks can thus be used to analyse both the side (bid or ask) and the aggressiveness of market orders.

The rest of the paper is organized as follows. In Section 2 we show that some marked models can be viewed as combinations of intensity ratios of non-marked processes. Section 3 defines the quasi-maximum likelihood and quasi-Bayesian estimators and proceeds to the analysis of the estimation. Theorem 3.1 states the convergence result, and a numerical illustration follows. We then turn to […]

¹ When characterizing a market order, we use the terms side (bid/ask) and sign (−1/+1) interchangeably to indicate whether a transaction occurs at the best bid or at the best ask price of the limit order book.

Let $I = \{0, 1, \dots, \bar i\}$. We consider marked point processes $N^i = (N^i_t)_{t\in\mathbb R_+}$, $i\in I$, where $\mathbb R_+ = [0,\infty)$. For each $i\in I$, let $\bar k_i$ be a positive integer, and let $K_i = \{0, 1, \dots, \bar k_i\}$ be the space of marks of the process $N^i$. We denote by $N^{i,k_i} = (N^{i,k_i}_t)_{t\in\mathbb R_+}$ the process counting events of type $i$ with mark $k_i\in K_i$, so that $N^i = \sum_{k_i\in K_i}N^{i,k_i}$. Let $\check I = \bigcup_{i\in I}\big(\{i\}\times K_i\big)$. We assume that the intensity of the process $N^i$ with mark $k_i$, i.e., the intensity of $N^{i,k_i}$, is given by
$$\lambda^{i,k_i}(t,\vartheta^i,\varrho^i) = \lambda_0(t)\exp\Big(\sum_{j\in J}\vartheta_{ij}X_j(t)\Big)\,p^{k_i}_i(t,\varrho^i)$$
at time $t$ for $(i,k_i)\in\check I$, where $\vartheta^i = (\vartheta_{ij})_{j\in J}$ ($i\in I$) and $\varrho^i$ ($i\in I$) are unknown parameters.

More precisely, given a probability space $(\Omega,\mathcal F,P)$ equipped with a right-continuous filtration $\mathbf F = (\mathcal F_t)_{t\in\mathbb R_+}$, $\lambda_0 = (\lambda_0(t))_{t\in\mathbb R_+}$ is a non-negative predictable process, $X_j = (X_j(t))_{t\in\mathbb R_+}$ is a predictable process for each $j\in J = \{1,\dots,\bar j\}$, and $p^{k_i}_i(t,\varrho^i)$ is a non-negative predictable process for each $(i,k_i)\in\check I$. Later we will impose a condition ensuring that the mapping $t\mapsto\lambda^{i,k_i}(t,\vartheta^i,\varrho^i)$ is locally integrable with respect to $dt$. We assume that $N^{i,k_i}_0 = 0$ and that, for each $(i,k_i)\in\check I$, the process
$$N^{i,k_i}_t - \int_0^t\lambda^{i,k_i}\big(s,(\vartheta^i)^*,(\varrho^i)^*\big)\,ds$$
is a local martingale for a value $\big((\vartheta^i)^*,(\varrho^i)^*\big)$ of the parameter $(\vartheta^i,\varrho^i)$.

In what follows, we consider processes $p^{k_i}_i(t,\varrho^i)$ such that
$$\sum_{k_i\in K_i}p^{k_i}_i(t,\varrho^i) = 1 \qquad (2.1)$$
for $i\in I$. Then the $(1+\bar k_i)$-dimensional process $\big(p^{k_i}_i(t,\varrho^i)\big)_{k_i\in K_i}$ gives the conditional distribution of the mark $k_i$ given that an event of type $i$ occurred.
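As a minimal numerical sketch (not the authors' code), the snippet below evaluates the marked intensities $\lambda^{i,k_i}$ at a single time $t$, assuming for $p^{k_i}_i$ the multinomial logistic form introduced later in this section, and checks that the normalization (2.1) makes the mark-level intensities sum to $\lambda_0(t)\exp(\sum_j\vartheta_{ij}X_j(t))$. The function names and the array layout are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def marked_intensities(lam0, x, vartheta, y, varrho):
    """Hypothetical helper: intensities lambda^{i,k}(t) at one time t.

    lam0     : baseline intensity lambda_0(t) > 0 (scalar)
    x        : process covariates X(t), shape (J,)
    vartheta : process parameters vartheta_{ij}, shape (I, J)
    y        : list, y[i] = mark covariates Y^i(t), shape (J_i,)
    varrho   : list, varrho[i] = mark parameters varrho^{i,k}_j, shape (K_i, J_i)
    Returns a list whose i-th entry is the vector (lambda^{i,k}(t))_{k in K_i}.
    """
    # lambda_0(t) * exp(sum_j vartheta_ij X_j(t)), one value per process i
    process_part = lam0 * np.exp(vartheta @ x)
    out = []
    for i in range(vartheta.shape[0]):
        # mark distribution p_i^k(t): a probability vector over K_i, cf. (2.1)
        p_i = softmax(varrho[i] @ y[i])
        out.append(process_part[i] * p_i)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=3)
vartheta = rng.normal(size=(2, 3))
y = [rng.normal(size=2), rng.normal(size=4)]
varrho = [rng.normal(size=(2, 2)), rng.normal(size=(3, 4))]
lam = marked_intensities(1.7, x, vartheta, y, varrho)
# Summing over marks recovers the unmarked intensity of N^i
print([float(l.sum()) for l in lam], (1.7 * np.exp(vartheta @ x)).tolist())
```

The two printed lists coincide, which is the content of the identity displayed just below as (2.2).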
Under (2.1), the intensity process of $N^i$ becomes
$$\lambda^i(t,\vartheta^i) = \sum_{k_i\in K_i}\lambda^{i,k_i}(t,\vartheta^i,\varrho^i) = \lambda_0(t)\exp\Big(\sum_{j\in J}\vartheta_{ij}X_j(t)\Big). \qquad (2.2)$$
The process $\lambda_0$ is called a baseline intensity. Its structure will not be specified; in other words, $\lambda_0$ will be treated as a nuisance parameter, unlike the Cox regression approach of Muni Toke & Yoshida (2017). In finance, the baseline intensity may represent, for example, the global market activity, and its irregular changes may limit the reliability of estimation procedures and predictions for any model fitted to it. Muni Toke & Yoshida (2020) took an approach with an unstructured baseline intensity process and showed the advantages of such modeling. Statistically, the process $X(t) = (X_j(t))_{j\in J}$ is an observable covariate process. Since the effect of these covariate processes on the amplitude of $\lambda^i(t,\vartheta^i)$ is contaminated by the unobservable and structurally unknown baseline intensity, a more interesting measure of the dependency of $\lambda^i(t,\vartheta^i)$ on $X(t)$ is the ratio
$$\frac{\lambda^i(t,\vartheta^i)}{\sum_{i'\in I}\lambda^{i'}(t,\vartheta^{i'})}$$
for $i\in I$.
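A short numerical check, under illustrative assumptions (random covariates and parameters, and a hypothetical function name `intensity_ratios`), that the baseline $\lambda_0(t)$ cancels in this ratio, and that the ratio is unchanged when the same vector is subtracted from every $\vartheta_i$. This invariance is why only differences with respect to a reference process are identifiable, which motivates the difference parameters introduced next.

```python
import numpy as np


def intensity_ratios(lam0, x, vartheta):
    """r_i = lambda^i / sum_{i'} lambda^{i'} with lambda^i = lam0 * exp(vartheta_i . x).

    Computed in log space (subtracting the maximum score) to avoid overflow.
    """
    scores = vartheta @ x + np.log(lam0)   # log lambda^i(t)
    scores = scores - scores.max()
    w = np.exp(scores)
    return w / w.sum()


rng = np.random.default_rng(0)
x = rng.normal(size=3)
vartheta = rng.normal(size=(3, 3))         # three processes, three covariates
r_small = intensity_ratios(0.01, x, vartheta)
r_large = intensity_ratios(1e6, x, vartheta)
# Subtracting row 0 from every row (a common shift) leaves the ratios unchanged
r_diff = intensity_ratios(1.0, x, vartheta - vartheta[0])
print(np.allclose(r_small, r_large), np.allclose(r_small, r_diff))
```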
Thus, we introduce the difference parameters $\theta_{ij} = \vartheta_{ij} - \vartheta_{0j}$ ($i\in I$, $j\in J$), so that $\theta_{0j} = 0$ in particular, and consider the ratios
$$r_i(t,\theta) = \frac{\exp\big(\sum_{j\in J}\vartheta_{ij}X_j(t)\big)}{\sum_{i'\in I}\exp\big(\sum_{j\in J}\vartheta_{i'j}X_j(t)\big)} = \frac{\exp\big(\sum_{j\in J}\theta_{ij}X_j(t)\big)}{\sum_{i'\in I}\exp\big(\sum_{j\in J}\theta_{i'j}X_j(t)\big)} \qquad (2.3)$$
for $i\in I$, where $\theta = (\theta_{ij})_{i\in I_1,j\in J}$ with $I_1 = I\setminus\{0\} = \{1,\dots,\bar i\}$.

In this paper, we further assume that the factor $p^{k_i}_i(t,\varrho^i)$ is given by
$$p^{k_i}_i(t,\varrho^i) = \frac{\exp\big(\sum_{j_i\in J_i}\varrho^{i,k_i}_{j_i}Y^i_{j_i}(t)\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j_i\in J_i}\varrho^{i,k'_i}_{j_i}Y^i_{j_i}(t)\big)}$$
for $(i,k_i)\in\check I$, with $J_i = \{1,\dots,\bar j_i\}$. Obviously, $p^{k_i}_i(t,\varrho^i) = q^{k_i}_i(t,\rho^i)$, where
$$q^{k_i}_i(t,\rho^i) = \frac{\exp\big(\sum_{j_i\in J_i}\rho^{i,k_i}_{j_i}Y^i_{j_i}(t)\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j_i\in J_i}\rho^{i,k'_i}_{j_i}Y^i_{j_i}(t)\big)} \qquad (2.4)$$
for $(i,k_i)\in\check I$, with $\rho^{i,k_i}_{j_i} = \varrho^{i,k_i}_{j_i} - \varrho^{i,0}_{j_i}$ ($k_i\in K_i$, $j_i\in J_i$, $i\in I$), so that $\rho^{i,0}_{j_i} = 0$ in particular, and $\rho^i = (\rho^{i,k_i}_{j_i})_{k_i\in K_{i,1},j_i\in J_i}$ ($i\in I$) with $K_{i,1} = K_i\setminus\{0\} = \{1,\dots,\bar k_i\}$. The predictable processes $(Y^i_{j_i}(t))_{t\in\mathbb R_+}$ ($i\in I$, $j_i\in J_i$) are observable covariate processes, $J_i$ being a finite index set. This is a multinomial logistic regression model.

Let $\Theta$ be a bounded open convex set in $\mathbb R^p$ with $p = \bar i\bar j$. For each $i\in I$, $\mathcal R_i$ denotes a bounded open convex set in $\mathbb R^{p_i}$ with $p_i = \bar j_i\bar k_i$. Write $\rho = (\rho^i)_{i\in I}$ and let $\mathcal R = \prod_{i\in I}\mathcal R_i$. We consider $\Theta\times\mathcal R$ as the parameter space of $(\theta,\rho)$.

Remark 1.
The marked ratio model
$$\lambda^{i,k_i}(t,\vartheta^i,\varrho^i) = \lambda_0(t)\exp\Big(\sum_{j\in J}\vartheta_{ij}X_j(t)\Big)\frac{\exp\big(\sum_{j_i\in J_i}\varrho^{i,k_i}_{j_i}Y^i_{j_i}(t)\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j_i\in J_i}\varrho^{i,k'_i}_{j_i}Y^i_{j_i}(t)\big)}$$
is in general not equivalent to a non-marked ratio model in larger dimension, in which we would write the intensity of the counting process of events of type $i\in I$ with mark $k_i\in K_i$ as
$$\lambda^{i,k_i}(t,\vartheta^{i,k_i}) = \tilde\lambda_0(t)\exp\Big(\sum_{j\in\tilde J}\vartheta^{i,k_i}_jZ_j(t)\Big)$$
for some covariate processes $Z_j$, $j\in\tilde J$. Equivalence of the models would require these expressions to coincide for some sets of covariates and parameters. However, if $Z_j(t) = 0$ for all $j\in\tilde J$, then necessarily $X_j(t) = 0$ for all $j\in J$ and $Y^i_{j_i}(t) = 0$ for all $i\in I$ and $j_i\in J_i$. The marked intensity then reduces to $\lambda_0(t)/|K_i|$ and the non-marked one to $\tilde\lambda_0(t)$, which in turn implies $|K_i| = \lambda_0(t)/\tilde\lambda_0(t)$ for all $i\in I$; this is generally not true. In Section 4.5, a non-marked ratio model is used as a benchmark to assess the performance of the marked ratio model, and the prediction performances are indeed shown to be different.

The two-step marked ratio model consists of the two kinds of ratio models (2.3) and (2.4). Estimation of this model can be carried out with multiple successive ratio models.

In the first step, we consider the parameter $\theta = (\theta_{ij})_{i\in I_1,j\in J}$ and the ratios (2.3) for $i\in I$. The quasi-log-likelihood based on observations on $[0,T]$ for this ratio model is
$$H_T(\theta) = \sum_{i\in I}\int_0^T\log r_i(t,\theta)\,dN^i_t. \qquad (3.1)$$
A quasi-maximum likelihood estimator (QMLE) for $\theta$ is a measurable mapping $\hat\theta^M_T:\Omega\to\Theta$ satisfying
$$H_T(\hat\theta^M_T) = \max_{\theta\in\Theta}H_T(\theta)$$
for all $\omega\in\Omega$.²

² Originally, $\hat\theta^M_T$ is defined on a sample space $S_T$ expressing all the possible outcomes of $\big(\lambda_0(t),X_j(t),Y^i_{j_i}(t);\ t\in[0,T],\ i\in I,\ j\in J,\ j_i\in J_i\big)$. If $(\Omega,\mathcal F,P)$ is an abstract space used for defining the true probability measure $P^*_T$ on $S_T$ by some random variable $V_T:\Omega\to S_T$ (i.e. $P^*_T = P\circ V_T^{-1}$), then treating $\hat\theta^M_T$ as a function on $\Omega$ conflicts with the definition of $\hat\theta^M_T$. However, what we want to investigate concerns the distribution of $\hat\theta^M_T$ (defined on $S_T$) under $P^*_T$, and we can pull back $\hat\theta^M_T$ from $S_T$ to $\Omega$ by $V_T$ if $P^*_T = P\circ V_T^{-1}$. For this reason, we can identify $\hat\theta^M_T$ with $\hat\theta^M_T\circ V_T$, and may regard $\hat\theta^M_T$ as defined on $\Omega$. This remark makes sense especially when one treats a weak solution of a stochastic differential equation for a covariate.

In the second step, we consider the ratios (2.4) and the associated quasi-log-likelihood
$$H^{(i)}_T(\rho^i) = \sum_{k_i\in K_i}\int_0^T\log q^{k_i}_i(t,\rho^i)\,dN^{i,k_i}_t \qquad (3.2)$$
for $i\in I$. Then a measurable mapping $\hat\rho^{i,M}_T:\Omega\to\mathcal R_i$ is called a quasi-maximum likelihood estimator (QMLE) for $\rho^i$ if
$$H^{(i)}_T(\hat\rho^{i,M}_T) = \max_{\rho^i\in\mathcal R_i}H^{(i)}_T(\rho^i).$$
It is possible to pool these estimating functions into the single estimating function
$$H_T(\theta,\rho) = H_T(\theta) + \sum_{i\in I}H^{(i)}_T(\rho^i). \qquad (3.3)$$
In other words,
$$H_T(\theta,\rho) = \sum_{i\in I}\sum_{k_i\in K_i}\int_0^T\log\big(r_i(t,\theta)\,q^{k_i}_i(t,\rho^i)\big)\,dN^{i,k_i}_t. \qquad (3.4)$$
The collection of QMLEs $\big(\hat\theta^M_T,(\hat\rho^{i,M}_T)_{i\in I}\big)$ is a QMLE for $H_T(\theta,\rho)$. The use of $H_T(\theta,\rho)$ is convenient when we consider the asymptotic distribution of the estimators $\hat\theta^M_T$ and $\hat\rho^{i,M}_T$ ($i\in I$) jointly.

The quasi-Bayesian estimator (QBE) $\big(\hat\theta^B_T,(\hat\rho^{i,B}_T)_{i\in I}\big)$ is defined by
$$\hat\theta^B_T = \Big[\int_{\Theta\times\mathcal R}\exp\big(H_T(\theta,\rho)\big)\varpi(\theta,\rho)\,d\theta\,d\rho\Big]^{-1}\int_{\Theta\times\mathcal R}\theta\exp\big(H_T(\theta,\rho)\big)\varpi(\theta,\rho)\,d\theta\,d\rho \qquad (3.5)$$
and
$$\hat\rho^{i,B}_T = \Big[\int_{\Theta\times\mathcal R}\exp\big(H_T(\theta,\rho)\big)\varpi(\theta,\rho)\,d\theta\,d\rho\Big]^{-1}\int_{\Theta\times\mathcal R}\rho^i\exp\big(H_T(\theta,\rho)\big)\varpi(\theta,\rho)\,d\theta\,d\rho \qquad (3.6)$$
for a prior probability density $\varpi(\theta,\rho)$ on $\Theta\times\mathcal R$. We assume that $\varpi:\Theta\times\mathcal R\to\mathbb R_+$ is continuous and
$$0 < \inf_{(\theta,\rho)\in\Theta\times\mathcal R}\varpi(\theta,\rho)\le\sup_{(\theta,\rho)\in\Theta\times\mathcal R}\varpi(\theta,\rho) < \infty. \qquad (3.7)$$
Since $H_T(\theta)$ and $H^{(i)}_T(\rho^i)$ have no common parameters, the maximization of $H_T(\theta,\rho)$ with respect
to the parameters $\theta$ and $\rho^i$ ($i\in I$) can be carried out separately. However, these components are not always treated individually for the QBE. If $\varpi(\theta,\rho)$ is a product of prior densities, $\varpi(\theta,\rho) = \varpi'(\theta)\prod_{i\in I}\varpi_i(\rho^i)$, then each integral in (3.5) and (3.6) simplifies and we can compute $\hat\theta^B_T$ and $\hat\rho^{i,B}_T$ ($i\in I$) separately:
$$\hat\theta^B_T = \Big[\int_\Theta\exp\big(H_T(\theta)\big)\varpi'(\theta)\,d\theta\Big]^{-1}\int_\Theta\theta\exp\big(H_T(\theta)\big)\varpi'(\theta)\,d\theta$$
and
$$\hat\rho^{i,B}_T = \Big[\int_{\mathcal R_i}\exp\big(H^{(i)}_T(\rho^i)\big)\varpi_i(\rho^i)\,d\rho^i\Big]^{-1}\int_{\mathcal R_i}\rho^i\exp\big(H^{(i)}_T(\rho^i)\big)\varpi_i(\rho^i)\,d\rho^i$$
for $i\in I$.

Let $X(t) = (X_j(t))_{j\in J}$ and let $Y^i(t) = \big(Y^i_{j_i}(t)\big)_{j_i\in J_i}$ for $i\in I$. We consider the following conditions.

[M1] The process $\big(\lambda_0(t),X(t),Y(t)\big)$ is stationary, and the random variables $\lambda_0(0)$, $\exp(|X_j(0)|)$ and $\exp(|Y^i_{j_i}(0)|)$ are in $L^{\infty-} = \bigcap_{p>1}L^p$ for $j\in J$, $j_i\in J_i$ and $i\in I$.

The alpha mixing coefficient $\alpha(h)$ is defined by
$$\alpha(h) = \sup_{t\in\mathbb R_+}\ \sup_{A\in\mathcal B_{[0,t]},\ B\in\mathcal B_{[t+h,\infty)}}\big|P[A\cap B] - P[A]P[B]\big|,$$
where, for $\mathcal I\subset\mathbb R_+$, $\mathcal B_{\mathcal I}$ denotes the $\sigma$-field generated by $\big(\lambda_0(t),(X_j(t))_{j\in J},(Y^i_{j_i}(t))_{i\in I,j_i\in J_i}\big)_{t\in\mathcal I}$.

[M2] The alpha mixing coefficient $\alpha(h)$ is rapidly decreasing in that $\alpha(h)h^L\to0$ as $h\to\infty$ for every $L>0$.

[…] $(i,k_i)$ is selected with two-fold multinomial distributions of sample size equal to 1. First, the class $i\in I$ is selected when $\xi_i = 1$ for some random variable
$$\xi = (\xi_0,\dots,\xi_{\bar i})\sim\mathrm{Multinomial}(1;\pi_0,\dots,\pi_{\bar i}).$$
If $\xi_i = 1$ for a class $i\in I$, then the class $k_i\in K_i$ is chosen as $k_i = k$ when $\eta^i_k = 1$ for some independent random variable
$$\eta^i = (\eta^i_0,\dots,\eta^i_{\bar k_i})\sim\mathrm{Multinomial}(1;\pi'_0,\dots,\pi'_{\bar k_i}).$$
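The two-fold multinomial selection above also suggests a simple numerical check of the two-step QMLE. The sketch below is a simplified discrete-choice version, not the simulation of Example 1: it assumes binary types and marks ($\bar i = \bar k_0 = \bar k_1 = 1$), scalar covariates $X$ and $Y$ drawn i.i.d. in $\{-1,1\}$ at each event, and it ignores event timing, which is harmless here because (3.1) and (3.2) depend on the events only through the covariates observed at event times. `fit_binary_ratio` is a hypothetical helper maximizing a two-class ratio quasi-log-likelihood by Newton's method.

```python
import numpy as np


def fit_binary_ratio(z, labels, n_iter=50, tol=1e-10):
    """Maximize sum_n log r_{labels[n]}(z_n, theta) for two classes {0, 1},
    with r_1 = exp(theta.z) / (1 + exp(theta.z)) and r_0 = 1 - r_1
    (class 0 is the reference, as theta_0j = 0 in (2.3)), by Newton's method.

    z: covariates at event times, shape (n, J); labels: 0/1 array, shape (n,).
    """
    theta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(z @ theta)))        # r_1 at each event
        score = z.T @ (labels - p)                      # gradient of the quasi-log-likelihood
        info = (z * (p * (1.0 - p))[:, None]).T @ z     # minus its Hessian
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


rng = np.random.default_rng(1)
n = 20_000                                              # number of events
theta_true, rho_true = 1.5, np.array([1.0, 2.0])        # rho_true[i] = rho^{i,1}
x = rng.choice([-1.0, 1.0], size=n)                     # X at event times
y = rng.choice([-1.0, 1.0], size=n)                     # Y at event times

# First draw (xi): event type i in {0, 1} with probabilities r_i(x, theta)
side = (rng.random(n) < 1.0 / (1.0 + np.exp(-theta_true * x))).astype(int)
# Second draw (eta^i): mark k in {0, 1} with probabilities q_i^k(y, rho^i)
rho_event = rho_true[side]
mark = (rng.random(n) < 1.0 / (1.0 + np.exp(-rho_event * y))).astype(float)

# Step 1: maximize H_T(theta) over all events, cf. (3.1)
theta_hat = fit_binary_ratio(x[:, None], side.astype(float))
# Step 2: for each type i, maximize H_T^(i)(rho^i) over the events of type i, cf. (3.2)
rho_hat = np.array([fit_binary_ratio(y[side == i][:, None], mark[side == i])[0]
                    for i in (0, 1)])
print(theta_hat, rho_hat)   # should be close to 1.5 and (1.0, 2.0) for large n
```

Because $H_T(\theta)$ and $H^{(i)}_T(\rho^i)$ share no parameters, the three fits are run independently, which is the separability used for the QMLE above.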
Denote by $V(x,\theta)$ the variance matrix of the $(1+\bar i)$-dimensional multinomial distribution $M(1;\pi_0,\pi_1,\dots,\pi_{\bar i})$ with $\pi_i = \dot r_i(x,\theta)$, $i\in I$, where
$$\dot r_i(x,\theta) = \frac{\exp\big(\sum_{j\in J}\vartheta_{ij}x_j\big)}{\sum_{i'\in I}\exp\big(\sum_{j\in J}\vartheta_{i'j}x_j\big)} = \frac{\exp\big(\sum_{j\in J}\theta_{ij}x_j\big)}{\sum_{i'\in I}\exp\big(\sum_{j\in J}\theta_{i'j}x_j\big)},\qquad x = (x_j)_{j\in J}.$$
Denote by $V_i(y^i,\rho^i)$ the variance matrix of the $(1+\bar k_i)$-dimensional multinomial distribution $M(1;\pi'_0,\pi'_1,\dots,\pi'_{\bar k_i})$ with $\pi'_{k_i} = \dot q^{k_i}_i(y^i,\rho^i)$, $k_i\in K_i$, where
$$\dot q^{k_i}_i(y^i,\rho^i) = \frac{\exp\big(\sum_{j_i\in J_i}\varrho^{i,k_i}_{j_i}y^i_{j_i}\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j_i\in J_i}\varrho^{i,k'_i}_{j_i}y^i_{j_i}\big)} = \frac{\exp\big(\sum_{j_i\in J_i}\rho^{i,k_i}_{j_i}y^i_{j_i}\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j_i\in J_i}\rho^{i,k'_i}_{j_i}y^i_{j_i}\big)},\qquad y^i = (y^i_{j_i})_{j_i\in J_i}\in\mathbb R^{\bar j_i}\ (i\in I).$$

Let us introduce some notation used in the following analysis. For a tensor $T = (T_{i_1,\dots,i_k})_{i_1,\dots,i_k}$, we write
$$T[u_1,\dots,u_k] = T[u_1\otimes\cdots\otimes u_k] = \sum_{i_1,\dots,i_k}T_{i_1,\dots,i_k}u_1^{i_1}\cdots u_k^{i_k} \qquad (3.8)$$
for $u_1 = (u_1^{i_1})_{i_1},\dots,u_k = (u_k^{i_k})_{i_k}$. Brackets $[\ ,\dots,\ ]$ stand for a multilinear mapping. We denote by $u^{\otimes r} = u\otimes\cdots\otimes u$ the $r$-fold tensor product of $u$.

Let $\Gamma_T(\theta,\rho) = -T^{-1}\partial^2_{(\theta,\rho)}H_T(\theta,\rho)$ and let $\Gamma_T = \Gamma_T(\theta^*,\rho^*)$. Then, as detailed on p. 26,
$$\Gamma_T(\theta,\rho) = \mathrm{diag}\big[\Gamma_T(\theta),\Gamma^0_T(\rho^0),\dots,\Gamma^{\bar i}_T(\rho^{\bar i})\big],$$
where
$$\Gamma_T(\theta)[u^{\otimes2}] = \frac1T\int_0^T\Big(\mathbb V(X(t),\theta)\otimes X(t)^{\otimes2}\Big)[u^{\otimes2}]\sum_{i\in I}dN^i_t\qquad(u\in\mathbb R^p) \qquad (3.9)$$
with $\mathbb V(x,\theta) = \big(V(x,\theta)_{i,i'}\big)_{i,i'\in I_1}$, and
$$\Gamma^i_T(\rho^i)[(u^i)^{\otimes2}] = \frac1T\int_0^T\Big(\mathbb V_i(Y^i(t),\rho^i)\otimes Y^i(t)^{\otimes2}\Big)[(u^i)^{\otimes2}]\,dN^i_t\qquad(u^i\in\mathbb R^{p_i})$$
with $\mathbb V_i(y^i,\rho^i) = \big(V_i(y^i,\rho^i)_{k_i,k'_i}\big)_{k_i,k'_i\in K_{i,1}}$.

Let
$$\Lambda(w,x) = w\sum_{i\in I}\exp\big(x[\vartheta^*_i]\big) \qquad (3.10)$$
for $w\in\mathbb R_+$ and $x\in\mathbb R^{\bar j}$, where $x[\vartheta^*_i] = \sum_{j\in J}\vartheta^*_{ij}x_j$.

We have $V(x,\theta)_{i,i'} = 1_{\{i=i'\}}\dot r_i(x,\theta) - \dot r_i(x,\theta)\dot r_{i'}(x,\theta)$. Therefore
$$V(X(t),\theta)_{i,i'} = 1_{\{i=i'\}}r_i(t,\theta) - r_i(t,\theta)r_{i'}(t,\theta) \qquad (3.11)$$
and $\mathbb V(X(t),\theta)_{i,i'} = V(X(t),\theta)_{i,i'}$ for $i,i'\in I_1$. Write $\mathbb V(x) = \mathbb V(x,\theta^*)$.

Similarly, $V_i(y^i,\rho^i)_{k_i,k'_i} = 1_{\{k_i=k'_i\}}\dot q^{k_i}_i(y^i,\rho^i) - \dot q^{k_i}_i(y^i,\rho^i)\dot q^{k'_i}_i(y^i,\rho^i)$. Hence
$$V_i(Y^i(t),\rho^i)_{k_i,k'_i} = 1_{\{k_i=k'_i\}}q^{k_i}_i(t,\rho^i) - q^{k_i}_i(t,\rho^i)q^{k'_i}_i(t,\rho^i) \qquad (3.12)$$
and $\mathbb V_i(Y^i(t),\rho^i)_{k_i,k'_i} = V_i(Y^i(t),\rho^i)_{k_i,k'_i}$ for $k_i,k'_i\in K_{i,1}$. We denote $\mathbb V_i(y^i) = \mathbb V_i\big(y^i,(\rho^i)^*\big)$.

Let
$$\Gamma(\theta)[u^{\otimes2}] = E\Big[\Big(\mathbb V(X(0),\theta)\otimes X(0)^{\otimes2}\Big)[u^{\otimes2}]\,\Lambda\big(\lambda_0(0),X(0)\big)\Big]$$
for $u\in\mathbb R^p$, and let
$$\Gamma^i(\rho^i)[(u^i)^{\otimes2}] = E\Big[\Big(\mathbb V_i(Y^i(0),\rho^i)\otimes Y^i(0)^{\otimes2}\Big)[(u^i)^{\otimes2}]\,\Lambda\big(\lambda_0(0),X(0)\big)\,r_i(0,\theta^*)\Big]$$
for $u^i\in\mathbb R^{p_i}$, $i\in I$. Let $\check p = p + \sum_{i\in I}p_i = \bar i\bar j + \sum_{i\in I}\bar k_i\bar j_i$. The full information matrix is the $\check p\times\check p$ block diagonal matrix
$$\Gamma(\theta,\rho) = \mathrm{diag}\big[\Gamma(\theta),\Gamma^0(\rho^0),\Gamma^1(\rho^1),\dots,\Gamma^{\bar i}(\rho^{\bar i})\big],\qquad\Gamma = \Gamma(\theta^*,\rho^*). \qquad (3.13)$$

An identifiability condition will be imposed.

[M3] $\inf_{\theta\in\Theta}\inf_{u\in\mathbb R^p:|u|=1}\Gamma(\theta)[u^{\otimes2}] > 0$ and $\inf_{\rho^i\in\mathcal R_i}\inf_{u^i\in\mathbb R^{p_i}:|u^i|=1}\Gamma^i(\rho^i)[(u^i)^{\otimes2}] > 0$ for $i\in I$.

For the QMLE $\hat\psi^M_T = (\hat\theta^M_T,\hat\rho^M_T)$ and the QBE $\hat\psi^B_T = (\hat\theta^B_T,\hat\rho^B_T)$ of $\psi = (\theta,\rho) = (\theta,\rho^0,\dots,\rho^{\bar i})$, let
$$\hat u^A_T = T^{1/2}\big(\hat\psi^A_T - \psi^*\big)\qquad(A\in\{M,B\}).$$

Theorem 3.1.

Suppose that Conditions [M1], [M2] and [M3] are satisfied. Then
$$E\big[f(\hat u^A_T)\big]\to E\big[f(\Gamma^{-1/2}\zeta)\big]\qquad\text{as }T\to\infty$$
for $A\in\{M,B\}$ and every $f\in C(\mathbb R^{\check p})$ of at most polynomial growth, where $\zeta$ is a $\check p$-dimensional standard Gaussian random vector.

Example 1.

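As an illustration, we consider the case with two processes ($I = \{0,1\}$) and two marks for each process ($K_0 = K_1 = \{0,1\}$). The first state-dependent term takes into account one covariate $X$ (i.e. $J = \{1\}$). The mark distributions both depend on another covariate $Y$ (i.e. $J_0 = J_1 = \{1\}$). In this example, we assume that $X$ and $Y$ are independent Markov chains with values in $\{-1,1\}$ and constant transition intensities $\lambda_X$ and $\lambda_Y$. We assume that $\lambda_0$ is the intensity of a Hawkes process $(H_t)_{t\ge0}$ with a single exponential kernel, i.e. $\lambda_0(t) = \mu + \int_0^t\alpha e^{-\beta(t-s)}\,dH_s$, with $(\alpha,\beta)\in(\mathbb R^*_+)^2$ and $\alpha/\beta < 1$. The parameters to estimate are $(\theta,\rho^{0,1},\rho^{1,1})$, defined as $\theta = \vartheta_1 - \vartheta_0$ and $\rho^{i,1} = \varrho^{i,1} - \varrho^{i,0}$, $i = 0,1$ (the covariate index is dropped since $J = J_0 = J_1 = \{1\}$).

In this specific case, the matrix $\Gamma$ of Equation (3.13) is a $3\times3$ diagonal matrix with
$$\Gamma_{0,0} = \frac{\mu}{1-\alpha/\beta}\,\frac{e^{\theta}}{(1+e^{\theta})^2}\big(\cosh\vartheta_0 + \cosh\vartheta_1\big),$$
$$\Gamma_{1,1} = \frac{\mu}{1-\alpha/\beta}\,\frac{e^{\rho^{0,1}}}{(1+e^{\rho^{0,1}})^2}\,\frac{e^{\theta/2}}{1+e^{\theta}}\Big(\cosh\frac{\vartheta_0+\vartheta_1}{2} + \cosh\frac{3\vartheta_0-\vartheta_1}{2}\Big),$$
$$\Gamma_{2,2} = \frac{\mu}{1-\alpha/\beta}\,\frac{e^{\rho^{1,1}}}{(1+e^{\rho^{1,1}})^2}\,\frac{e^{\theta/2}}{1+e^{\theta}}\Big(\cosh\frac{\vartheta_0+\vartheta_1}{2} + \cosh\frac{3\vartheta_1-\vartheta_0}{2}\Big).$$

We run 1000 simulations of the processes $(N^0,N^1)$ with their marks for various values of the horizon $T$. The numerical values used in these simulations are the following: $\mu = 0.[…]$, $\alpha = 1.[…]$, $\beta = 2.[…]$, $\lambda_X = \lambda_Y = 0.[…]$, $\vartheta_0 = -[…]$, $\vartheta_1 = 0.[…]$, $\varrho^{0,0} = -[…]$, $\varrho^{0,1} = 0.[…]$, $\varrho^{1,0} = -[…]$, $\varrho^{1,1} = 1.[…]$. For each simulation, we compute the estimators $(\hat\theta,\hat\rho^{0,1},\hat\rho^{1,1})$ with the two-step ratios described above. Table 1 gives the mean of the estimators and the true values of the parameters, as well as the empirical standard deviation, compared to the theoretical values $\big(T^{-1}(\Gamma^{-1})_{i,i}\big)^{1/2}$, $i = 0,1,2$, for several values of $T$. For completeness, Figure 1 also plots the empirical standard deviations of the three estimators and the theoretical standard deviations $\big(T^{-1}(\Gamma^{-1})_{i,i}\big)^{1/2}$, $i = 0,1,2$, as functions of $T$. The asymptotic values predicted by Theorem 3.1 are indeed retrieved empirically, which ends this numerical illustration.

| $T$ | | $\theta$ | $\rho^{0,1}$ | $\rho^{1,1}$ |
|---|---|---|---|---|
| | True value | 1.500 | 1.000 | 2.000 |
| 10 | Estimator mean | 1.817 | 1.576 | 5.146 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |
| 30 | Estimator mean | 1.541 | 1.044 | 2.402 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |
| 100 | Estimator mean | 1.508 | 0.999 | 2.011 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |
| 300 | Estimator mean | 1.502 | 1.005 | 2.013 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |
| 1000 | Estimator mean | 1.501 | 1.000 | 2.009 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |
| 3000 | Estimator mean | 1.498 | 1.001 | 1.999 |
| | Estimator sd | […] | […] | […] |
| | $(T^{-1}(\Gamma^{-1})_{i,i})^{1/2}$ | […] | […] | […] |

Table 1: Numerical results for the estimation of the model of Example 1.

We consider the market orders submitted to a given limit order book. Let $N^0$ be the process counting the market orders submitted on the bid side (sell market orders) and $N^1$ the process counting the market orders submitted on the ask side (buy market orders). On each side, we further consider whether the order is an aggressive order that moves the price (labeled with mark 1), or a non-aggressive order that does not move the price (labeled with mark 0).

[Figure 1: three panels ($\hat\theta$, $\hat\rho^{0,1}$, $\hat\rho^{1,1}$); x-axis: horizon $T$; y-axis: standard deviation; legend: empirical sd, theoretical sd]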

Figure 1: Empirical and theoretical standard deviations of the quasi-maximum likelihood estimators $\hat\theta$ (left), $\hat\rho^{0,1}$ (center) and $\hat\rho^{1,1}$ (right).

We assume that the intensity of an order of type $i\in I = \{0,1\}$ with mark $k_i\in K_0 = K_1 = K = \{0,1\}$ is
$$\lambda^{i,k_i}(t,\vartheta^i,\varrho^i) = \lambda_0(t)\exp\Big(\sum_{j\in J}\vartheta_{ij}X_j(t)\Big)\frac{\exp\big(\sum_{j\in J_i}\varrho^{i,k_i}_jY^i_j(t)\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j\in J_i}\varrho^{i,k'_i}_jY^i_j(t)\big)}. \qquad (4.1)$$
In the following applications, we consider several possible models defined with various sets of covariates $X_j$, $j\in J$, and $Y^i_j$, $j\in J_i$, $i = 0,1$. The tested sets of covariates are subsets of the following covariates (with $Z_0 = 1$ common to all models):
• $Z_1(t) = \dfrac{q^B(t)-q^A(t)}{q^B(t)+q^A(t)}$, where $q^B(t)$ (resp. $q^A(t)$) is the quantity available at the best bid (resp. ask) at time $t$ (i.e. the imbalance);
• $Z_2(t) = \epsilon(t)$, where $\epsilon(t)$ is the sign of the last market order at time $t$ (1 for an ask market order, $-1$ for a bid market order);
• $Z_3(t) = \sigma(t)\epsilon(t)$, where $\sigma(t)$ is equal to 1 if the spread at time $t$ is large (larger than a reference value, taken here to be the median spread of the stock), and $-1$ otherwise;
• $Z_4$: $H^{0,1}(t) = \log\big(\mu^{0,1} + \int_0^t\alpha^{0,1}e^{-\beta^{0,1}(t-s)}\,dN^{0,1}_s\big)$ (Hawkes covariate for aggressive bid market orders);
• $Z_5$: $H^{0,0}(t) = \log\big(\mu^{0,0} + \int_0^t\alpha^{0,0}e^{-\beta^{0,0}(t-s)}\,dN^{0,0}_s\big)$ (Hawkes covariate for non-aggressive bid market orders);
• $Z_6$: $H^{1,1}(t) = \log\big(\mu^{1,1} + \int_0^t\alpha^{1,1}e^{-\beta^{1,1}(t-s)}\,dN^{1,1}_s\big)$ (Hawkes covariate for aggressive ask market orders);
• $Z_7$: $H^{1,0}(t) = \log\big(\mu^{1,0} + \int_0^t\alpha^{1,0}e^{-\beta^{1,0}(t-s)}\,dN^{1,0}_s\big)$ (Hawkes covariate for non-aggressive ask market orders);
• $Z_8$: $H^0(t) = \log\big(\mu^0 + \int_0^t\alpha^0e^{-\beta^0(t-s)}\,dN^0_s\big)$ (Hawkes covariate for bid market orders);
• $Z_9$: $H^1(t) = \log\big(\mu^1 + \int_0^t\alpha^1e^{-\beta^1(t-s)}\,dN^1_s\big)$ (Hawkes covariate for ask market orders).
With these Hawkes covariates, the ratio model can actually be seen as a kind of non-linear Hawkes process.

We use tick-by-tick data for 36 stocks traded on Euronext Paris. The sample spans the whole year 2015, i.e. roughly 200 trading days for each stock, although some days are missing for some stocks. Table 3 in Appendix C lists the stocks investigated and the number of trading days available.
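The Hawkes covariates listed above, of the form $\log\big(\mu + \int_0^t\alpha e^{-\beta(t-s)}\,dN_s\big)$, can be evaluated without storing the whole history, using the standard recursion for exponential kernels. The sketch below is one possible way to do it and not the authors' implementation: it takes $(\mu,\alpha,\beta)$ as already fitted, uses events strictly before $t$ so that the covariate is predictable, and the function name `hawkes_log_covariate` is hypothetical.

```python
import numpy as np


def hawkes_log_covariate(event_times, eval_times, mu, alpha, beta):
    """log(mu + sum_{s < t} alpha * exp(-beta * (t - s))) for each t in eval_times,
    where s runs over event_times (strict inequality: the covariate is predictable)."""
    ev = np.sort(np.asarray(event_times, dtype=float))
    t = np.asarray(eval_times, dtype=float)
    # s_left[m] = sum_{l < m} alpha * exp(-beta * (ev[m] - ev[l])): excitation just before ev[m]
    s_left = np.zeros(len(ev))
    for m in range(1, len(ev)):
        s_left[m] = (s_left[m - 1] + alpha) * np.exp(-beta * (ev[m] - ev[m - 1]))
    # index of the last event strictly before each evaluation time (-1 if none)
    last = np.searchsorted(ev, t, side="left") - 1
    excitation = np.zeros(len(t))
    has_past = last >= 0
    m = last[has_past]
    excitation[has_past] = (s_left[m] + alpha) * np.exp(-beta * (t[has_past] - ev[m]))
    return np.log(mu + excitation)


events = [0.1, 0.5, 0.7, 1.2]
times = [0.0, 0.5, 0.6, 2.0]
print(hawkes_log_covariate(events, times, mu=0.3, alpha=0.8, beta=2.0))
```

The recursion costs one pass over the events plus a binary search per evaluation time, instead of a sum over the whole past at each time.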
Raw data come from the TRTH (Thomson Reuters Tick History) database: for each trading day and each stock, one file lists the transactions (quantities and prices) and one file lists the modifications of the limit order book (level, price and quantities). Timestamps are given with millisecond precision. Synchronization of both files and reconstruction of the limit order book are carried out with the procedure described in Muni Toke (2016). One strong advantage of the ratio model is that it does not require precise timestamps in itself, since timestamps do not appear explicitly in the quasi-likelihood of the ratios, whereas fitting other intensity-based models (e.g. Hawkes processes) requires unique, precise timestamps for the log-likelihood computation. Here, when Hawkes fits are used as covariates (covariates $Z_4$ to $Z_9$ in our application), we choose to consider only unique timestamps, i.e. we aggregate orders of the same type occurring at the same timestamp.

Following Sections 2 and 3, estimation of the model defined in Equation (4.1) can be carried out with multiple successive ratio models. In the first step, we consider the difference parameters $\theta_{ij} = \vartheta_{ij} - \vartheta_{0j}$, $i\in I\setminus\{0\}$, $j\in J$, and the ratios ($i\in I$, with $\theta_{0j} = 0$):
$$r_i(t,\theta) = \frac{\exp\big(\sum_{j\in J}\vartheta_{ij}X_j(t)\big)}{\sum_{i'\in I}\exp\big(\sum_{j\in J}\vartheta_{i'j}X_j(t)\big)} = \Big[\sum_{i'\in I}\exp\Big(\sum_{j\in J}(\theta_{i'j}-\theta_{ij})X_j(t)\Big)\Big]^{-1}. \qquad (4.2)$$
The quasi-log-likelihood based on observations on $[0,T]$ for this ratio model is defined in Equation (3.1).
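As a sketch of this first step, and assuming that covariates are recorded at event times only (which is all that the quasi-log-likelihood (3.1) requires), the side ratios (4.2) and $H_T(\theta)$ can be computed in log space; writing (4.2) as a log-sum-exp avoids overflow when covariates such as the log-Hawkes terms are large. The function names and array shapes are hypothetical, and a constant covariate equal to 1 (common to all models in the application) is assumed to be one of the columns of `X`.

```python
import numpy as np


def log_ratios(theta1, X):
    """log r_i(t_n, theta) for every event n and type i, cf. (4.2).

    theta1: shape (I - 1, J), rows i = 1, ..., i_bar (theta_0j = 0 is prepended)
    X:      covariates at the event times, shape (n, J)
    Returns an array of shape (n, I).
    """
    theta = np.vstack([np.zeros((1, X.shape[1])), theta1])
    scores = X @ theta.T                                  # sum_j theta_ij X_j(t_n)
    m = scores.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    return scores - log_norm


def side_quasi_loglik(theta1, X, types):
    """H_T(theta) = sum_n log r_{types[n]}(t_n, theta), i.e. (3.1) as a sum over events."""
    lr = log_ratios(theta1, X)
    return lr[np.arange(len(types)), types].sum()


rng = np.random.default_rng(3)
X = np.column_stack([np.ones(5), rng.normal(size=5)])    # constant covariate + one covariate
theta1 = np.array([[0.3, -1.2]])                          # two sides: I = {0, 1}
types = np.array([0, 1, 1, 0, 1])                         # observed side of each event
print(side_quasi_loglik(theta1, X, types))
```

Maximizing this function over `theta1` (for instance with the Newton iteration of a multinomial logit) gives the first-step QMLE; the second step uses the same code with the mark covariates $Y^i$ restricted to events of type $i$.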
In the second step, we consider the ratios
$$p^{k_i}_i(t,\varrho^i) = \frac{\exp\big(\sum_{j\in J_i}\varrho^{i,k_i}_jY^i_j(t)\big)}{\sum_{k'_i\in K_i}\exp\big(\sum_{j\in J_i}\varrho^{i,k'_i}_jY^i_j(t)\big)} = \Big[\sum_{k'_i\in K_i}\exp\Big(\sum_{j\in J_i}\big(\varrho^{i,k'_i}_j - \varrho^{i,k_i}_j\big)Y^i_j(t)\Big)\Big]^{-1}, \qquad (4.3)$$
and the associated quasi-log-likelihood of Equation (3.2). Consistency and asymptotic normality of the quasi-maximum likelihood estimators are guaranteed by Theorem 3.1.

4.4 In-sample model selection with QAIC

In this first application, we perform in-sample model selection to assess the relevance of the different possible sets of covariates. For each stock and each trading day, we fix a set of covariates. We use the indices of the tested covariates to name the models: the model 146 is thus the model with covariates $(Z_1,Z_4,Z_6)$, in addition to the constant $Z_0$. If required, we estimate the parameters of all the Hawkes covariates and then compute the Hawkes covariates using these values. We finally fit three ratio models following the above procedure: one for the processes $(N^0,N^1)$ (signature of the market orders), one for the processes $(N^{0,0},N^{0,1})$ (aggressiveness of the bid market orders) and one for the processes $(N^{1,0},N^{1,1})$ (aggressiveness of the ask market orders).

For each trading day, we then select the model minimizing the QAIC. For the side ratio, the criterion is
$$-2H_T(\hat\theta^M_T) + 2|J|, \qquad (4.4)$$
where $|J|$ is the cardinality of the set $J$. For the aggressiveness ratios, the criterion is
$$-2H^{(i)}_T(\hat\rho^{i,M}_T) + 2|J_i|\qquad(i\in I). \qquad (4.5)$$
We finally compute, for each stock, the frequencies of selection of the different sets of covariates (i.e. the number of trading days in which a model is selected by QAIC over the total number of trading days in the sample for this stock). Figures 2, 3 and 4 plot the results as a model × stock heatmap for each of these three ratios.
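The QAIC selection of (4.4) and (4.5) amounts to fitting each candidate ratio model and keeping the smallest criterion. The sketch below assumes a two-class ratio, so that the number of free parameters equals the number of covariates, names models by concatenating covariate indices in the spirit of the naming convention above (here the constant index 0 is kept in the name), and runs on synthetic data; `fit_binary_ratio` and `qaic_select` are hypothetical helpers, not part of any library.

```python
import numpy as np


def fit_binary_ratio(z, labels, n_iter=50, tol=1e-10):
    """Two-class ratio QMLE (class 0 as reference) by Newton's method."""
    theta = np.zeros(z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(z @ theta)))
        info = (z * (p * (1.0 - p))[:, None]).T @ z      # minus the Hessian
        step = np.linalg.solve(info, z.T @ (labels - p))  # Newton step on the score
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def qaic_select(Z, labels, candidates):
    """Score each candidate covariate set by -2 H_T(theta_hat) + 2 |J|; return the minimizer."""
    scores = {}
    for cols in candidates:
        zc = Z[:, list(cols)]
        theta = fit_binary_ratio(zc, labels)
        p = 1.0 / (1.0 + np.exp(-(zc @ theta)))                        # fitted r_1 at each event
        h = np.sum(labels * np.log(p) + (1 - labels) * np.log(1 - p))  # H_T(theta_hat)
        scores["".join(str(c) for c in cols)] = -2.0 * h + 2.0 * len(cols)
    best = min(scores, key=scores.get)
    return best, scores


rng = np.random.default_rng(2)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_p = 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * Z[:, 1])))   # only Z_1 is informative here
labels = (rng.random(n) < true_p).astype(float)
best, scores = qaic_select(Z, labels, [(0,), (0, 1), (0, 1, 2)])
print(best, scores)
```

In the application, such a selection would be repeated for each stock and each trading day, and the selection frequencies tabulated.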
Figure 2: Side of market orders - Frequency of selection of each model by the QAIC criterion, for each stock.

Figure 3: Aggressiveness of bid market orders - Frequency of selection of each model by the QAIC criterion, for each stock.

Figure 4: Aggressiveness of ask market orders - Frequency of selection of each model by the QAIC criterion, for each stock.

For completeness, Tables 4, 5 and 6 in Appendix D list, for each stock and each ratio model (side, bid aggressiveness, ask aggressiveness), the four most selected models with their selection frequencies.

For side determination, models 14689, 124689 and 1234689 are the three most often chosen: the selected model is one of these three approximately 80% of the time on average across stocks. Imbalance, the Hawkes covariates for bid and ask market orders, and the Hawkes covariates for aggressive bid and ask market orders thus appear to be the most informative covariates.

For aggressiveness determination, model 146 is often selected. This is in line with intuition: imbalance is known to be a significant proxy for price changes (Lipton et al., 2013), and the Hawkes covariates for aggressive bid and aggressive ask market orders are specific to the targeted events. Note also that, for several stocks, models with "symmetric" sets of covariates can be chosen: for ask aggressiveness, 1679 is often selected, i.e. imbalance and all available ask Hawkes covariates; symmetrically, 1458 is selected for bid aggressiveness, i.e. imbalance and all available bid Hawkes covariates.

In particular, these results confirm the primary role of the spread measured in ticks in the theory of financial microstructure. Stocks for which the observed spread is mostly equal to one tick are labeled 'large-tick stocks', meaning that market participants are constrained by the price grid when submitting orders to the limit order book. Other stocks may be labeled 'small-tick stocks' (Eisler et al., 2012). Using our sample, we compute the mean observed spread in ticks for each stock and each available trading day, and group these values in bins of equal sizes. Then, inside each bin, we compute the frequency with which the signed spread covariate is selected by QAIC for the ratio estimation of Equation (4.2). A bar plot is provided in Figure 5 (left). The selection frequency of the spread covariate increases as the mean observed spread increases from 1 tick (its minimal possible value) to roughly 2 ticks. For larger spread values, the frequency then oscillates at high values. A break point might be searched between 1 …

In this section, we use intensity and ratio models to predict the sign and aggressiveness of an incoming market order. For all tested models, the procedure is the following. On a given trading day, the model is fitted. The fitted parameters are then used on the next trading day available in the database to compute the intensities (or the ratios, for ratio models) at all times. The type of an incoming event is then predicted to be the type with the highest intensity or ratio. The exercise is theoretical in the sense that we assume that these computations are instantaneous, so that intensities or ratios are available at all times.

Recall the notation $N = (N^{i,k_i})_{i \in \{1,2\},\, k_i \in \{1,2\}}$ for the four-dimensional point process counting bid aggressive market orders, bid non-aggressive market orders, ask aggressive market orders and ask non-aggressive market orders. We use two benchmark models.

The first benchmark model is the Hawkes model. Here, $N$ is assumed to be a four-dimensional Hawkes process with a single exponential kernel. In vector notation, the intensity is written

$$\begin{pmatrix} \lambda^{1,1}_H(t) \\ \lambda^{1,2}_H(t) \\ \lambda^{2,1}_H(t) \\ \lambda^{2,2}_H(t) \end{pmatrix} = \begin{pmatrix} \mu^{1,1} \\ \mu^{1,2} \\ \mu^{2,1} \\ \mu^{2,2} \end{pmatrix} + \int_0^t \Big(\alpha^{(i,k_i),(j,k_j)}\, e^{-\beta^{(i,k_i),(j,k_j)}(t-s)}\Big)_{(i,k_i),(j,k_j)} \cdot \begin{pmatrix} dN^{1,1}(s) \\ dN^{1,2}(s) \\ dN^{2,1}(s) \\ dN^{2,2}(s) \end{pmatrix},$$

where the kernel is the $4 \times 4$ matrix indexed by pairs of order types. Estimation and ratio computation can be found in, e.g., Bowsher (2007) and Muni Toke & Pomponio (2012). This model is labeled 'Hawkes'.

The second benchmark model is the four-dimensional ratio model without marks (Muni Toke & Yoshida, 2020). In this model, the intensity of the counting process $(i, k_i)$ is

$$\lambda^{i,k_i}_R(t) = \lambda_{0,R}(t) \exp\Big(\sum_{j \in J} \vartheta^{i,k_i}_j X_j(t)\Big),$$

with some unobserved baseline intensity $\lambda_{0,R}(t)$. Given the previous observations, we choose the set of covariates $(Z_1, Z_4, Z_6, Z_8, Z_9)$ for this benchmark. These covariates (imbalance, Hawkes covariates for aggressive orders and Hawkes covariates for all orders) are a natural choice given the model selection results of Section 4.4. Estimation and ratio computation are detailed in Muni Toke & Yoshida (2020). This model is labeled 'Ratio-14689'.

These two benchmarks are used to assess the performance of the two marked ratio models (or two-step ratio models) described in this paper. The first marked ratio model uses the covariates $(Z_4, Z_5, Z_6, Z_7)$ for both steps. These covariates are based on the Hawkes processes of the benchmark Hawkes model. The second marked ratio model uses the covariates $(Z_1, Z_4, Z_6, Z_8, Z_9)$ for the first-step ratio (side determination) and $(Z_1, Z_4, Z_6)$ for both second-step ratios (bid and ask aggressiveness). Again, these choices are natural given the model selection results of Section 4.4. These models are labeled 'MarkedRatio-4567-4567-4567' and 'MarkedRatio-14689-146-146' respectively.

Figure 6 plots the results for each stock for the two benchmark models and the two marked ratio models. For completeness, the partial performances for side determination and aggressiveness determination of the trades are provided in Figure 7. Finally, Table 2 lists the partial and global prediction performances of these models averaged across stocks.

Accuracy (models): Hawkes, Ratio-14689, MarkedRatio-4567-4567-4567, MarkedRatio-14689-146-146
partial - side: 0.781 0.808 0.776
partial - agg.: 0.634 0.658 0.668
global: 0.503 0.533 0.516

Table 2: Prediction performances of selected models averaged across stocks. Side accuracy gives the fraction of correctly signed trades. Aggressiveness accuracy gives the proportion of trades with a correctly predicted aggressiveness. Global accuracy gives the fraction of orders with correctly predicted side and aggressiveness.
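The prediction exercise above can be sketched as follows, under stated assumptions. The code computes Hawkes intensities with one exponential kernel per pair of order types, predicts the type with the highest intensity, shows why a two-step (marked) ratio model only needs the product of its side ratio and its mark ratio (the unobserved baseline intensity cancels), and computes the three accuracies of Table 2. All names and parameter values are hypothetical; in particular, we assume that the aggressiveness accuracy is computed over all trades, whether or not the side was predicted correctly, which the text does not specify.

```python
import math

# Hypothetical labels for the four order types (side, aggressiveness).
TYPES = [("bid", "agg"), ("bid", "non-agg"), ("ask", "agg"), ("ask", "non-agg")]


def hawkes_intensities(t, history, mu, alpha, beta):
    """Intensities at time t of a multivariate Hawkes process with exponential kernels.

    history: list of (event time, event type); only events strictly before t contribute.
    mu[m]: baseline of type m; alpha[m][n], beta[m][n]: jump size and decay rate of
    the kernel through which an event of type n excites type m.
    """
    lam = dict(mu)
    for s, n in history:
        if s < t:
            for m in lam:
                lam[m] += alpha[m][n] * math.exp(-beta[m][n] * (t - s))
    return lam


def predict_type(scores):
    """Predicted type of the next event: the type with the highest intensity or ratio."""
    return max(scores, key=scores.get)


def marked_ratio_scores(side_ratio, mark_ratio):
    """Share of type (i, k) in the total intensity for a two-step (marked) ratio model.

    The baseline intensity is common to all types and cancels out, so the product of
    the first-step side ratio and the second-step mark ratio is enough for prediction.
    """
    return {(i, k): side_ratio[i] * mark_ratio[i][k] for i in side_ratio for k in mark_ratio[i]}


def accuracies(predicted, observed):
    """Side, aggressiveness and global accuracy over paired (side, aggressiveness) labels."""
    n = len(observed)
    side = sum(p[0] == o[0] for p, o in zip(predicted, observed)) / n
    agg = sum(p[1] == o[1] for p, o in zip(predicted, observed)) / n
    both = sum(p == o for p, o in zip(predicted, observed)) / n
    return side, agg, both


# Usage with hypothetical parameters: only ("bid", "agg") self-excites.
mu = {m: 0.1 for m in TYPES}
alpha = {m: {n: 0.0 for n in TYPES} for m in TYPES}
alpha[("bid", "agg")][("bid", "agg")] = 1.0
beta = {m: {n: 1.0 for n in TYPES} for m in TYPES}
lam = hawkes_intensities(1.0, [(0.0, ("bid", "agg"))], mu, alpha, beta)
print(predict_type(lam))  # ('bid', 'agg')
```

The loop over the whole history is quadratic in the number of events; it is kept only for readability, since exponential kernels also admit the usual recursive update.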
The benchmark Hawkes model correctly predicts the sign and aggressiveness of an incoming market order with an accuracy in the range [40%, …

Acknowledgements

This work was in part supported by Japan Science and Technology Agency CREST JPMJCR14D7; Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research No. 17H01702 (Scientific Research); and by a Cooperative Research Program of the Institute of Statistical Mathematics.

References

Abergel, Frédéric, Anane, Marouane, Chakraborti, Anirban, Jedidi, Aymen, & Muni Toke, Ioane. 2016. Limit order books. Cambridge University Press.
Bacry, Emmanuel, Dayri, Khalil, & Muzy, Jean-François. 2012. Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. The European Physical Journal B - Condensed Matter and Complex Systems, (5), 1–12.
Bacry, Emmanuel, Delattre, Sylvain, Hoffmann, Marc, & Muzy, Jean-François. 2013. Modelling microstructure noise with mutually exciting point processes. Quantitative Finance, (1), 65–77.
Biais, Bruno, Hillion, Pierre, & Spatt, Chester. 1995. An empirical analysis of the limit order book and the order flow in the Paris Bourse. The Journal of Finance, (5), 1655–1689.
Bowsher, C. G. 2007. Modelling security market events in continuous time: Intensity based, multivariate point process models. Journal of Econometrics, 876–912.
Chakraborti, Anirban, Muni Toke, Ioane, Patriarca, Marco, & Abergel, Frédéric. 2011. Econophysics review: I. Empirical facts. Quantitative Finance, (7), 991–1012.
Eisler, Zoltan, Bouchaud, Jean-Philippe, & Kockelkoren, Julien. 2012. The price impact of order book events: market orders, limit orders and cancellations. Quantitative Finance, (9), 1395–1419.
Harris, Lawrence, & Hasbrouck, Joel. 1996. Market vs. limit orders: the SuperDOT evidence on order submission strategy. Journal of Financial and Quantitative Analysis, (2), 213–231.
Hautsch, Nikolaus. 2011. Modelling irregularly spaced financial data: theory and practice of dynamic duration models. Vol. 539. Springer Science & Business Media.
Ibragimov, I. A., & Has'minskiĭ, R. Z. 1981. Statistical estimation. Applications of Mathematics, vol. 16. New York: Springer-Verlag. Asymptotic theory, translated from the Russian by Samuel Kotz.
Kutoyants, Yu A. 1984. Parameter estimation for stochastic processes. Vol. 6. Heldermann.
Kutoyants, Yu A. 2012. Statistical inference for spatial Poisson processes. Vol. 134. Springer Science & Business Media.
Lallouache, Mehdi, & Challet, Damien. 2016. The limits of statistical significance of Hawkes processes fitted to financial data. Quantitative Finance, (1), 1–11.
Large, Jeremy. 2007. Measuring the resiliency of an electronic limit order book. Journal of Financial Markets, (1), 1–25.
Lipton, A., Pesavento, U., & Sotiropoulos, M. G. 2013. Trade arrival dynamics and quote imbalance in a limit order book. arXiv preprint arXiv:1312.0514.
Lu, Xiaofei, & Abergel, Frédéric. 2018. High-dimensional Hawkes processes for limit order books: modelling, empirical analysis and numerical calibration. Quantitative Finance, (2), 249–264.
Morariu-Patrichi, Maxime, & Pakkanen, Mikko S. 2018. State-dependent Hawkes processes and their application to limit order book modelling. arXiv preprint arXiv:1809.08060.
Muni Toke, Ioane. 2016. Reconstruction of order flows using aggregated data. Market Microstructure and Liquidity, (02), 1650007.
Muni Toke, Ioane, & Pomponio, Fabrizio. 2012. Modelling trades-through in a limit order book using Hawkes processes. Economics-journal, 22.
Muni Toke, Ioane, & Yoshida, Nakahiro. 2017. Modelling intensities of order flows in a limit order book. Quantitative Finance, (5), 683–701.
Muni Toke, Ioane, & Yoshida, Nakahiro. 2020. Analyzing order flows in limit order books with ratios of Cox-type intensities. Quantitative Finance, 1–18.
Rambaldi, Marcello, Bacry, Emmanuel, & Lillo, Fabrizio. 2017. The role of volume in order book dynamics: a multivariate Hawkes process analysis. Quantitative Finance, (7), 999–1020.
Yoshida, Nakahiro. 2011. Polynomial type large deviation inequalities and quasi-likelihood analysis for stochastic differential equations. Annals of the Institute of Statistical Mathematics, (3), 431–479.

A Proof of Theorem 3.1

The convergence given in Theorem 3.1 can be obtained by the quasi-likelihood analysis, which we recall in Section B. We will apply Theorems B.2 and B.4 of Section B to the double ratio model. In the present situation, the scaling factor is $b_T = T$, the joint parameter $(\theta, \rho)$ plays the role of $\theta$ in Section B, and the dimension of the full parameter space is $\check{p}$ in place of $p$ in Section B. Fix a set of values of the parameters $(\alpha, \beta_1, \beta_2, \rho, \rho_1, \rho_2)$ so that Condition [L1] (Section B) is met with $\rho = 2$.

A.1 Score functions and a central limit theorem

The score function for $\rho^i$ is given by

$$F^{(i)}_T(\rho^i) = \partial_{\rho^i}\mathbb{H}^{(i)}_T(\rho^i) = \sum_{k_i \in K_i} \int_0^T \partial_{\rho^i} \log q_i^{k_i}(t, \rho^i)\, dN^{i,k_i}_t.$$

Then

$$F^{(i)}_T(\rho^i) = \sum_{k_i \in K_i} \int_0^T \big(1_{\{k_i\}}(\cdot) - q^\flat_i(t, \rho^i)\big) \otimes Y^i(t)\, dN^{i,k_i}_t \qquad (A.1)$$

where $q^\flat_i(t, \rho^i) = (q_i^{k_i}(t, \rho^i))_{k_i \in K_i}$ and $Y^i(t) = (Y^i_{j_i}(t))_{j_i \in J_i}$. A short calculation (the compensator term vanishes at the true value of the parameter) gives

$$F^{(i)}_T := F^{(i)}_T((\rho^i)^*) = \sum_{k_i \in K_i} \int_0^T \big(1_{\{k_i\}}(\cdot) - q^\flat_i(t, (\rho^i)^*)\big) \otimes Y^i(t)\, d\tilde{N}^{i,k_i}_t. \qquad (A.2)$$

We are assuming that the counting processes $N^{i,k_i}$ ($i \in I$, $k_i \in K_i$) have no common jumps. Then the $p_i \times p_{i'}$ matrix-valued process satisfies

$$\langle F^{(i)}, F^{(i')} \rangle_T = 0 \quad (i, i' \in I,\ i \neq i') \qquad (A.3)$$

and

$$\langle F^{(i)} \rangle_T = \sum_{k_i \in K_i} \int_0^T \Big\{\big(1_{\{k_i\}}(\cdot) - q^\flat_i(t, (\rho^i)^*)\big) \otimes Y^i(t)\Big\}^{\otimes 2} r_i(t, \theta^*)\, \Lambda(\lambda(t), X(t))\, q_i^{k_i}(t, (\rho^i)^*)\, dt = \int_0^T V_i(Y^i(t), (\rho^i)^*) \otimes (Y^i(t))^{\otimes 2}\, \Lambda(\lambda(t), X(t))\, r_i(t, \theta^*)\, dt \quad (i \in I).$$

Therefore, the mixing property [M2] gives the convergence

$$T^{-1}\langle F^{(i)} \rangle_T \to^p \Gamma^{(i)}((\rho^i)^*) = E\Big[V_i(Y^i(0), (\rho^i)^*) \otimes (Y^i(0))^{\otimes 2}\, \Lambda(\lambda(0), X(0))\, r_i(0, \theta^*)\Big] \qquad (A.4)$$

as $T \to \infty$, with the aid of [M…]. The score function for $\theta$ is the $p$-dimensional process

$$F_T(\theta) = \partial_\theta \mathbb{H}_T(\theta) = \sum_{i \in I} \int_0^T \partial_\theta \log r_i(t, \theta)\, dN^i_t = \sum_{i \in I} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta)\big) \otimes X(t)\, dN^i_t, \qquad (A.5)$$

where $r^\flat(t, \theta) = (r_i(t, \theta))_{i \in I}$. Evaluated at $\theta^*$,

$$F_T = F_T(\theta^*) = \sum_{i \in I} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta^*)\big) \otimes X(t)\, d\tilde{N}^i_t = \sum_{i \in I} \sum_{k_i \in K_i} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta^*)\big) \otimes X(t)\, d\tilde{N}^{i,k_i}_t. \qquad (A.6)$$

Then the $p \times p$ matrix-valued process $\langle F \rangle$ has the expression

$$\langle F \rangle_T = \sum_{i \in I} \sum_{k_i \in K_i} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta^*)\big)^{\otimes 2} \otimes X(t)^{\otimes 2}\, r_i(t, \theta^*)\, \Lambda(\lambda(t), X(t))\, q_i^{k_i}(t, (\rho^i)^*)\, dt$$
$$= \sum_{i \in I} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta^*)\big)^{\otimes 2} \otimes X(t)^{\otimes 2}\, r_i(t, \theta^*)\, \Lambda(\lambda(t), X(t))\, dt = \int_0^T V(X(t)) \otimes X(t)^{\otimes 2}\, \Lambda(\lambda(t), X(t))\, dt.$$

Then the mixing property [M2] provides the convergence

$$T^{-1}\langle F \rangle_T \to^p \Gamma(\theta^*) = E\Big[\big(V(X(0)) \otimes X(0)^{\otimes 2}\big)\, \Lambda(\lambda(0), X(0))\Big] \qquad (A.7)$$

as $T \to \infty$. For $i \in I$,

$$\langle F, F^{(i)} \rangle_T = \sum_{k_i \in K_i} \int_0^T \big(1_{\{i\}}(\cdot) - r^\flat(t, \theta^*)\big) \otimes \big(1_{\{k_i\}}(\cdot) - q^\flat_i(t, (\rho^i)^*)\big) \otimes X(t) \otimes Y^i(t) \times r_i(t, \theta^*)\, \Lambda(\lambda(t), X(t))\, q_i^{k_i}(t, (\rho^i)^*)\, dt = 0 \qquad (A.8)$$

since

$$\sum_{k_i \in K_i} \big(1_{\{k_i\}}(\cdot) - q^\flat_i(t, (\rho^i)^*)\big)\, q_i^{k_i}(t, (\rho^i)^*) = 0.$$

The full information matrix is the $\check{p} \times \check{p}$ block-diagonal matrix

$$\Gamma = \Gamma(\theta^*, \rho^*) = \mathrm{diag}\big[\Gamma(\theta^*), \Gamma^{(1)}((\rho^1)^*), \Gamma^{(2)}((\rho^2)^*), \ldots, \Gamma^{(\bar{i})}((\rho^{\bar{i}})^*)\big].$$

Let $\Delta_T = T^{-1/2}\big(F_T, (F^{(i)}_T)_{i \in I}\big)$. By the martingale central limit theorem, it is easy to obtain the convergence

$$\Delta_T \to^d \Gamma^{1/2}\zeta \quad (T \to \infty),$$

where $\zeta$ is a $\check{p}$-dimensional standard Gaussian random vector. The joint convergence $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ is obvious since $\Gamma$ is deterministic.
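A quick numerical check of the score in (A.1) may help. For a single event of mark $k$, the integrand is the gradient of $\log q^k$ with respect to the coefficients, whose entry $(k', j)$ equals $(1_{\{k' = k\}} - q^{k'}) Y_j$. The Python sketch below compares this closed form with central finite differences and also checks the identity used right after (A.8), which is the same identity that makes the compensator term drop out in (A.2). The coefficient and covariate values are arbitrary hypothetical numbers.

```python
import math


def ratio(rho, y):
    """q^k(rho) = exp(rho^k . y) / sum_k' exp(rho^k' . y) for every mark k (rho: list of rows)."""
    s = [sum(a * b for a, b in zip(row, y)) for row in rho]
    top = max(s)
    w = [math.exp(v - top) for v in s]
    total = sum(w)
    return [v / total for v in w]


def score(rho, y, k):
    """Integrand of (A.1) for an event of mark k: entry (k', j) is (1{k' = k} - q^{k'}) * y_j."""
    q = ratio(rho, y)
    return [[((1.0 if kp == k else 0.0) - q[kp]) * yj for yj in y] for kp in range(len(rho))]


def numerical_score(rho, y, k, h=1e-6):
    """Central finite differences of log q^k with respect to each coefficient rho^{k'}_j."""
    grad = []
    for kp in range(len(rho)):
        row = []
        for j in range(len(y)):
            up = [r[:] for r in rho]
            down = [r[:] for r in rho]
            up[kp][j] += h
            down[kp][j] -= h
            row.append((math.log(ratio(up, y)[k]) - math.log(ratio(down, y)[k])) / (2 * h))
        grad.append(row)
    return grad


def compensator_weighted_sum(rho, y):
    """sum_k (1{k' = k} - q^{k'}) q^k for each k'; identically zero, as used for (A.8)."""
    q = ratio(rho, y)
    return [sum(((1.0 if kp == k else 0.0) - q[kp]) * q[k] for k in range(len(q)))
            for kp in range(len(q))]


rho = [[0.3, -0.1], [0.0, 0.5], [-0.2, 0.2]]  # three marks, two covariates (hypothetical)
y = [1.5, -0.7]
analytic = score(rho, y, 1)
numeric = numerical_score(rho, y, 1)
print(max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb)) < 1e-6)  # True
```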

A.2 Condition [L4]

According to (B.2), we define the random field $\mathbb{Y}_T: \Omega \times \Theta \times \mathcal{R} \to \mathbb{R}$ by

$$\mathbb{Y}_T(\theta, \rho) = T^{-1}\big(\mathbb{H}_T(\theta, \rho) - \mathbb{H}_T(\theta^*, \rho^*)\big)$$

for $\mathbb{H}_T(\theta, \rho)$ given in (3.3). From the expression (3.4) of $\mathbb{H}_T(\theta, \rho)$, we have

$$T^{-1}\mathbb{H}_T(\theta, \rho) = T^{-1}\sum_{i \in I}\sum_{k_i \in K_i}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, dN^{i,k_i}_t$$
$$= T^{-1}\sum_{i \in I}\sum_{k_i \in K_i}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, d\tilde{N}^{i,k_i}_t + T^{-1}\sum_{i \in I}\sum_{k_i \in K_i}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, \lambda(t) \exp\Big(\sum_{j \in J}(\vartheta^*)^i_j X_j(t)\Big)\, p_i^{k_i}(t, (\varrho^*)^{i,k_i})\, dt.$$

By definition,

$$\Big|\partial^\ell_{(\theta,\rho)} \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\Big| \le C\Big(\sum_{j \in J}|X_j(t)| + \sum_{i \in I}\sum_{j_i \in J_i}|Y^i_{j_i}(t)|\Big) \quad (\ell = 0, 1),$$

where $C$ is a constant depending on the diameters of $\Theta$ and $\mathcal{R}$. Therefore, under Condition [M…], the Burkholder-Davis-Gundy inequality gives

$$E\Big[\Big|\partial^\ell_{(\theta,\rho)} T^{-1/2}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, d\tilde{N}^{i,k_i}_t\Big|^{2^k}\Big] \lesssim E\Big[\Big(T^{-1}\int_0^T \big|\partial^\ell_{(\theta,\rho)} \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\big|^2\, dN^{i,k_i}_t\Big)^{2^{k-1}}\Big]$$
$$\lesssim E\Big[T^{-1}\int_0^T \big|\partial^\ell_{(\theta,\rho)} \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\big|^{2^k}\, \lambda^{i,k_i}(t, (\vartheta^i)^*, (\varrho^{i,k_i})^*)\, dt\Big] + T^{-2^{k-2}}\, E\Big[\Big(T^{-1/2}\int_0^T \big|\partial^\ell_{(\theta,\rho)} \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\big|^2\, d\tilde{N}^{i,k_i}_t\Big)^{2^{k-1}}\Big]$$
$$= O(1) + T^{-2^{k-2}}\, E\Big[\Big(T^{-1/2}\int_0^T \big|\partial^\ell_{(\theta,\rho)} \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\big|^2\, d\tilde{N}^{i,k_i}_t\Big)^{2^{k-1}}\Big]$$

for $k \in \mathbb{N}$, where the constant appearing at each $\lesssim$ depends only on $\check{p}$, $k$ and the constant of the Burkholder-Davis-Gundy inequality. By induction, we obtain

$$\sup_{(\theta,\rho) \in \Theta \times \mathcal{R}}\ \sup_{T \ge 1}\Big\|\partial^\ell_{(\theta,\rho)} T^{-1/2}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, d\tilde{N}^{i,k_i}_t\Big\|_p < \infty \qquad (A.9)$$

for every $p > 1$ and $\ell \in \{0, 1\}$. Then Sobolev's inequality gives

$$\sup_{T \ge 1}\Big\|\sup_{(\theta,\rho) \in \Theta \times \mathcal{R}}\Big|T^{-1/2}\int_0^T \log\big(r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)\big)\, d\tilde{N}^{i,k_i}_t\Big|\Big\|_p < \infty \qquad (A.10)$$

for every $p > 1$. Let

$$\Phi(t, \theta, \rho) = \sum_{i \in I}\sum_{k_i \in K_i}\Big\{r_i(t, \theta^*)\, p_i^{k_i}(t, (\varrho^*)^{i,k_i}) \log\frac{r_i(t, \theta)\, q_i^{k_i}(t, \rho^i)}{r_i(t, \theta^*)\, q_i^{k_i}(t, (\rho^i)^*)}\Big\} \times \lambda(t)\sum_{i' \in I}\exp\Big(\sum_{j \in J}(\vartheta^*)^{i'}_j X_j(t)\Big).$$

Then Conditions [M1] and [M2] imply

$$\sup_{(\theta,\rho) \in \Theta \times \mathcal{R}}\ \sup_{T \ge 1}\Big\|T^{-1/2}\int_0^T \partial^\ell_{(\theta,\rho)}\big(\Phi(t, \theta, \rho) - E[\Phi(t, \theta, \rho)]\big)\, dt\Big\|_p < \infty$$

for every $p > 1$ and $\ell \in \{0, 1\}$. This entails

$$\sup_{T \ge 1}\Big\|T^{1/2}\sup_{(\theta,\rho) \in \Theta \times \mathcal{R}}\Big|T^{-1}\int_0^T \Phi(t, \theta, \rho)\, dt - E[\Phi(t, \theta, \rho)]\Big|\Big\|_p < \infty \qquad (A.11)$$

for every $p > 1$. Combining (A.11) with (A.10), we obtain

$$\sup_{T \ge 1} E\Big[\Big(T^{1/2}\sup_{(\theta,\rho) \in \Theta \times \mathcal{R}}\big|\mathbb{Y}_T(\theta, \rho) - \mathbb{Y}(\theta, \rho)\big|\Big)^p\Big] < \infty \qquad (A.12)$$

for every $p > 1$, if we set

$$\mathbb{Y}(\theta, \rho) = E\Big[\sum_{i \in I}\sum_{k_i \in K_i}\Big\{r_i(0, \theta^*)\, p_i^{k_i}(0, (\varrho^*)^{i,k_i}) \log\frac{r_i(0, \theta)\, q_i^{k_i}(0, \rho^i)}{r_i(0, \theta^*)\, q_i^{k_i}(0, (\rho^i)^*)}\Big\} \times \lambda(0)\sum_{i' \in I}\exp\Big(\sum_{j \in J}(\vartheta^*)^{i'}_j X_j(0)\Big)\Big].$$

This verifies Condition [L4](ii). Define $\Gamma_T(\theta, \rho)$ by

$$\Gamma_T(\theta, \rho) = -T^{-1}\partial^2_{(\theta,\rho)}\mathbb{H}_T(\theta, \rho).$$

From (A.1),

$$\partial^2_{\rho^i}\mathbb{H}^{(i)}_T(\rho^i) = -\sum_{k_i \in K_i}\int_0^T \partial_{\rho^i} q^\flat_i(t, \rho^i) \otimes Y^i(t)\, dN^{i,k_i}_t.$$

More precisely,

$$\partial_{\rho^{i,k_i}_{j_i}}\partial_{\rho^{i,k'_i}_{j'_i}}\mathbb{H}^{(i)}_T(\rho^i) = -\sum_{k''_i \in K_i}\int_0^T \Big\{1_{\{k_i = k'_i\}}\, q_i^{k_i}(t, \rho^i) - q_i^{k_i}(t, \rho^i)\, q_i^{k'_i}(t, \rho^i)\Big\} Y^i_{j_i}(t)\, Y^i_{j'_i}(t)\, dN^{i,k''_i}_t$$
$$= -\sum_{k''_i \in K_i}\int_0^T \Big\{1_{\{k_i = k'_i\}}\, q_i^{k_i}(t, \rho^i) - q_i^{k_i}(t, \rho^i)\, q_i^{k'_i}(t, \rho^i)\Big\} Y^i_{j_i}(t)\, Y^i_{j'_i}(t)\, d\tilde{N}^{i,k''_i}_t - \int_0^T \Big\{1_{\{k_i = k'_i\}}\, q_i^{k_i}(t, \rho^i) - q_i^{k_i}(t, \rho^i)\, q_i^{k'_i}(t, \rho^i)\Big\} Y^i_{j_i}(t)\, Y^i_{j'_i}(t)\, r_i(t, \theta^*)\, \Lambda(\lambda(t), X(t))\, dt$$
$$= -\sum_{k''_i \in K_i}\int_0^T V_i(Y^i(t), \rho^i)_{k_i,k'_i}\, Y^i_{j_i}(t)\, Y^i_{j'_i}(t)\, d\tilde{N}^{i,k''_i}_t - \int_0^T V_i(Y^i(t), \rho^i)_{k_i,k'_i}\, Y^i_{j_i}(t)\, Y^i_{j'_i}(t)\, \Lambda(\lambda(t), X(t))\, r_i(t, \theta^*)\, dt$$

for $k_i, k'_i \in K_i$, $j_i, j'_i \in J_i$ and $i \in I$, where (3.12) was used.
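The second-derivative formula above can be checked in the same way. Since (3.12) is not reproduced in this section, the sketch assumes $V_i(Y, \rho)_{k,k'} = 1_{\{k = k'\}} q^k - q^k q^{k'}$, which is what the first equality of the display shows. The Python code compares this closed form with mixed finite differences of $\log q^{k''}$ for every observed mark $k''$, illustrating that the integrand does not depend on $k''$; this is why the sum over $k''_i$ collapses when the compensator is taken. All values are hypothetical.

```python
import math


def ratio(rho, y):
    """Softmax ratios q^k for coefficient rows rho and covariates y."""
    s = [sum(a * b for a, b in zip(row, y)) for row in rho]
    top = max(s)
    w = [math.exp(v - top) for v in s]
    total = sum(w)
    return [v / total for v in w]


def v_matrix(q):
    """V_{k,k'} = 1{k = k'} q^k - q^k q^{k'}."""
    n = len(q)
    return [[(q[k] if k == kp else 0.0) - q[k] * q[kp] for kp in range(n)] for k in range(n)]


def hessian_entry(rho, y, k, j, kp, jp):
    """Closed form of the second derivative of log q^{k''} w.r.t. rho^{k}_{j} and rho^{k'}_{j'}.

    The result does not depend on the observed mark k''.
    """
    return -v_matrix(ratio(rho, y))[k][kp] * y[j] * y[jp]


def numerical_hessian_entry(rho, y, mark, k, j, kp, jp, h=1e-4):
    """Mixed central second difference of log q^{mark}."""
    def f(dk, dkp):
        r = [row[:] for row in rho]
        r[k][j] += dk
        r[kp][jp] += dkp
        return math.log(ratio(r, y)[mark])
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)


rho = [[0.3, -0.1], [0.0, 0.5], [-0.2, 0.2]]  # hypothetical coefficients
y = [1.5, -0.7]
worst = max(
    abs(numerical_hessian_entry(rho, y, mark, k, j, kp, jp) - hessian_entry(rho, y, k, j, kp, jp))
    for mark in range(3) for k in range(3) for j in range(2) for kp in range(3) for jp in range(2)
)
print(worst < 1e-5)  # True
```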
Similarly, from (A.5),

$$\partial^2_\theta \mathbb{H}_T(\theta) = -\sum_{i \in I}\int_0^T \partial_\theta r^\flat(t, \theta) \otimes X(t)\, dN^i_t,$$

$$\partial_{\theta^i_j}\partial_{\theta^{i'}_{j'}}\mathbb{H}_T(\theta) = -\sum_{i'' \in I}\int_0^T V(X(t), \theta)_{i,i'}\, X_j(t)\, X_{j'}(t)\, d\tilde{N}^{i''}_t - \int_0^T V(X(t), \theta)_{i,i'}\, X_j(t)\, X_{j'}(t)\, \Lambda(\lambda(t), X(t))\, dt$$

for $i, i' \in I$ and $j, j' \in J$. Obviously,

$$\partial_\theta \partial_{\rho^i}\mathbb{H}^{(i)}_T(\rho^i) = 0 \quad \text{and} \quad \partial_{\rho^{i'}}\partial_{\rho^i}\mathbb{H}^{(i)}_T(\rho^i) = 0 \quad (i', i \in I,\ i' \neq i).$$

In a way similar to the derivation of (A.12) (in fact, more easily), we can show

$$\sup_{T \ge 1} E\Big[\big(T^{1/2}\,|\Gamma_T(\theta^*, \rho^*) - \Gamma|\big)^p\Big] < \infty$$

for every $p > 1$, by [M1] and [M2]. … Condition [L4] has been verified.

A.3 Conditions [L2] and [L3]

We see that $\partial^2_{(\theta,\rho)}\mathbb{Y}(\theta, \rho) = -\Gamma(\theta, \rho)$, and by [M…], $\mathbb{Y}(\theta, \rho)$ is a strictly concave function on $\Theta \times \mathcal{R} = \Theta \times \prod_{i \in I}\mathcal{R}_i$. For some neighborhood $U$ of $(\theta^*, \rho^*)$ and some positive number $\chi$,

$$\mathbb{Y}(\theta, \rho) \le -\chi\,|(\theta, \rho) - (\theta^*, \rho^*)|^2 \quad \big((\theta, \rho) \in U\big)$$

by the non-degeneracy of $\Gamma(\theta^*, \rho^*)$. Moreover, $\sup_{(\theta,\rho) \in (\Theta \times \mathcal{R}) \setminus U}\mathbb{Y}(\theta, \rho) < 0$. Indeed, if there were a point $(\theta^+, \rho^+) \notin U$ such that $\mathbb{Y}(\theta^+, \rho^+) = 0$, then at some point on the segment connecting $(\theta^*, \rho^*)$ and $(\theta^+, \rho^+)$, $\Gamma(\theta, \rho)$ would degenerate, which contradicts [M…]. Therefore Condition [L2] is verified for $\rho = 2$ and some (deterministic) positive number $\chi$, since the parameter space is bounded. Condition [L3] is now obvious.

A.4 Proof of Theorem 3.1

We have verified Conditions [L1]-[L4] in the present situation. Theorem 3.1 now follows from Theorems B.2 and B.4.
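Theorem 3.1 covers both the QMLE and the QBE. As a toy illustration only (a one-parameter, two-type logistic ratio with simulated data, not the model or the data of this paper), the Python sketch below maximizes a quasi-log-likelihood on a grid and computes the quasi-Bayesian estimator of Section B with a uniform prior by the trapezoidal rule. With a reasonably large sample, the two estimates should be close, in line with their common limit. The true parameter, sample size, grid and prior are all assumptions made for the example.

```python
import math
import random


def simulate(theta_true, n, seed=0):
    """Hypothetical two-type data: covariate x ~ N(0, 1), type 1 with probability r^1(x)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p1 = 1.0 / (1.0 + math.exp(-theta_true * x))
        data.append((x, rng.random() < p1))
    return data


def quasi_log_likelihood(theta, data):
    """H_T(theta) = sum of log r^{type}(theta, x) with r^1 = 1 / (1 + exp(-theta x))."""
    h = 0.0
    for x, is_type1 in data:
        z = theta * x if is_type1 else -theta * x
        h -= math.log1p(math.exp(-z))  # log of 1 / (1 + exp(-z))
    return h


def qmle_and_qbe(data, lower=-3.0, upper=3.0, n_grid=1201):
    """Grid QMLE and quasi-Bayesian estimator with a uniform prior on (lower, upper)."""
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + i * step for i in range(n_grid)]
    h = [quasi_log_likelihood(t, data) for t in grid]
    h_max = max(h)
    qmle = grid[h.index(h_max)]
    # Trapezoidal rule; exp(H - max H) avoids underflow and the constant cancels in the ratio.
    w = [math.exp(v - h_max) for v in h]
    w[0] /= 2.0
    w[-1] /= 2.0
    qbe = sum(t * wt for t, wt in zip(grid, w)) / sum(w)
    return qmle, qbe


data = simulate(theta_true=0.8, n=1000)
qmle, qbe = qmle_and_qbe(data)
print(round(qmle, 3), round(qbe, 3))
```

A grid search is used only because the toy parameter is one-dimensional; with a uniform prior the prior density cancels in the ratio defining the QBE.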

B Quasi-likelihood analysis

This section recalls the quasi-likelihood analysis. Let $\Theta$ be a bounded open set in $\mathbb{R}^p$. Given a probability space $(\Omega, \mathcal{F}, P)$, suppose that $\mathbb{H}_T: \Omega \times \Theta \to \mathbb{R}$ is of class $C^3$, that is, the mapping $\Theta \ni \theta \mapsto \mathbb{H}_T(\omega, \theta) \in \mathbb{R}$ is continuously extended to $\bar{\Theta}$ and is of class $C^3$ for every $\omega \in \Omega$, and the mapping $\Omega \ni \omega \mapsto \mathbb{H}_T(\omega, \theta) \in \mathbb{R}$ is measurable for every $\theta \in \Theta$. Let $\Gamma$ be a $p \times p$ random matrix. Let $\theta^* \in \Theta$. For a sequence $a_T \in GL(p)$ satisfying $\lim_{T \to \infty}|a_T| = 0$, let

$$\Delta_T[u] = \partial_\theta \mathbb{H}_T(\theta^*)[a_T u] \quad \text{and} \quad \Gamma_T(\theta) = -a_T^\star\, \partial^2_\theta \mathbb{H}_T(\theta)\, a_T \qquad (B.1)$$

where $\star$ denotes the matrix transpose. We consider the random field

$$\mathbb{Y}_T(\theta) = b_T^{-1}\big(\mathbb{H}_T(\theta) - \mathbb{H}_T(\theta^*)\big), \qquad (B.2)$$

which will be assumed to converge to a random field $\mathbb{Y}: \Omega \times \Theta \to \mathbb{R}$. Only to simplify the presentation, we assume that $a_T = b_T^{-1/2} I_p$ for a diverging sequence $(b_T)_{T > 0}$ of positive numbers, where $I_p$ is the identity matrix. In what follows, we fix a positive number $L$.

We give a simplified exposition of Yoshida (2011) on the polynomial type large deviation inequality. Let $\alpha$, $\beta_1$, $\beta_2$, $\rho$, $\rho_1$ and $\rho_2$ be numbers.

[L1] The numbers $\alpha$, $\beta_1$, $\beta_2$, $\rho$, $\rho_1$ and $\rho_2$ satisfy the following inequalities:

$$0 < \alpha < 1, \quad 0 < \beta_1 < 1/2, \quad 0 < \rho_1 < \min\{1, \alpha(1-\alpha)^{-1}, 2\beta_1(1-\alpha)^{-1}\},$$
$$\alpha\rho < \rho_2, \quad \beta_2 \ge 0, \quad 1 - 2\beta_2 - \rho_2 > 0.$$

Let $\beta = \alpha(1-\alpha)^{-1}$.

[L2] There is a positive random variable $\chi$ such that

$$\mathbb{Y}(\theta) = \mathbb{Y}(\theta) - \mathbb{Y}(\theta^*) \le -\chi\,|\theta - \theta^*|^\rho$$

for all $\theta \in \Theta$.

[L3] There exists a constant $C_L$ such that

$$P\big[\chi \le r^{-(\rho_2 - \alpha\rho)}\big] \le \frac{C_L}{r^L} \quad (r > 0), \qquad P\big[\lambda_{\min}(\Gamma) < r^{-\rho_1}\big] \le \frac{C_L}{r^L} \quad (r > 0).$$

[L4] (i) For $M_1 = L(1 - \rho_1)^{-1}$,

$$\sup_{T > 0} E\big[|\Delta_T|^{M_1}\big] < \infty.$$

(ii) For $M_2 = L(1 - 2\beta_2 - \rho_2)^{-1}$,

$$\sup_{T > 0} E\Big[\Big(\sup_{h: |h| \ge b_T^{-\alpha/2}} b_T^{1/2 - \beta_2}\,\big|\mathbb{Y}_T(\theta^* + h) - \mathbb{Y}(\theta^* + h)\big|\Big)^{M_2}\Big] < \infty.$$

(iii) For $M_3 = L(\beta - \rho_1)^{-1}$,

$$\sup_{T > 0} E\Big[\Big(b_T^{-1}\sup_{\theta \in \Theta}\big|\partial^3_\theta \mathbb{H}_T(\theta)\big|\Big)^{M_3}\Big] < \infty.$$
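(iv) For $M_4 = L\big(2\beta_1(1-\alpha)^{-1} - \rho_1\big)^{-1}$,

$$\sup_{T > 0} E\Big[\Big(b_T^{\beta_1}\,\big|\Gamma_T(\theta^*) - \Gamma\big|\Big)^{M_4}\Big] < \infty.$$

Let $U_T = \{u \in \mathbb{R}^p;\ \theta^* + a_T u \in \Theta\}$ and $V_T(r) = \{u \in U_T;\ |u| \ge r\}$ for $r > 0$. The quasi-likelihood random field is $\mathbb{Z}_T(u) = \exp\big(\mathbb{H}_T(\theta^* + a_T u) - \mathbb{H}_T(\theta^*)\big)$ for $u \in U_T$.

Theorem B.1. (Yoshida (2011))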

Suppose that Conditions [L1]-[L4] are satisfied. Then there exists a constant $C$ such that

$$P\Big[\sup_{u \in V_T(r)}\mathbb{Z}_T(u) \ge \exp\big(-2^{-1} r^{2 - (\rho_1 \vee \rho_2)}\big)\Big] \le \frac{C}{r^L}$$

for all $T > 0$ and $r > 0$. Here the supremum of the empty set should read $-\infty$ by convention.

We comment on some points. Parameters satisfying [L1] exist. The non-degeneracy conditions in [L3] are obvious in ergodic cases. In this paper, we apply Theorem B.1 under ergodicity of the stochastic system. Theorem B.1 asserts that a polynomial type large deviation inequality can be obtained once the boundedness of moments of some random variables is verified. Condition [L4] is easy to obtain because each variable is usually a simple additive functional. The polynomial type large deviation inequality in Theorem B.1 enables us to easily apply the scheme of Ibragimov & Has'minskiĭ (1981) and Kutoyants (1984, 2012) to various dependence structures.

Let $u \in \mathbb{R}^p$. Define $r_T(u)$ ($u \in U_T$) by

$$\mathbb{Z}_T(u) = \exp\Big(\Delta_T[u] - \frac{1}{2}\Gamma[u^{\otimes 2}] + r_T(u)\Big) \quad (u \in U_T). \qquad (B.3)$$

$\mathbb{Z}_T$ is said to be locally asymptotically quadratic (LAQ) at $\theta^*$ if $r_T(u) \to^p 0$ as $T \to \infty$ for every $u \in \mathbb{R}^p$; then $\log\mathbb{Z}_T(u)$ is asymptotically approximated by a random quadratic function of $u$. We confine our attention to the very standard case where $\mathbb{Z}_T$ is locally asymptotically mixed normal, although the general theory of the quasi-likelihood analysis is framed more generally.

Any measurable mapping $\hat{\theta}^M_T: \Omega \to \bar{\Theta}$ is called a quasi-maximum likelihood estimator (QMLE) for $\mathbb{H}_T$ if

$$\mathbb{H}_T(\hat{\theta}^M_T) = \max_{\theta \in \bar{\Theta}}\mathbb{H}_T(\theta).$$

When $\mathbb{H}_T$ is continuous on the compact set $\bar{\Theta}$, such a measurable function always exists, as ensured by the measurable selection theorem. Let $\hat{u}^M_T = a_T^{-1}(\hat{\theta}^M_T - \theta^*)$ for the QMLE $\hat{\theta}^M_T$.

Theorem B.2.

Let $L > p > 0$. Suppose that Conditions [L1]-[L4] are satisfied and that $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ as $T \to \infty$, where $\zeta$ is a $p$-dimensional standard Gaussian random vector independent of $\Gamma$. Then

$$E\big[f(\hat{u}^M_T)\big] \to E\big[f(\hat{u})\big] \quad (T \to \infty)$$

for $\hat{u} = \Gamma^{-1/2}\zeta$ and for any $f \in C(\mathbb{R}^p)$ satisfying $\limsup_{|u| \to \infty}|u|^{-p}|f(u)| < \infty$.

Proof. We sketch the proof to convey the concepts of the quasi-likelihood analysis to the reader; see Yoshida (2011) for details. The space $\hat{C}(\mathbb{R}^p)$ is the linear space of all continuous functions $f: \mathbb{R}^p \to \mathbb{R}$ satisfying $\lim_{|u| \to \infty} f(u) = 0$. The space $\hat{C}(\mathbb{R}^p)$ becomes a separable Banach space when equipped with the supremum norm $\|f\|_\infty = \sup_{u \in \mathbb{R}^p}|f(u)|$. Moreover, $\hat{C}(\mathbb{R}^p)$ is regarded as a measurable space with the Borel $\sigma$-field. Let

$$\mathbb{Z}(u) = \exp\Big(\Gamma^{1/2}\zeta[u] - \frac{1}{2}\Gamma[u^{\otimes 2}]\Big) \qquad (B.4)$$

for $u \in \mathbb{R}^p$. The term $r_T(u)$ admits the expression

$$r_T(u) = \int_0^1 (1 - s)\big\{\Gamma[u^{\otimes 2}] - \Gamma_T(\theta^* + s a_T u)[u^{\otimes 2}]\big\}\, ds \qquad (B.5)$$

for $u$ such that $|u| \le b_T^{(1-\alpha)/2}$ and $T$ such that $B(\theta^*, b_T^{-\alpha/2}) \subset \Theta$. In this situation, we can apply Taylor's formula even though $\Theta$ as a whole is not convex. Condition [L4](iii) and the convergence of $\Delta_T$ ensure the tightness of the random fields $\{\mathbb{Z}_T|_{B(0,R)}\}_{T > T_0}$ for every $R > 0$, where $B(0, R) = \{u \in \mathbb{R}^p;\ |u| \le R\}$ and $T_0$ is a sufficiently large number depending on $R$. Combining this property with the polynomial type large deviation inequality given by Theorem B.1, we obtain the convergence $\mathbb{Z}_T \to^d \mathbb{Z}$ in $\hat{C}(\mathbb{R}^p)$ for the random field $\mathbb{Z}_T$ extended as an element of $\hat{C}(\mathbb{R}^p)$ so that $\sup_{\mathbb{R}^p \setminus U_T}\mathbb{Z}_T(u) \le \sup_{u \in \partial U_T}\mathbb{Z}_T(u)$; it is known that a measurable version of such an extension of $\mathbb{Z}_T$ exists. Consequently, $\hat{u}^M_T \to^d \hat{u} = \mathrm{argmax}_{u \in \mathbb{R}^p}\mathbb{Z}(u)$.

A polynomial type large deviation inequality, even weaker than the one in Theorem B.1, serves to show the boundedness of $E[|\hat{u}^M_T|^q]$ for $L > q > p$. Then the family $\{|\hat{u}^M_T|^p\}$ is uniformly integrable, and hence we obtain the convergence of $E[f(\hat{u}^M_T)]$.

Remark 2.

In Theorem B.2, if $\Delta_T \to^d \Gamma^{1/2}\zeta$ $\mathcal{F}$-stably, then $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ and $\hat{u}^M_T \to \hat{u}$ $\mathcal{F}$-stably.

An advantage of the quasi-likelihood analysis is that the asymptotic behavior of the quasi-Bayesian estimator can be obtained, as well as that of the quasi-maximum likelihood estimator and the convergence of its moments. The mapping

$$\hat{\theta}^B_T = \Big[\int_\Theta \exp\big(\mathbb{H}_T(\theta)\big)\varpi(\theta)\, d\theta\Big]^{-1}\int_\Theta \theta\exp\big(\mathbb{H}_T(\theta)\big)\varpi(\theta)\, d\theta$$

is called a quasi-Bayesian estimator (QBE) with respect to the prior density $\varpi$. The QBE $\hat{\theta}^B_T$ takes values in the convex hull of $\Theta$. We assume that $\varpi$ is continuous and $0 < \inf_{\theta \in \Theta}\varpi(\theta) \le \sup_{\theta \in \Theta}\varpi(\theta) < \infty$. Among many possible ways, we give a concise exposition in what follows; the reader is referred to Yoshida (2011) for further information. Recall that $p$ is the dimension of $\Theta$, and $B(R)$ denotes the open ball of radius $R$ centered at the origin. $C(B(R))$ is the space of all continuous functions on $B(R)$, equipped with the supremum norm. Let $V_T(r) = \{u \in U_T;\ |u| \ge r\}$. As before, $\hat{u} = \Gamma^{-1/2}\zeta$ with a $p$-dimensional standard Gaussian random vector $\zeta$ independent of $\Gamma$. Write $\hat{u}^B_T = a_T^{-1}(\hat{\theta}^B_T - \theta^*)$.

Theorem B.3.

Let $p_1 \ge 1$, $L > p + 1$ and $D > p + p_1$. Suppose that $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ as $T \to \infty$, where $\zeta$ is a $p$-dimensional standard Gaussian random vector independent of $\Gamma$. Moreover, suppose that the following conditions are satisfied.

(i) For every $R > 0$,

$$\mathbb{Z}_T|_{B(R)} \to^d \mathbb{Z}|_{B(R)} \ \text{in}\ C(B(R)) \qquad (B.6)$$

as $T \to \infty$, where $\mathbb{Z}$ is given in (B.4).

(ii) There exist positive constants $T_0$, $C_0$ and $C_1$ such that

$$P\Big[\sup_{V_T(r)}\mathbb{Z}_T \ge C_0\, r^{-D}\Big] \le \frac{C_1}{r^L} \qquad (B.7)$$

for all $T \ge T_0$ and $r > 0$.

(iii) For some $T_0 > 0$,

$$\sup_{T \ge T_0} E\Big[\Big(\int_{U_T}\mathbb{Z}_T(u)\, du\Big)^{-1}\Big] < \infty. \qquad (B.8)$$

Then

$$E\big[f(\hat{u}^B_T)\big] \to E\big[f(\hat{u})\big] \qquad (B.9)$$

as $T \to \infty$ for any continuous function $f: \mathbb{R}^p \to \mathbb{R}$ satisfying $\sup_{u \in \mathbb{R}^p}\big\{(1 + |u|)^{-p_1}|f(u)|\big\} < \infty$.

Proof. We give a brief summary of the proof; see Yoshida (2011) for details. The variable $\hat{u}^B_T$ has the expression

$$\hat{u}^B_T = \Big[\int_{U_T}\mathbb{Z}_T(u)\,\varpi(\theta^* + a_T u)\, du\Big]^{-1}\int_{U_T} u\,\mathbb{Z}_T(u)\,\varpi(\theta^* + a_T u)\, du.$$

By (B.7) and the properties of $\varpi$, we can approximate $\hat{u}^B_T$ by

$$\tilde{u}_T = \Big[\int_{B(R)}\mathbb{Z}_T(u)\, du\Big]^{-1}\int_{B(R)} u\,\mathbb{Z}_T(u)\, du$$

at the cost of a small error when $R$ is large. By (B.6),

$$\tilde{u}_T \to^d \Big[\int_{B(R)}\mathbb{Z}(u)\, du\Big]^{-1}\int_{B(R)} u\,\mathbb{Z}(u)\, du =: \hat{u}(R).$$

The random field $\mathbb{Z}$ inherits a tail estimate from (B.7), and hence $\hat{u}(R)$ is approximated by

$$\Big[\int_{\mathbb{R}^p}\mathbb{Z}(u)\, du\Big]^{-1}\int_{\mathbb{R}^p} u\,\mathbb{Z}(u)\, du = \Gamma^{-1/2}\zeta = \hat{u}.$$

Combining these estimates, we can conclude that $\hat{u}^B_T \to^d \hat{u}$ as $T \to \infty$. Convergence of the expectation is a consequence of the uniform integrability of $|\hat{u}^B_T|^{p_1}$ ensured by (B.7).

Remark 3. (a) It is possible to relax the conditions of Theorem B.3 so as to ensure only the convergence $\hat{u}^B_T \to^d \hat{u}$. (b) In Theorem B.3, if $\Delta_T \to^d \Gamma^{1/2}\zeta$ $\mathcal{F}$-stably, then $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ and $\hat{u}^B_T \to \hat{u}$ $\mathcal{F}$-stably. (c) Usually, condition (iii) of Theorem B.3 is easily verified; see Lemma 2 of Yoshida (2011).

The following result follows from Theorem B.3.

Theorem B.4.

Let $p_1 > p$ and

$$L > \max\Big\{p + 1,\ p(\beta - \rho_1),\ p\big(2\beta_1(1-\alpha)^{-1} - \rho_1\big)\Big\}.$$

Suppose that Conditions [L1]-[L4] are satisfied, that $E[|\Gamma|^{p_1}] < \infty$, and that $(\Delta_T, \Gamma) \to^d (\Gamma^{1/2}\zeta, \Gamma)$ as $T \to \infty$, where $\zeta$ is a $p$-dimensional standard Gaussian random vector independent of $\Gamma$. Then

$$E\big[f(\hat{u}^B_T)\big] \to E\big[f(\hat{u})\big] \quad (T \to \infty)$$

for $\hat{u} = \Gamma^{-1/2}\zeta$ and for any $f \in C(\mathbb{R}^p)$ satisfying $\limsup_{|u| \to \infty}|u|^{-p}|f(u)| < \infty$.

Proof. The convergence (B.6) holds, as shown in the proof of Theorem B.2. The polynomial type large deviation inequality (B.7) is a consequence of Theorem B.1; the number $D$ is arbitrary. Fix $\delta > 0$. Then there exists $T_0 > 0$ such that $B(\delta) \subset U_T$ for all $T \ge T_0$. In particular, $r_T(u)$ admits the representation (B.5) for all $u \in B(\delta)$. Since $M_3 = L(\beta - \rho_1)^{-1} > p$, $M_4 = L\big(2\beta_1(1-\alpha)^{-1} - \rho_1\big)^{-1} > p$ and $p_1 > p$, we have $p' := \min\{M_3, M_4, p_1\} > p$ and

$$E\big[|r_T(u)|^{p'}\big] \le C\,|u|^{p'} \quad (u \in B(\delta))$$

for some constant $C$. Then Lemma 2 of Yoshida (2011) gives the estimate

$$E\Big[\Big(\int_{B(\delta)}\mathbb{Z}_T(u)\, du\Big)^{-1}\Big] \le C'$$

for a constant $C'$ depending on $(p', p, \delta, C)$ and the suprema appearing in [L4]; in particular, $C'$ is independent of $T \ge T_0$. Therefore (B.8) holds true. Thus, we can apply Theorem B.3 to conclude the proof.

C List of stocks

Table 3 lists all the stocks investigated in this paper. For each stock, the total number of days available in the sample is given. Note that, due to limited computing time on the computational resources used for this paper, some trading days for a few very liquid stocks were not used for some of the marked ratio models tested in Section 4.4. In this case, only the trading days for which all models have been computed have been used; their number is given in the last column of the table.

RIC | Company | Sector | Number of trading days in sample | Number of trading days used in QAIC
AIRP.PA | Air Liquide | Healthcare / Energy | 238 | 238
BNPP.PA | BNP Paribas | Banking | 224 | 62
EDF.PA | Électricité de France | Energy | 236 | 236
LAGA.PA | Lagardère | Media | 142 | 142
CARR.PA | Carrefour | Retail | 229 | 229
BOUY.PA | Bouygues | Construction / Telecom | 228 | 228
ALSO.PA | Alstom | Transport | 229 | 229
ACCP.PA | Accor | Hotels | 227 | 227
ALUA.PA | Alcatel | Networks / Telecom | 234 | 234
AXAF.PA | Axa | Insurance | 236 | 131
CAGR.PA | Crédit Agricole | Banking | 235 | 235
CAPP.PA | Cap Gemini | Technology Consulting | 232 | 232
DANO.PA | Danone | Food | 229 | 229
ESSI.PA | Essilor | Optics | 228 | 228
LOIM.PA | Klepierre | Finance | 221 | 221
LVMH.PA | Louis Vuitton Moët Hennessy | Luxury | 233 | 198
MICP.PA | Michelin | Tires | 229 | 229
OREP.PA | L'Oréal | Cosmetics | 233 | 233
PERP.PA | Pernod Ricard | Spirits | 224 | 224
PEUP.PA | Peugeot | Automotive | 151 | 151
PRTP.PA | Kering | Luxury | 227 | 227
PUBP.PA | Publicis | Communication | 223 | 223
RENA.PA | Renault | Automotive | 228 | 172
SAF.PA | Safran | Aerospace / Defense | 232 | 232
TECF.PA | Technip | Energy | 225 | 225
TOTF.PA | Total | Energy | 232 | 75
VIE.PA | Veolia | Energy / Environment | 234 | 234
VIV.PA | Vivendi | Media | 234 | 234
VLLP.PA | Vallourec | Materials | 228 | 228
VLOF.PA | Valeo | Automotive | 221 | 212
SASY.PA | Sanofi | Healthcare | 229 | 97
SCHN.PA | Schneider Electric | Energy | 224 | 164
SGEF.PA | Vinci | Construction | 229 | 229
SGOB.PA | Saint Gobain | Materials | 234 | 180
SOGN.PA | Société Générale | Banking | 229 | 103
STM.PA | ST Microelectronics | Semiconductor | 227 | 227

Table 3: List of stocks investigated in this paper. The sample consists of the whole year 2015, representing roughly 230 trading days for all stocks except LAGA.PA and PEUP.PA, which are missing roughly 70 trading days.
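The rule behind the last column of Table 3 (keeping only the trading days on which every compared model has been calibrated) amounts to intersecting per-model sets of days. A minimal Python sketch, assuming each model's completed calibration days are available as a set of date strings; the function name, model labels and dates are hypothetical:

```python
def common_trading_days(days_by_model: dict[str, set[str]]) -> set[str]:
    """Return the trading days on which every model has been calibrated.

    Hypothetical helper: `days_by_model` maps a model label to the set of
    dates (e.g. "2015-01-02") for which its calibration completed.
    """
    day_sets = list(days_by_model.values())
    if not day_sets:
        return set()
    # set.intersection(a, b, ...) returns a new set; with a single argument
    # it returns a copy, so the caller's sets are never mutated.
    return set.intersection(*day_sets)


# Usage with made-up labels and dates: one model is missing 2015-01-05,
# so that day is excluded from the comparison.
days_by_model = {
    "ratio_model_a": {"2015-01-02", "2015-01-05", "2015-01-06"},
    "ratio_model_b": {"2015-01-02", "2015-01-06"},
    "hawkes_model": {"2015-01-02", "2015-01-05", "2015-01-06"},
}
common = common_trading_days(days_by_model)
print(len(common), sorted(common))  # prints: 2 ['2015-01-02', '2015-01-06']
```

Restricting every model to the same days keeps the comparison across models on a common sample, at the cost of discarding days on which any single model is missing.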

In-sample AIC selection - Detailed results

Table 4: Side determination - AIC most selected models by stock (covariates on the first line, frequency on the second line).

Table 5: Bid aggressiveness determination - AIC most selected models by stock (covariates on the first line, frequency on the second line).

          1        2        3        4
ACCP.PA   146      14689    189      1456789
          0.19     0.16     0.10     0.09
AIRP.PA   1679     146      14567    14689
          0.31     0.14     0.14     0.12
ALSO.PA   146      189      14567    1246
          0.21     0.17     0.12     0.08
ALUA.PA   146      189      14689    14567
          0.25     0.14     0.12     0.10
AXAF.PA   1679     14689    14567    146
          0.27     0.26     0.14     0.05
BNPP.PA   14689    1679     146      14567
          0.31     0.15     0.08     0.08
BOUY.PA   146      14689    14567    189
          0.23     0.16     0.12     0.09
CAGR.PA   1679     146      14567    14689
          0.31     0.17     0.13     0.12
CAPP.PA   146      14689    189      14567
          0.23     0.18     0.13     0.10
CARR.PA   14689    146      1679     14567
          0.20     0.20     0.19     0.09
DANO.PA   14689    1679     146      14567
          0.21     0.19     0.17     0.10
EDF.PA    146      14689    1679     1246
          0.21     0.15     0.14     0.10
ESSI.PA   1679     14567    146      14689
          0.21     0.17     0.15     0.11
LAGA.PA   146      189      1289     14567
          0.28     0.16     0.08     0.08
LOIM.PA   189      146      14689    14567
          0.24     0.18     0.11     0.09
LVMH.PA   1679     14567    14689    146
          0.29     0.22     0.16     0.08
MICP.PA   146      14689    14567    1246
          0.23     0.16     0.11     0.08
OREP.PA   1679     14567    14689    189
          0.21     0.19     0.17     0.12
PERP.PA   1679     146      14689    14567
          0.19     0.17     0.14     0.13
PEUP.PA   146      1679     14689    189
          0.26     0.19     0.13     0.07
PRTP.PA   146      1679     14689    14567
          0.36     0.10     0.10     0.09
PUBP.PA   146      1679     14689    189
          0.18     0.15     0.11     0.10
RENA.PA   146      14689    189      1456789
          0.16     0.16     0.15     0.08
SAF.PA    146      14689    14567    1246
          0.23     0.19     0.12     0.09
SASY.PA   1679     146      14567    14689
          0.16     0.14     0.13     0.13
SCHN.PA   14689    146      1679     14567
          0.23     0.16     0.15     0.12
SGEF.PA   146      1679     14689    14567
          0.16     0.15     0.15     0.14
SGOB.PA   14689    146      189      1246
          0.18     0.18     0.10     0.08
SOGN.PA   14689    146      189      1246
          0.24     0.19     0.14     0.09
STM.PA    189      146      1679     14689
          0.25     0.22     0.11     0.08
TECF.PA   146      14689    14567    1456789
          0.21     0.20     0.15     0.09
TOTF.PA   14689    14567    146      1458
          0.21     0.16     0.12     0.08
VIE.PA    146      1679     14567    14689
          0.21     0.15     0.14     0.13
VIV.PA    1679     14689    14567    124689
          0.27     0.25     0.14     0.07
VLLP.PA   146      189      14689    1458
          0.24     0.21     0.12     0.10
VLOF.PA   146      1679     189      14567
          0.19     0.14     0.12     0.12
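The frequencies in Tables 4 and 5 can be read as the share of trading days on which a given model is the one selected by AIC. This reading, the helper name and the input layout are assumptions of the sketch below, which treats covariate strings such as "146" as opaque labels (their meaning is defined elsewhere in the paper). For each day it picks the candidate with the smallest AIC, then reports the most frequently selected models:

```python
from collections import Counter


def aic_selection_frequencies(aic_by_day: list[dict[str, float]],
                              top_k: int = 4) -> list[tuple[str, float]]:
    """Rank candidate models by how often they minimise AIC.

    Hypothetical helper. `aic_by_day` holds one dict per trading day,
    mapping a model label (here a covariate string) to its AIC value on
    that day. Returns up to `top_k` (label, frequency) pairs, most
    frequently selected first; frequency = selected days / total days.
    """
    n_days = len(aic_by_day)
    # The AIC-selected model of a day is the one with the smallest AIC
    # (ties go to the first label listed in that day's dict).
    winners = Counter(min(day, key=day.get) for day in aic_by_day)
    return [(label, count / n_days) for label, count in winners.most_common(top_k)]


# Usage with made-up AIC values for three candidates over four days.
aic_by_day = [
    {"146": 10.0, "189": 12.0, "1679": 11.0},
    {"146": 9.0, "189": 8.0, "1679": 11.0},
    {"146": 7.0, "189": 9.0, "1679": 8.0},
    {"146": 10.0, "189": 12.0, "1679": 9.5},
]
for label, freq in aic_selection_frequencies(aic_by_day):
    print(label, f"{freq:.2f}")
```

The frequencies of the four reported models need not sum to one, since other candidates may be selected on the remaining days.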