Analyzing order flows in limit order books with ratios of Cox-type intensities

Abstract

We introduce a Cox-type model for relative intensities of order flows in a limit order book. The model assumes that all intensities share a common baseline intensity, which may for example represent the global market activity. Parameters can be estimated by quasi-likelihood maximization, without any interference from the baseline intensity. Consistency and asymptotic behavior of the estimators are given in several frameworks, and model selection is discussed with information criteria and penalization. The model is well-suited for high-frequency financial data: fitted models using easily interpretable covariates show an excellent agreement with empirical data. Extensive investigation on tick data consequently helps identify trading signals and important factors determining the limit order book dynamics. We also illustrate the potential use of the framework for out-of-sample predictions.


Ioane Muni Toke ∗ and Nakahiro Yoshida †

Mathématiques et Informatique pour la Complexité et les Systèmes, CentraleSupélec, Université Paris-Saclay, France
Graduate School of Mathematical Sciences, University of Tokyo, Japan
CREST, Japan Science and Technology Agency, Japan

August 23, 2019


Keywords: order book models; point processes; Cox processes; Hawkes processes; ratio models; trading signals; imbalance; spread.

∗ Laboratoire MICS et Chaire de Finance Quantitative, CentraleSupélec, Bâtiment Bouygues, 3 rue Joliot Curie, 91190 Gif-sur-Yvette, France.
† Graduate School of Mathematical Sciences, University of Tokyo: 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan.
arXiv [q-fin.ST]

The limit order book is the central structure that aggregates all orders submitted by market participants to buy or sell a given asset on a financial market. Buy offers form the bid side, sell offers form the ask side. Simplified representations of possible interactions with a limit order book usually consider three archetypal sorts of orders: limit orders, market orders, and cancellations. Buy (resp. sell) limit orders are orders to buy (resp. sell) a given quantity of the asset with a given limit price strictly lower (resp. greater) than the current best ask (resp. bid) quote. Limit orders are stored in the book until matched or canceled. Buy (resp. sell) market orders are submitted without any limit price, and are thus matched against the current best ask (resp. bid) orders. Cancellations remove a non-executed limit order from the book.

The rise of electronic markets, the accelerating rate of trading on financial markets, the development of optimal trading strategies by brokers, etc., have pushed for better investigation and modeling of limit order books. Chakraborti et al. (2011), Gould et al. (2013) and Abergel et al. (2016) provide overviews of recent modeling efforts. Cont et al. (2010) proposed a seminal model using Poisson processes. Huang et al. (2015) investigate Markovian dynamics dependent on the size of the queues of the limit order book. Muni Toke & Yoshida (2017) propose a state-dependent model that makes order intensities depend on financial signals such as the bid-ask spread. A common difficulty in estimating these models is the need to cope with the fact that market activity fluctuates strongly during the day, at all timescales. 
Daily market activity (for example measured in number or volume of trades, or in number or volume of all orders) is known to globally exhibit a U-shaped pattern, with a lower activity in the middle of the day, and a much higher activity in the morning after the market opening, and in the afternoon before market close. This seasonality effect is non-smooth: exogenous news (announcements of company results, of acquisitions, of macroeconomic indicators, etc.) occur all day long, at random or predetermined times, and may trigger activity bursts. In Europe, openings of American markets are usually followed by some activity increase. It may be quite difficult to incorporate such variations in a model.

In order to avoid such problems, one may try to remove the most hectic parts of the samples, and/or focus on a limited time interval, and/or split trading days in several parts. Examples can be found in, e.g., the growing literature on Hawkes processes in finance: Bacry et al. (2012) limit tests of their estimation procedure to two-hour periods to avoid seasonality effects; Lallouache & Challet (2016) use a time-dependent piecewise-linear baseline intensity (with nodes spaced every few hours). However, even at these scales, global market activity varies and may limit the reliability of estimation procedures.

In this work we continue previous investigations on the influences of financial variables (state of the order book or other trading signals) on the processes of order submissions. We assume that order submissions are modeled by point processes with Cox-like intensities depending on given covariates. However, we do not directly try to model and fully estimate each and every intensity of the model, as in e.g. Muni Toke & Yoshida (2017), but we rather try to estimate the relative influences of given covariates on the intensities. 
To this end we assume that there exists a possibly random baseline intensity that is common to all processes under investigation, and that could for example incorporate the seasonality effects and changing activities that we have described. By dealing with ratios of intensities, we can then remove this baseline intensity from our estimation procedure, and thereby obtain a model that focuses only on the financial covariates under investigation.

In Section 2 we describe the general model of ratios of intensities. In Section 3 we show that the quasi-likelihood estimators of the model are consistent and asymptotically normal. Section 4 discusses the method with respect to information criteria and penalization. Finally, Section 5 illustrates the benefits of the model with several examples of limit order book analysis. The ratio model is able to reproduce empirical observations on trading signals in order flows such as imbalance, spread, quantities available, etc. Estimation results on more than 30 stocks in the Paris Stock Exchange for most of the year 2015 show a very good agreement between empirical observations and the proposed ratio model.

Let $I = \{0, \ldots, \bar{i}\}$ and $J = \{1, \ldots, \bar{j}\}$ for some strictly positive integers $\bar{i}$ and $\bar{j}$. Let $(N^i_t)_{t \geq 0}$, $i \in I$, be some counting processes. We assume that the intensities $\lambda^i(t)$, $i \in I$, of these counting processes share a common baseline intensity $\lambda_0(t)$. $\lambda_0(t)$ is neither observable nor specified as a function of observables and parameters. For any $i \in I$, the intensity $\lambda^i(t)$ is written:

$$\lambda^i(t, \vartheta) = \lambda_0(t) \exp\Big( \sum_{j \in J} \vartheta^i_j X_j(t) \Big), \tag{1}$$

where $(X_j(t))_{t \geq 0}$ is the $j$-th observable covariate process and $\vartheta = (\vartheta^i_j)_{i \in I, j \in J}$ a parameter vector. We are not interested in the value of the coefficient $\vartheta^i_j$, modeling the specific response of the counting process $N^i$ to the covariate $X_j$, but rather in the relative responses of the intensities compared to each other. 
Let

$$\theta^i_j = \vartheta^i_j - \vartheta^0_j, \quad (i \in I,\ j \in J) \tag{2}$$

be the relative response of process $i$ compared to process $0$ with respect to covariate $j$, $j \in J$. Obviously, $\forall j \in J$, $\theta^0_j = 0$ since the process $0$ is taken as an arbitrary reference, and $\forall j \in J$, $\forall (i, i') \in I^2$, $\theta^{i'}_j - \theta^i_j = \vartheta^{i'}_j - \vartheta^i_j$, i.e. differences between relative responses are equal to the corresponding differences between absolute responses. Instead of the standard intensities defined at Equation (1), and since we are only interested in the relative responses, we will consider the intensities ratios

$$r^i(t, \theta) = \frac{\lambda^i(t, \vartheta)}{\sum_{i' \in I} \lambda^{i'}(t, \vartheta)} \tag{3}$$

where $\theta = (\theta^1_1, \ldots, \theta^1_{\bar{j}}, \ldots, \theta^{\bar{i}}_1, \ldots, \theta^{\bar{i}}_{\bar{j}})$ denotes the new parameter vector. Notation is justified by the following computation:

$$\begin{aligned}
r^i(t, \theta) &= \frac{\exp\big( \sum_{j \in J} \vartheta^i_j X_j(t) \big)}{\sum_{i' \in I} \exp\big( \sum_{j \in J} \vartheta^{i'}_j X_j(t) \big)} \\
&= \Big[ \exp\Big( -\sum_{j \in J} \vartheta^i_j X_j(t) \Big) \sum_{i' \in I} \exp\Big( \sum_{j \in J} \vartheta^{i'}_j X_j(t) \Big) \Big]^{-1} \\
&= \Big[ \sum_{i' \in I} \exp\Big( \sum_{j \in J} (\vartheta^{i'}_j - \vartheta^i_j) X_j(t) \Big) \Big]^{-1} \\
&= \Big[ \sum_{i' \in I} \exp\Big( \sum_{j \in J} (\theta^{i'}_j - \theta^i_j) X_j(t) \Big) \Big]^{-1} \\
&= \Big[ 1 + \sum_{i' \in I \setminus \{i\}} \exp\Big( \sum_{j \in J} (\theta^{i'}_j - \theta^i_j) X_j(t) \Big) \Big]^{-1}.
\end{aligned} \tag{4}$$

As hinted in the introduction, this framework is convenient to model a limit order book since daily movements are known to exhibit both intraday seasonality and sudden bursts of activity, sometimes in response to exogenous news, sometimes occurring for reasons less obvious to the outside observer. If we assume that such global market activity equally affects all types of orders, then such variations of market activity are taken into account by the non-specified, possibly stochastic, baseline intensity $\lambda_0$. 
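As a hedged numerical illustration of Equations (3) and (4), the following Python sketch computes the intensity ratios as a softmax of the linear scores $\sum_j \theta^i_j X_j(t)$ and checks that they do not change when the common baseline is rescaled. The function names, the parameter values and the covariate values are hypothetical and chosen only for illustration; none of them come from the paper.

```python
import math

def intensity_ratios(theta, x):
    """Ratios r^i of Equations (3)-(4) for one covariate vector x.

    theta: rows theta[i-1][j] for i = 1..ibar (process 0 is the reference,
           so its row of relative responses is identically zero).
    Returns [r^0, r^1, ..., r^ibar].
    """
    scores = [0.0] + [sum(t * v for t, v in zip(row, x)) for row in theta]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def ratios_from_intensities(vartheta, x, baseline):
    """Same ratios, computed from the full intensities of Equation (1)."""
    lam = [baseline * math.exp(sum(v * xj for v, xj in zip(row, x)))
           for row in vartheta]
    total = sum(lam)
    return [l / total for l in lam]

# Hypothetical absolute responses for 3 processes and 2 covariates.
vartheta = [[0.2, -0.1], [0.7, 0.4], [-0.3, 0.9]]
# Relative responses of Equation (2): subtract the row of process 0.
theta = [[v - v0 for v, v0 in zip(row, vartheta[0])] for row in vartheta[1:]]
x = [1.0, -0.5]

r = intensity_ratios(theta, x)
r_quiet = ratios_from_intensities(vartheta, x, baseline=0.01)
r_busy = ratios_from_intensities(vartheta, x, baseline=250.0)
```

Under these assumptions the three lists `r`, `r_quiet` and `r_busy` coincide up to rounding and each sums to one, which is the cancellation of the baseline that the ratio model relies on.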
Therefore, we can estimate relative responses $\theta$ of order flows to specific covariates without having to account for the common intensity $\lambda_0$ due to global market activity. If different patterns of background intensities are observed in the data, then it is possible to incorporate these differences into covariates.

One should also note that by construction $\sum_{i \in I} r^i(t, \theta) = 1$. This gives another important feature of the model, which is that the intensities ratios $r^i$ are directly interpretable in terms of probability. Let us assume that the counting processes $N^i$, $i \in I$, are counting the number of events of (mutually exclusive) type $i$ occurring in the limit order book. Then the intensities ratio $r^i$ is the instantaneous probability that the next occurring event will be of type $i$. Our ratio model can thus estimate relative event probabilities independently of the variations of market activity that are assumed to be shared by the intensities of all processes. Several examples are provided in Section 5.

In this section, we show that the quasi-maximum likelihood estimator and the quasi-Bayesian estimator of the ratio model of Equation (3) are consistent and asymptotically normal. Two close formulations are provided. In the first one, stationarity of covariates is assumed to obtain consistency and asymptotic normality. In the second one, this assumption is discarded, but it is assumed that the sample consists of repeated i.i.d. measurements to again obtain consistency and asymptotic normality. This second formulation is in agreement with the common practice in empirical finance to glue several trading days, or parts of trading days, into one single sample.

Let us introduce some notations used in the following analysis. For a tensor $T = (T_{i_1, \ldots, i_k})_{i_1, \ldots, i_k}$, we write

$$T[u_1, \ldots, u_k] = T[u_1 \otimes \cdots \otimes u_k] = \sum_{i_1, \ldots, i_k} T_{i_1, \ldots, i_k}\, u_1^{i_1} \cdots u_k^{i_k} \tag{5}$$

for $u_1 = (u_1^{i_1})_{i_1}$, ..., $u_k = (u_k^{i_k})_{i_k}$. Brackets $[\ , \ldots, \ ]$ stand for a multilinear mapping. 
We denote by $u^{\otimes r} = u \otimes \cdots \otimes u$ the $r$-times tensor product of $u$. Let $\iota(\theta) = (0_{\bar{j}}, \theta)$, where $0_k = 0 \in \mathbb{R}^k$. We will write $\lambda^i(t, \theta)$ for $\lambda^i(t, \iota(\theta))$ for notational simplicity. Furthermore we write $\theta^i = (\theta^i_j)_{j \in J}$ for $i \in I$. Since $\theta^0 = 0$, we will use the notation $I_0 = I \setminus \{0\} = \{1, \ldots, \bar{i}\}$ and consider a bounded domain $\Theta \subset \mathbb{R}^p$, with $p = \bar{i} \times \bar{j}$, as the parameter space of $\theta = (\theta^i_j)_{i \in I_0, j \in J}$. Let $T \in \mathbb{R}^*_+$. To estimate $\theta \in \Theta$ based on the observations on $[0, T]$, we consider the quasi-log-likelihood (log partial likelihood)

$$H_T(\theta) = \sum_{i \in I} \int_0^T \log r^i(t, \theta)\, dN^i_t. \tag{6}$$

Obviously, the model is continuously extended to the closure $\bar{\Theta}$, and $H_T$ is extended there as a continuous function. A quasi-maximum likelihood estimator (QMLE) is a measurable mapping $\hat{\theta}^M_T : \Omega \to \bar{\Theta}$ satisfying

$$H_T(\hat{\theta}^M_T) = \max_{\theta \in \bar{\Theta}} H_T(\theta) \tag{7}$$

for all $\omega \in \Omega$. The quasi-Bayesian estimator (QBE) with respect to a prior density $\varpi$ is defined by

$$\hat{\theta}^B_T = \Big[ \int_\Theta \exp\big( H_T(\theta) \big) \varpi(\theta)\, d\theta \Big]^{-1} \int_\Theta \theta \exp\big( H_T(\theta) \big) \varpi(\theta)\, d\theta. \tag{8}$$

The QBE takes values in $\mathcal{C}[\Theta]$, the convex hull of $\Theta$. We assume $\varpi$ is continuous and satisfies $0 < \inf_{\theta \in \Theta} \varpi(\theta) \leq \sup_{\theta \in \Theta} \varpi(\theta) < \infty$. These estimators are called together quasi-likelihood estimators. We now investigate asymptotic properties of the estimators when $T \to \infty$ in two cases.

Let $X = (X_j)_{j \in J}$. In this first step, in order to simplify the statements, we assume that the process $(\lambda_0, X)$ is stationary. Denote by $\theta^* = ((\vartheta^*)^i_j - (\vartheta^*)^0_j)_{i \in I_0, j \in J}$ the true value of $\theta$, and assume that $\theta^* \in \Theta$. Denote by $\mathcal{B}_I$ the $\sigma$-field generated by $\{\lambda_0(t), X(t);\ t \in I\}$ for $I \subset \mathbb{R}_+$. Let

$$\alpha(h) = \sup_{t \in \mathbb{R}_+}\ \sup_{A \in \mathcal{B}_{[0,t]},\ B \in \mathcal{B}_{[t+h, \infty)}} \big| P[A \cap B] - P[A] P[B] \big| \tag{9}$$

for $h > 0$. We consider the following conditions.

[A1] The process $(\lambda_0, X)$ is stationary, $\lambda_0(0) \in L^{\infty-} = \cap_{p > 1} L^p$ and $\exp(|X_j(0)|) \in L^{\infty-}$ for all $j \in J$.

[A2] The function $\alpha$ is rapidly decreasing, that is, $\limsup_{h \to \infty} h^L \alpha(h) < \infty$ for every $L > 0$.

Let

$$\rho^i(x, \theta) = \Big[ \sum_{i' \in I} \exp\big( x[\theta^{i'} - \theta^i] \big) \Big]^{-1} \tag{10}$$

for $x \in \mathbb{R}^{\bar{j}}$ and $\theta = (\theta^i)_{i \in I_0} \in \mathbb{R}^p$. Let

$$\Lambda(w, x) = w \sum_{i \in I} \exp\big( x[(\vartheta^*)^i] \big) \tag{11}$$

for $w \in \mathbb{R}_+$ and $x \in \mathbb{R}^{\bar{j}}$. Denote by $V_0(x, \theta)$ the variance matrix of the $(1 + \bar{i})$-dimensional multinomial distribution $M(1; \pi_0, \pi_1, \ldots, \pi_{\bar{i}})$ with $\pi_i = \rho^i(x, \theta)$, $i \in I$, and let $V_0(x, \theta)_{\alpha, \alpha'}$ be the $(\alpha + 1, \alpha' + 1)$-element of $V_0(x, \theta)$. Let $V(x, \theta) = (V_0(x, \theta)_{\alpha, \alpha'})_{\alpha, \alpha' \in I_0}$. Define a symmetric tensor $\Gamma$ by

$$\Gamma[u^{\otimes 2}] = E\Big[ \big( V(X(0)) \otimes X(0)^{\otimes 2} \big)[u^{\otimes 2}]\ \Lambda(\lambda_0(0), X(0)) \Big] \tag{12}$$

for $u \in \mathbb{R}^p$, where $V(x) = V(x, \theta^*)$. The matrix $\Gamma$ is nonnegative definite. We assume

[A3] $\det \Gamma > 0$.

Let $\theta^* \in \Theta$ denote the true value of $\theta$. Denote by $C_p(\mathbb{R}^p)$ the space of continuous functions on $\mathbb{R}^p$ of at most polynomial growth. We write $\hat{u}^M_T = \sqrt{T} (\hat{\theta}^M_T - \theta^*)$ and $\hat{u}^B_T = \sqrt{T} (\hat{\theta}^B_T - \theta^*)$. Denote by $\zeta$ a $p$-dimensional standard Gaussian vector. The quasi-likelihood analysis ensures convergence of moments as well as asymptotic normality of the quasi-likelihood estimators.

Theorem 3.1.

Suppose that [A1], [A2] and [A3] are satisfied. Then

$$E\big[ f(\hat{u}^{\mathsf{A}}_T) \big] \to E\big[ f(\Gamma^{-1/2} \zeta) \big] \tag{13}$$

as $T \to \infty$ for any $f \in C_p(\mathbb{R}^p)$ and $\mathsf{A} \in \{M, B\}$.

Example 1.

Let $H$ be a Hawkes process with exponential kernel with parameters $(\mu, \alpha, \beta) \in (\mathbb{R}^*_+)^3$. For the sake of illustrating Theorem 3.1, we consider a version of model (1) with $\bar{i} = 1$, $\bar{j} = 1$, and in which the common baseline intensity $\lambda_0$ is the intensity of the process $H$. We thus have:

$$\lambda^i(t, \vartheta) = \Big[ \mu + \int_0^t \alpha e^{-\beta (t - s)}\, dH(s) \Big] \exp\big( \vartheta^i X(t) \big), \quad i = 0, 1, \tag{14}$$

where $X$ is a Markov chain with states $\{-1, 1\}$ and infinitesimal generator $\begin{pmatrix} -\lambda_X & \lambda_X \\ \lambda_X & -\lambda_X \end{pmatrix}$. Then we obviously have $p = 1$, i.e. $\theta = \vartheta^1 - \vartheta^0$ is the single parameter to be fitted. In this specific case, one may explicitly compute the asymptotic variance of Equation (12):

$$\Gamma = \frac{\mu}{1 - \alpha / \beta}\ \frac{e^{\theta^*}}{(1 + e^{\theta^*})^2}\ \Big[ \cosh\big( (\vartheta^*)^0 \big) + \cosh\big( (\vartheta^*)^1 \big) \Big] \in \mathbb{R}^*_+. \tag{15}$$

We run 1000 simulations of the model with Hawkes parameters $\mu = 0.[\ldots]$, $\alpha = 1$, $\beta = 2$, covariate parameter $\lambda_X = 0.5$, and horizon $T = 1000$. True values of the parameters are $(\vartheta^*)^1 = 0.75$ and $(\vartheta^*)^0 = -0.75$, so that $\theta^* = 1.5$. For each simulation we estimate $\theta$. The empirical density of the estimation errors $\hat{\theta} - \theta^*$ is plotted in Figure 1 (dots and solid line). The same plot shows, in dashed lines, a Gaussian distribution fitted on these estimated values, as well as the theoretical Gaussian distribution given by Theorem 3.1, i.e. with the asymptotic variance given at Equation (12). Experimental values agree with the results of Theorem 3.1.

Example 2.

This paper does not focus on the baseline intensity. However, in the case of a parametric model where the baseline intensity is fully specified, the ratio approach is particularly helpful as it reduces the dimension of the parameter space, and can thus help improve numerical estimations by providing appropriate starting points for the optimization routines. As an illustration, let us consider a model similar to the previous example, with a Hawkes process $H$ with $\bar{k} = 2$ exponential kernels as baseline intensity, $\bar{i} = 2$, i.e. 3 processes, and $\bar{j} = 2$ covariates (same Markov chains as in the previous example):

$$\lambda^i(t, \vartheta) = \Big[ \sum_{k=1}^{\bar{k}} \int_0^t \alpha_k e^{-\beta_k (t - s)}\, dH(s) \Big] \exp\Big( \sum_{j=1}^{\bar{j}} \vartheta^i_j X_j \Big). \tag{16}$$

Maximum likelihood estimators of this model are directly computable when the process $H$ is observable. This requires an optimization on 10 parameters. As an alternative estimation procedure, we can estimate the ratio model (4 parameters $\theta^i_j$, $i = 1, 2$, $j = 1, 2$) and then estimate the full model likelihood with the constraints that the differences $\vartheta^i_j - \vartheta^0_j$ are kept equal to the estimated values $\theta^i_j$ (dimension 6). All 10 parameters are thus estimated. The estimated values can finally be used as a starting point for a final maximization of the likelihood of the full model, in order to ensure that we keep the desired properties of the maximum likelihood estimators. Estimation results on simulations of the model (16) are presented in Table 1. “Full” denotes the standard maximum likelihood estimation. “Combined” denotes the mixed method involving a ratio estimation as a first step. Optimizations are carried out with a Nelder-Mead algorithm (Python scipy implementation) with random starting points (using a standard Gaussian distribution). With these parameters and a horizon $T = 10\,000$, samples have roughly 33 000 data points on average. Out of 197 tests, the “Full” estimation converged only 10 times, while the “Combined” method converged 192 times out of 202. Estimates produced by the “Combined” method are closer to the true values and have smaller empirical standard deviations by a factor 2 to 10 (except for the fifth $\vartheta$ coefficient of Table 1, already well estimated in the “Full” case). Strong improvements are observed in the Hawkes parameters, where the standard “Full” method struggles to produce close estimates, while the “Combined” method starting with a ratio estimation yields much more accurate results.

[Figure 1: plot titled “Estimators distribution T=1000”; x-axis: Error (from −0.15 to 0.15); y-axis: Density; legend: Empirical density, Gaussian fit, Theoretical density.]

Figure 1: Distribution of the estimation error $\hat{\theta} - \theta^*$ for the model defined in Example 1. The asymptotic variance defined at Equation (12) is retrieved experimentally.

| | $\alpha_1$ | $\alpha_2$ | $\beta_1$ | $\beta_2$ | $\vartheta$ | $\vartheta$ | $\vartheta$ | $\vartheta$ | $\vartheta$ | $\vartheta$ |
|---|---|---|---|---|---|---|---|---|---|---|
| True | 1.000 | 2.000 | 2.000 | 10.000 | 0.500 | 1.000 | 0.500 | -1.000 | -0.500 | 1.000 |
| Full | 0.942 | 1.507 | 2.131 | 5.647 | 0.499 | 1.011 | 0.517 | -1.003 | -0.504 | 1.015 |
| (std. dev.) | (0.819) | (1.150) | (0.878) | (2.492) | (0.013) | (0.013) | (0.049) | (0.020) | (0.004) | (0.019) |
| Combined | 0.994 | 1.993 | 1.981 | 9.923 | 0.500 | 1.000 | 0.500 | -1.000 | -0.500 | 1.000 |
| (std. dev.) | (0.155) | (0.154) | (0.233) | (1.034) | (0.005) | (0.005) | (0.004) | (0.005) | (0.004) | (0.005) |

Table 1: Numerical results for the estimation of the 10 parameters of the model defined at Equation (16). Standard deviations are given in parentheses. See text for details.

We complete the previous analysis with a formulation of our estimation result that may be convenient in finance and in other fields of applications. We consider a sequence of observations of intraday data. The observations may be non-ergodic each day. We shall consider intervals $I^{(k)} = [O_k, C_k]$ ($k \in \mathbb{N}$) of the same length such that $0 \leq O_1 < C_1 \leq O_2 < C_2 \leq \cdots$. We consider the counting processes $N^i = (N^i_t)_{t \in \mathbb{R}_+}$ of the previous section. However, in this section, the observations are $((N^i_t)_{i \in I}, X(t))_{t \in I^{(k)}}$, $k = 1, \ldots, T$. As before, it is assumed that the point processes $N^i$ ($i \in I$) have no common jumps. The stationarity of each process $(\lambda^i(t, \vartheta^*))_{t \in I^{(k)}}$ is not assumed here.

We are interested in estimation of the parameter $\theta = (\theta^i_j)_{i \in I_0, j \in J}$ defined earlier by Equation (2). 
The estimation will be based on the random field $H_T$ re-defined by

$$H_T(\theta) = \sum_{k=1}^{T} \sum_{i \in I} \int_{I^{(k)}} \log r^i(t, \theta)\, dN^i_t. \tag{17}$$

Then the QMLE and the QBE are defined by Equations (7) and (8), respectively, but for $H_T$ given by Equation (17).

Let $\mathbb{X}_k = (\lambda_0(t), X(t))_{t \in I^{(k)}}$. Let $\mathcal{G}_k = \sigma[\mathbb{X}_1, \ldots, \mathbb{X}_k]$ and let $\mathcal{H}_k = \sigma[\mathbb{X}_k, \mathbb{X}_{k+1}, \ldots]$. The $\alpha$-mixing coefficient for $\mathbb{X} = (\mathbb{X}_k)_{k \in \mathbb{N}}$ is defined by

$$\alpha_{\mathbb{X}}(h) = \sup_{k \in \mathbb{N}}\ \sup_{A \in \mathcal{G}_k,\ B \in \mathcal{H}_{k+h}} \big| P[A \cap B] - P[A] P[B] \big|. \tag{18}$$

Let us then define the new conditions.

[C1] The sequence $(\mathbb{X}_k)_{k \in \mathbb{N}}$ is identically distributed. Moreover, $\sup_{t \in I^{(1)}} \| \lambda_0(t) \|_p < \infty$ and $\max_{j \in J} \sup_{t \in I^{(1)}} \| \exp(p |X_j(t)|) \|_1 < \infty$ for all $p > 1$.

[C2] For every $L > 0$, $\limsup_{h \to \infty} h^L \alpha_{\mathbb{X}}(h) < \infty$.

Define the symmetric tensor $\Gamma$ by

$$\Gamma[u^{\otimes 2}] = E\Big[ \int_{I^{(1)}} \big( V(X(t)) \otimes X(t)^{\otimes 2} \big)[u^{\otimes 2}]\ \Lambda(\lambda_0(t), X(t))\, dt \Big] \tag{19}$$

for $u \in \mathbb{R}^p$. The matrix $\Gamma$ is nonnegative definite. More strongly we assume

[C3] $\det \Gamma > 0$.

Theorem 3.2. Suppose that [C1], [C2] and [C3] are satisfied. Then

$$E\big[ f(\hat{u}^{\mathsf{A}}_T) \big] \to E\big[ f(\Gamma^{-1/2} \zeta) \big] \tag{20}$$

as $T \to \infty$ for any $f \in C_p(\mathbb{R}^p)$ and $\mathsf{A} \in \{M, B\}$. The proof is omitted.
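The consistency stated in Theorems 3.1 and 3.2 can be illustrated with a small simulation in the spirit of Example 1. The sketch below is not the authors' experiment: it assumes two processes ($\bar{i} = 1$), a single covariate taking values in $\{-1, +1\}$, and it draws the covariate observed at each event independently and uniformly, which is a simplification. Since the quasi-log-likelihood only involves the type of each event and the covariate value at that event, the baseline intensity and the event times never need to be simulated. All function names and numerical values are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_events(theta_star, n_events, rng):
    """Draw (x, y) pairs: x is the covariate at the event, y = 1 if the
    event comes from process 1 (probability r^1 = sigmoid(theta * x))."""
    events = []
    for _ in range(n_events):
        x = rng.choice((-1.0, 1.0))
        y = 1 if rng.random() < sigmoid(theta_star * x) else 0
        events.append((x, y))
    return events

def quasi_loglik(theta, events):
    """H(theta): sum over events of log r^{type}(x, theta)."""
    return sum(math.log(sigmoid(theta * x if y == 1 else -theta * x))
               for x, y in events)

def qmle(events, n_iter=25):
    """Newton iterations on the concave quasi-log-likelihood."""
    theta = 0.0
    for _ in range(n_iter):
        score = sum(x * (y - sigmoid(theta * x)) for x, y in events)
        info = sum(x * x * sigmoid(theta * x) * (1.0 - sigmoid(theta * x))
                   for x, _ in events)
        theta += score / info
    return theta

rng = random.Random(0)
events = simulate_events(theta_star=1.5, n_events=20000, rng=rng)
theta_hat = qmle(events)
```

With 20000 simulated events the estimate `theta_hat` should land close to the true value 1.5. Repeating the experiment over many seeds and plotting the errors would give a histogram comparable in spirit to Figure 1, although the variance differs from Equation (15) because the sampling scheme here is simplified.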

Since the ratio model flexibly incorporates various covariate processes and since it may have a large number of parameters, we need information criteria for model selection and other regularization methods for sparse estimation. Though the inference of the ratio model is based on the quasi-likelihood analysis, we can still apply information criteria such as CAIC (consistent AIC) and BIC by using $H_T$. See Bozdogan (1987) for an exposition of information criteria for model selection.

Let $a_T$ be a sequence of positive numbers such that $a_T \to \infty$ and $a_T / T \to 0$ as $T \to \infty$. Let $K \subset I_0 \times J$. We consider a sub-model $S_K$ of $\Theta$ such that

$$S_K = \big\{ \theta \in \Theta;\ \theta^i_j = 0\ \ ((i, j) \in K^c) \big\}. \tag{21}$$

Let

$$C_T(S_K) = -2 H_T(\hat{\theta}_K) + d(S_K)\, a_T \tag{22}$$

where $\hat{\theta}_K$ (depending on $T$) denotes the QMLE or QBE in the sub-model $S_K$ and $d(S_K)$ is the dimension of $S_K$. More precisely, the QMLE $\hat{\theta}^M_K$ is defined as an estimator that satisfies

$$H_T(\hat{\theta}^M_K) = \max_{\theta \in S_K} H_T(\theta), \tag{23}$$

and the QBE $\hat{\theta}^B_K$ is defined by

$$\hat{\theta}^B_K = \Big[ \int_{S_K} \exp\big( H_T(\theta) \big) \varpi_{S_K}(\theta)\, d\theta \Big]^{-1} \int_{S_K} \theta \exp\big( H_T(\theta) \big) \varpi_{S_K}(\theta)\, d\theta \tag{24}$$

for a continuous prior density $\varpi_{S_K}$ on $S_K$ satisfying $0 < \inf_{\theta \in S_K} \varpi_{S_K}(\theta) \leq \sup_{\theta \in S_K} \varpi_{S_K}(\theta) < \infty$. We denote by $S_{K^*}$ the minimum model that includes $\theta^*$, in other words, $(\theta^*)^i_j \neq 0$ if and only if $(i, j) \in K^*$. The QMLE and QBE for $\theta$ restricted to the sub-model $S_{K^*}$ are generically denoted by $\hat{\theta}_{K^*}$. In particular,

$$C_T(S_{K^*}) = -2 H_T(\hat{\theta}_{K^*}) + d(S_{K^*})\, a_T. \tag{25}$$

Proposition 4.1.

Suppose that [A1], [A2] and [A3] are satisfied. Then

(i) If $\theta^* \notin S_K$, then

$$C_T(S_K) - C_T(S_{K^*}) \to^p \infty \quad (T \to \infty) \tag{26}$$

(ii) If $\theta^* \in S_K$ and $S_K \neq S_{K^*}$, then

$$C_T(S_K) - C_T(S_{K^*}) \to^p \infty \quad (T \to \infty) \tag{27}$$

A proof of Proposition 4.1 is given in the Appendix to keep the paper self-contained. According to Proposition 4.1, we should select a model $S_{\hat{K}}$ that attains $\min \{ C_T(S_K);\ K \subset I_0 \times J \}$. Then the selection consistency holds as

$$P\big[ \hat{K} = K^* \big] \to 1 \quad (T \to \infty). \tag{28}$$

Based on the quasi-likelihood function $H_T$, the criterion $C_T(S_K)$ gives the quasi-consistent AIC (QCAIC) when $a_T = \log T + 1$, and the quasi-BIC (QBIC) when $a_T = \log T$, among many other possible choices of $a_T$. A Hannan and Quinn type of quasi-information criterion (QHQ) is the case where $a_T = c \log \log T$ for $c > 2$. The quasi-AIC (QAIC) is the case where $a_T = 2$, but it cannot exclude overfitting though it excludes misspecified models, as usual and as suggested in the proof of Proposition 4.1.

Recently penalized quasi-likelihood analysis for sparse estimation has been developed. We consider a penalty function $p_\lambda$ such that $p_\lambda(x) = p_\lambda(-x)$, $p_\lambda(0) = 0$, $p_\lambda$ is non-decreasing on $x \geq 0$, $p_\lambda$ is differentiable except for $x = 0$, and $\lim_{x \downarrow 0} x^{-q} p_\lambda(x) = \lambda > 0$ for some $q \in (0, 1]$. [...] $H_T$. Then we have the penalized contrast function

$$-H^\dagger_T(\theta) = -H_T(\theta) + \sqrt{T} \sum_{i \in I_0,\ j \in J} p_\lambda(\theta^i_j) \tag{29}$$

and the penalized QMLE $\hat{\theta}_\lambda$ (depending on $T$) is defined by

$$\hat{\theta}_\lambda \in \operatorname{argmin}_{\theta \in \Theta} \big\{ -H^\dagger_T(\theta) \big\}. \tag{30}$$

In the case $q = 1$, the polynomial type large deviation inequality is inherited from $H_T$ by $H^\dagger_T$; as a result, we obtain asymptotic properties of $\hat{\theta}_\lambda$ as well as $L^p$-boundedness of the error. The asymptotic distribution becomes slightly involved. In a similar way, in the case $q < 1$, it is possible to derive asymptotic properties of $\hat{\theta}_\lambda$. A prominent property is selection consistency. Thanks to the QLA theory having a polynomial type large deviation inequality, we can prove that the probability of correct model selection is $1 - O(T^{-L})$ as $T \to \infty$ for every $L > 0$. See Kinoshita & Yoshida (2018) for details. An advantage of working with the QLA theory is that asymptotic properties of regularization methods are obtained in a unified manner without using any specific nature of the stochastic process. The QLA was used in Umezu et al. (2015) for a generalized linear model.

Another approach is toward the adaptive LASSO (Zou (2006)) and the least squares approximation method (Wang & Leng (2007)). De Gregorio & Iacus (2012) took this approach to sparse estimation of ergodic diffusions. Since we can assume a strong mode of convergence for the initial estimator constructed within the QLA framework, we can obtain selection consistency with error probability that is of $O(T^{-L})$ for any $L > 0$. [...] et al. (2015) derived probabilistic inequalities for multivariate point processes in the context of LASSO.
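The selection rule based on $C_T(S_K)$ can be sketched as follows. This is only an illustration under stated assumptions: the criterion is taken with the conventional factor $-2 H_T$, the candidate sub-models and their maximized quasi-log-likelihoods are made-up numbers, and the helper names `penalty_rate`, `criterion` and `select_model` are hypothetical.

```python
import math

def penalty_rate(name, T, c=2.5):
    """Sequence a_T for the quasi-information criteria named in the text."""
    if name == "QAIC":
        return 2.0
    if name == "QBIC":
        return math.log(T)
    if name == "QCAIC":
        return math.log(T) + 1.0
    if name == "QHQ":  # requires c > 2
        return c * math.log(math.log(T))
    raise ValueError("unknown criterion: " + name)

def criterion(h_max, dim, a_T):
    """C_T(S_K) = -2 H_T(theta_hat_K) + d(S_K) a_T (assumed normalization)."""
    return -2.0 * h_max + dim * a_T

def select_model(fits, a_T):
    """fits maps a sub-model label to (maximized H_T, dimension)."""
    return min(fits, key=lambda k: criterion(fits[k][0], fits[k][1], a_T))

# Made-up maximized quasi-log-likelihoods for three nested sub-models.
fits = {"K1": (-5210.4, 1), "K2": (-5150.2, 2), "K3": (-5147.0, 4)}
T = 10000

chosen_qbic = select_model(fits, penalty_rate("QBIC", T))
chosen_qaic = select_model(fits, penalty_rate("QAIC", T))
```

With these made-up numbers the QBIC penalty selects the two-parameter model while the QAIC penalty keeps the larger one, which mirrors the remark that a constant $a_T = 2$ cannot exclude overfitting.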

This section describes the high-frequency financial data (Section 5.1) and several empirical studies conducted with it. A detailed example of financial analysis allowed by the ratio model is provided in Section 5.2, where the determination of the sign of the next trade in a limit order book is carried out with several covariates. Other examples are provided in subsequent sections, illustrating the influence of the spread and queue sizes in the order book.

We use Reuters TRTH tick-by-tick data for 36 stocks traded on the Paris Stock Exchange during most of the year 2015. The list of the stocks is given in Appendix B. For each stock and each trading day, the order flow is reconstructed using the method described in Muni Toke (2016), and we keep all trades and quotes recorded between 9:30 am and 5:00 pm.¹ When adding all stocks and trading days, the sample represents more than one billion events (more than 50 million market orders, more than 500 million limit orders, and more than 450 million cancellations).

¹ Removing the beginning and the end of the trading day has been done as a usual precautionary step when dealing with high-frequency trades and quotes databases, as one may sometimes be concerned with data quality in very busy periods. However, this precaution may actually not be necessary. For example, when recomputing Figure 4 below using the whole trading day, we observe no significant visual difference with the version presented in this paper.

All models are numerically estimated using R by quasi-likelihood maximization. For practical purposes, we provide explicit expressions for the log partial likelihood. Using Equation (4), the quasi-log-likelihood of Equation (6) is written as a function of $\theta$ as:

$$\begin{aligned}
H_T(\theta) &= -\sum_{i \in I} \int_0^T \log\Big[ \sum_{i' \in I} \exp\Big( \sum_{j \in J} (\theta^{i'}_j - \theta^i_j) X_j(t) \Big) \Big]\, dN^i(t) \\
&= -\sum_{i \in I} \int_0^T \log\Big[ 1 + \sum_{i' \in I \setminus \{i\}} \exp\Big( \sum_{j \in J} (\theta^{i'}_j - \theta^i_j) X_j(t) \Big) \Big]\, dN^i(t)
\end{aligned} \tag{31}$$

Equation (17) is similarly written. 
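A minimal sketch of how Equation (31) could be evaluated numerically is given below. The paper's own estimations are carried out in R; Python is used here only for illustration, and the data layout (a list of pairs made of the event type and the covariate vector observed at that event) as well as the names `linear_score` and `quasi_loglik` are assumptions. The integral against $dN^i$ reduces to a sum over the events of type $i$.

```python
import math

def linear_score(row, x):
    """sum_j theta^i_j X_j for one process and one covariate vector."""
    return sum(t * v for t, v in zip(row, x))

def quasi_loglik(theta, events):
    """Equation (31) as a sum over events.

    theta:  rows theta[i-1] = (theta^i_j)_j for i = 1..ibar; process 0 is
            the reference and has all relative responses equal to zero.
    events: list of (i, x), with i the event type in {0, ..., ibar} and x
            the covariate values at the time of the event.
    """
    total = 0.0
    for i, x in events:
        scores = [0.0] + [linear_score(row, x) for row in theta]
        diffs = [s - scores[i] for s in scores]
        m = max(diffs)  # log-sum-exp trick for numerical stability
        total -= m + math.log(sum(math.exp(d - m) for d in diffs))
    return total

# Toy data: three event types, covariates (constant, signal).
events = [(0, [1.0, 0.3]), (1, [1.0, -0.2]), (2, [1.0, 0.5]), (1, [1.0, 0.0])]
theta = [[0.4, -1.0], [-0.2, 0.8]]
H = quasi_loglik(theta, events)
```

The function never reads a timestamp: only the type of each event and the covariates at that event enter the sum, so reordering the list leaves `H` unchanged. A generic optimizer can then maximize `quasi_loglik` over `theta`.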
In the example case $I = \{0, 1, 2\}$ with three counting processes, the quasi-log-likelihood given at Equation (31) is written:

$$\begin{aligned}
H_T(\theta) = &-\int_0^T \log\Big[ 1 + \exp\Big( \sum_{j \in J} \theta^1_j X_j(t) \Big) + \exp\Big( \sum_{j \in J} \theta^2_j X_j(t) \Big) \Big]\, dN^0(t) \\
&-\int_0^T \log\Big[ 1 + \exp\Big( -\sum_{j \in J} \theta^1_j X_j(t) \Big) + \exp\Big( \sum_{j \in J} (\theta^2_j - \theta^1_j) X_j(t) \Big) \Big]\, dN^1(t) \\
&-\int_0^T \log\Big[ 1 + \exp\Big( -\sum_{j \in J} \theta^2_j X_j(t) \Big) + \exp\Big( \sum_{j \in J} (\theta^1_j - \theta^2_j) X_j(t) \Big) \Big]\, dN^2(t)
\end{aligned} \tag{32}$$

An important feature of the model may be underlined here. Many mathematical models of limit order books, and models in high-frequency finance in general, rely on simple point processes, i.e. processes for which one may not observe two simultaneous events. Confronting such models with empirical data thus often requires that the timestamps of all recorded events are different. This condition, even with resolutions down to milliseconds or even microseconds, is not satisfied on financial markets, where bursts of activity are translated into data files by multiple events at the same timestamp. Randomization, i.e. adding a small random quantity to the observed timestamps, is often proposed as a patch to force all timestamps to be different. A nice feature of the setting proposed here is that such patches are not necessary: likelihood (31) does not depend on interval times between consecutive events, but only on the number of events. Therefore, the model can be applied even on data with low resolution timestamps.

It is well known that imbalance is a good proxy for the sign of the next trade. Empirical observations can be found in e.g. Lipton et al. (2013). Lehalle & Mounjid (2017), among others, is an attempt to incorporate this signal in trading strategies. Let us define the imbalance in the order book observed at time $t$ by:

$$i(t) = \frac{q^B(t) - q^A(t)}{q^B(t) + q^A(t)}, \tag{33}$$

where $q^A(t)$ and $q^B(t)$ are the number of shares available at time $t$ at the best quote on the ask side and bid side respectively. 
An imbalance close to $+1$ indicates a very small available volume on the ask side and consequently a probable upward price move, while an imbalance close to $-1$ [...] ask market orders $N^{MA}$ and bid market orders $N^{MB}$ are point processes with intensities:

$$\begin{aligned}
\lambda^{MA}(t, \vartheta^{MA}) &= \lambda_0(t) \exp\big[ \vartheta^{MA}_0 + \vartheta^{MA}_1 i(t) \big], \\
\lambda^{MB}(t, \vartheta^{MB}) &= \lambda_0(t) \exp\big[ \vartheta^{MB}_0 + \vartheta^{MB}_1 i(t) \big],
\end{aligned} \tag{34}$$

where $\lambda_0(t)$ is a potentially random baseline intensity. Parameters $\theta_i = \vartheta^{MA}_i - \vartheta^{MB}_i$, $i = 0, 1$ [...] $r^{MA}$ and $r^{MB}$ are then directly interpretable as the instantaneous probability that the next market order will occur on the ask side and on the bid side respectively.

The model is fitted monthly for all stocks, from January 2015 to November 2015, which, excluding gaps in the database, gives 390 fits of the model. In Figure 2, we plot for two samples the empirical probability that, for a given imbalance, the next trade occurs on the ask side, and the numerical estimation of the theoretical probability $r^{MA}(i) = [1 + \exp(-\theta_0 - \theta_1 i)]^{-1}$ for a level $i$ of imbalance given by the fit of our ratio model. For representativity we rank the 390 fits by increasing mean L distance between empirical and fitted probabilities, and we select two samples representing the 20% and 80% quantiles of this error distribution. All 390 fits are available upon request.

The simple ratio model with one covariate provides a very good fit of the probability of the next trade. But we can obviously investigate further possible factors influencing the sign of the next trade. It has been shown that time series of signs of trades ($\epsilon(t) = +1$ for an ask trade at time $t$, $-1$ [...] et al. (2004)). 
We can include the sign of the last trade in the ratio model, and assume that the counting processes of ask market orders $N^{MA}$ and bid market orders $N^{MB}$ are point processes with intensities:

$$
\begin{aligned}
\lambda^{MA}(t, \vartheta^{MA}) &= \lambda_0(t) \exp\big[ \vartheta^{MA}_0 + \vartheta^{MA}_1 i(t) + \vartheta^{MA}_2 \epsilon(t) \big], \\
\lambda^{MB}(t, \vartheta^{MB}) &= \lambda_0(t) \exp\big[ \vartheta^{MB}_0 + \vartheta^{MB}_1 i(t) + \vartheta^{MB}_2 \epsilon(t) \big].
\end{aligned} \tag{35}
$$

Figure 3 plots the updated results for two samples. For representativity, we again select the two samples representing the 20% and 80% quantiles measured in mean L error. All 390 fits are available upon request. It is interesting to observe the strong effect of the sign of the last trade on the predictive power of the imbalance. While a close-to-zero imbalance unconditionally indicates a close to 50% probability of an ask trade (see Figure 2), this dramatically changes when the sign of the last trade comes into play. We now observe in Figure 3 two probability curves, one for each sign of the preceding trade, and these curves highlight a kind of hysteresis effect in sign determination with respect to the imbalance. For example, during a series of bid (resp. ask) market orders, the imbalance level at which an ask (resp. bid) market order becomes more probable is not around 0, but higher (resp. lower). On the selected graphs the 50% crossing point for the imbalance level is around $\pm$ […] $\epsilon$. We can thus use the following ratio model:

$$
\begin{aligned}
\lambda^{MA}(t, \vartheta^{MA}) &= \lambda_0(t) \exp\big[ \vartheta^{MA}_0 + \vartheta^{MA}_1 i(t) + \vartheta^{MA}_2 \epsilon(t) + \vartheta^{MA}_3 \epsilon(t) s(t) \big], \\
\lambda^{MB}(t, \vartheta^{MB}) &= \lambda_0(t) \exp\big[ \vartheta^{MB}_0 + \vartheta^{MB}_1 i(t) + \vartheta^{MB}_2 \epsilon(t) + \vartheta^{MB}_3 \epsilon(t) s(t) \big],
\end{aligned} \tag{36}
$$

where $s(t) = +1$ if the observed spread is larger than its mean, and $-1$ otherwise. We refer to $s(t) = -1$ as the usual spread case and to $s(t) = +1$ as the large spread case. Figure 4 plots the updated results for two samples. For representativity, we again select the two samples representing the 20% and 80% quantiles measured in mean L error. All 390 fits are available upon request. We now have four curves representing the effect of the imbalance on the next trade sign, depending on the last trade sign and the current spread. Empirical curves are noisier as the samples are split into subsamples to compute conditional probabilities. However, we observe as expected that a large spread flattens the imbalance effect and reinforces the importance of the last sign.

[Figure 2 panels: CAPP.PA 201511 and DANO.PA 201503; vertical axis: probability of ask market order.]

Figure 2: Empirical probability (in red) and fitted probability (in blue) that the next market order is an ask market order, given the observed imbalance. Selected stocks show the 20% (left) and 80% (right) quantiles measured in mean L error.

[Figure 3 panels: AXAF.PA 201502 and SCHN.PA 201502; vertical axis: probability of ask market order.]

Figure 3: Empirical probability (in red) and fitted probability (in blue) that the next market order is an ask market order, given the observed imbalance. Upward (resp. downward) triangles indicate that the last trade was an ask (resp. bid) market order. Selected stocks show the 20% (left) and 80% (right) quantiles measured in mean L error.
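The hysteresis effect can be read directly from the ratio parameters of Equation (36). Writing $\theta_k = \vartheta^{MA}_k - \vartheta^{MB}_k$, the probability of an ask market order equals 50% when $\theta_0 + \theta_1 i + \theta_2 \epsilon + \theta_3 \epsilon s = 0$. The sketch below solves this for the imbalance; the parameter values are hypothetical and chosen only to reproduce the qualitative pattern described above, they are not fitted values from the paper.

```python
def crossing_imbalance(theta, eps, s):
    """Imbalance level at which ask and bid market orders are equally likely.

    theta = (theta0, theta1, theta2, theta3) are ratio parameters of model (36),
    eps is the last trade sign (+1 ask, -1 bid), s is the spread indicator
    (+1 large, -1 usual). Requires theta1 != 0.
    """
    theta0, theta1, theta2, theta3 = theta
    return -(theta0 + theta2 * eps + theta3 * eps * s) / theta1


# Hypothetical ratio parameters (illustration only).
theta = (0.0, 3.0, 1.2, 0.6)
for eps in (+1, -1):
    for s in (-1, +1):
        print(f"last sign {eps:+d}, spread {s:+d}: "
              f"crossing at i = {crossing_imbalance(theta, eps, s):+.2f}")
```

With these values, after a bid trade the crossing level is positive, and it moves further away from zero when the spread is large, which is the flattening effect of the spread described in the text.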
On the provided example fits, if the spread is larger than usual and the last trade was a bid, an ask market order becomes more probable than a bid market order when the imbalance nearly reaches 1, compared to roughly 0.[…] $a_T$, we use the QAIC for $a_T = 2$, the QCAIC for $a_T = \log T + 1$, and the QBIC for $a_T = \log T$, all based on the QMLE $\hat{\theta}_K$. As an illustration, Table 2 (left panel) shows the number of times various models are selected among the 390 samples, according to these information criteria. Several comments can be made about this illustration. First of all, it turns out that the simplest model depending only on the imbalance is never selected. Recall for example that the weighted mid-price, commonly used in microstructure as a proxy for a "future" price, depends only on the imbalance. Our observation pleads for the incorporation of more covariates in defining such proxies. It also highlights that, as explained above, the spread acts primarily on the intensity of submission in interaction with the last sign: the model with covariates $i$, $\epsilon$ and $s$ is (nearly) never selected, in contrast to the model with covariates $i$, $\epsilon$ and $\epsilon s$. Finally, we observe as expected that QAIC has a tendency to select the model with the greatest number of parameters; however, using QCAIC or QBIC, the simple model of Equation (36) is still selected on more than 40% of the samples over the full model with all terms. These results show that, in future work, further investigations could be made to determine relevant covariates and possibly analyze a stock or time dependency. Among several possibilities, Table 2 (right panel) gives selection results when we add the last price movement (denoted $\delta$) to the ratio model.

[Figure 4 panels: SCHN.PA 201510 and VLOF.PA 201501; vertical axis: probability of ask market order.]

Figure 4: Empirical probability (in red) and fitted probability (in blue) that the next market order is an ask market order, given the observed imbalance. Upward (resp. downward) triangles indicate that the last trade was an ask (resp. bid) market order. Dashed (resp. full) lines represent large (resp. usual) spread values. Selected stocks show the 20% (left) and 80% (right) quantiles measured in mean L error.

                              Left panel             Right panel
 d    Model                   QAIC   QCAIC  QBIC     QAIC   QCAIC  QBIC
 2    i                       […]    […]    […]      […]    […]    […]
 3    i.ε                     […]    […]    […]      […]    […]    […]
 4    i.ε.s                   […]    […]    […]      […]    […]    […]
 4    i.ε.εs                  79     162    158      48     118    114
 7    i.ε.s + int             310    225    229      218    175    178
 11   i.ε.s.δ + int           -      -      -        123    94     95

Table 2: Number of times a model for the intensity of bid and ask market orders is selected among the 390 samples, according to several information criteria. Models are named by the names of the covariates separated with a dot, with the notations defined in the text. "+int" means that all interaction terms are included in the model. E.g., "i.ε.εs" is the model defined at Equations (36). The left panel investigates covariates $i$, $\epsilon$ and $s$ only, while the right panel adds the last price movement $\delta$.

Our investigation of the influence of the imbalance signal on the intensities of bid and ask market orders can also illustrate the use of the penalized QMLE described in Section 4. In this example, we use the ratio model to try to decide whether traders use the imbalance as a trading signal on a given stock looking at the first level only, or at the first two levels, etc.
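The information-criterion comparison behind Table 2 can be sketched as follows. The sketch assumes the usual penalized form $-2 H_T(\hat{\theta}) + a_T d$, with $d$ the number of ratio parameters and $a_T$ the penalties quoted above; this form, the function names and all numerical values are assumptions made for illustration, not figures from the paper.

```python
import math


def information_criteria(quasi_loglik, d, T):
    """Return QAIC, QCAIC and QBIC, assuming the form -2 * H_T + a_T * d."""
    penalties = {"QAIC": 2.0, "QCAIC": math.log(T) + 1.0, "QBIC": math.log(T)}
    return {name: -2.0 * quasi_loglik + a_T * d for name, a_T in penalties.items()}


def select_models(fits, T):
    """fits maps a model name to (maximized quasi-log likelihood, d)."""
    scores = {model: information_criteria(h, d, T) for model, (h, d) in fits.items()}
    return {crit: min(scores, key=lambda m: scores[m][crit])
            for crit in ("QAIC", "QCAIC", "QBIC")}


# Hypothetical maximized quasi-log likelihoods for three nested models.
fits = {
    "i": (-6931.0, 2),
    "i.eps.eps_s": (-6100.0, 4),
    "i.eps.s+int": (-6095.0, 7),
}
selected = select_models(fits, T=10000.0)
print(selected)
```

In this made-up example the lighter QAIC penalty favors the largest model, while QCAIC and QBIC favor the four-parameter model, which mirrors the tendency reported in Table 2.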
Let us denote by $i_k(t)$ the imbalance observed at time $t$ computed using the cumulative quantities at the best quotes up to the $k$-th level, $k = 1, \ldots, 10$. $i_1$ is thus the imbalance previously investigated. For $k = 2, \ldots, 10$, let $\Delta i_k = i_k - i_{k-1}$ be the corrective term between the imbalances computed with $k$ and $k-1$ levels. We consider the model

$$
\lambda^T(t, \theta^T) = \lambda_0(t) \exp\Big[ \theta^T_0 + \theta^T_1 i_1(t) + \sum_{k=2}^{10} \theta^T_k \, \Delta i_k(t) \Big], \tag{37}
$$

where $T \in \{MB, MA\}$. Let $\theta_k = \theta^{MA}_k - \theta^{MB}_k$, $k = 1, \ldots, 10$, be our ratio parameters, to be estimated by likelihood maximization. We estimate the model using both standard quasi-likelihood maximization and a penalized estimation using the penalized QMLE $\hat{\theta}_\lambda$ of Equations (29) and (30). We compute daily fits of the model and plot the mean values of the estimated parameters for each stock. Figure 5 shows the fitted parameters $\theta_k$ as a function of the level $k$, as well as the 95% Gaussian empirical confidence interval computed with the empirical standard deviation. For brevity only two stocks are shown, and for simplicity we choose the same ones as in Figure 4, but results for all 36 stocks are similar. It turns out that the parameter associated with the imbalance at level 1 is always very significant, but that the additional information given by the corrective terms $\Delta i_k$, $k = 2, \ldots, 10$, is not translated into significant $\theta_k$, $k = 2, \ldots, 10$. Penalization greatly helps in reducing the confidence intervals of the estimates. These results thus seem to support the view that traders use the imbalance at the first level as a significant trading signal, but do not significantly use information from higher levels.

[Figure 5 panels: "SCHN.PA - Relative theta w.r.t. LOB level" and "VLOF.PA - Relative theta w.r.t. LOB level"; horizontal axis: LOB level (1 = best quote), from 1 to 10; vertical axis: yearly mean theta; legend: penalized and standard estimation.]

Figure 5: Mean daily estimate (curves) and 95% confidence interval (shaded areas) of $\theta_k$ for the imbalance model with corrective terms of Equation (37). Same stocks as in Figure 4. In these numerical examples, we use $q = 1$ and $\lambda = 50\,T^{-[\ldots]}$.

The spread as a potential trading signal seems less investigated in the microstructure literature than other factors. One potential reason is that the spread of many highly traded stocks is almost always equal to one tick (so-called large tick stocks). This property has given birth to specific models of such limit order books assuming a constant spread. However, in the studied sample of 36 stocks traded on the Paris stock exchange in 2015, the spread cannot be assumed equal to one tick. As such, it must be considered a possible trading signal, influencing the order flows. If the observed spread is large, then a trader wanting to buy a share would rather submit a bid limit order inside the spread than an ask market order. That way the trader will secure priority for the next sell market order, while obtaining a lower price. A market order should only be used by an impatient trader, or a trader believing in an incoming and lasting upward market movement. On the contrary, a spread equal to one tick prevents the use of limit orders to secure priority in future executions. Such mechanisms have been observed in Muni Toke & Yoshida (2017), where data analysis shows that the empirical intensity of market orders increases when the spread decreases to one tick.

We can therefore propose a version of our ratio model to investigate the influence of the spread on order flows occurring at the best quotes.
Let us consider a model of the best quotes of the limit order book where market orders, limit orders and cancellations are submitted with intensities $\lambda^M$, $\lambda^L$ and $\lambda^C$ respectively, all sharing an unobserved baseline intensity $\lambda_0(t)$ and depending on the observed spread $s(t) \in \mathbb{N}^*$ (i.e. expressed in number of ticks):

$$
\lambda^M(t, \vartheta^M) = \lambda_0(t) \exp\big[ \vartheta^M_0 + \vartheta^M_1 \log s(t) + \vartheta^M_2 (\log s(t))^2 \big], \tag{38}
$$
$$
\lambda^L(t, \vartheta^L) = \lambda_0(t) \exp\big[ \vartheta^L_0 + \vartheta^L_1 \log s(t) + \vartheta^L_2 (\log s(t))^2 \big], \tag{39}
$$
$$
\lambda^C(t, \vartheta^C) = \lambda_0(t) \exp\big[ \vartheta^C_0 + \vartheta^C_1 \log s(t) + \vartheta^C_2 (\log s(t))^2 \big]. \tag{40}
$$

Letting $\theta^L_j = \vartheta^L_j - \vartheta^M_j$ and $\theta^C_j = \vartheta^C_j - \vartheta^M_j$, $j = 0, 1, 2$, our intensity ratio model is easily estimated by maximizing the quasi-log likelihood.

We estimate the model for each stock and each trading day, which gives after data cleaning 8052 different samples and associated model fits. As in the previous cases, we compute the intensity ratios to estimate the probabilities of each event given the observed spread. Figure 6 plots the probabilities of market orders, limit orders and cancellations occurring at the best quotes given the observed spread, and compares them to the empirical probabilities computed on the sample. Again, for the sake of brevity and representativity, we rank the fits by increasing mean L error between model and data, and select four stocks representing the 20%, 40%, 60% and 80% quantiles of the error distribution. It is satisfying to observe that the model provides good fits over a wide range of spread distributions, from a large tick stock (Figure 6, bottom left) where the spread is almost always equal to one tick, to the more interesting case of small tick stocks (e.g., Figure 6, top right), where the probability is correctly estimated up to 9 ticks and more.

We propose a final illustration of the ratio model to investigate the role of the observed queue size in determining the flows of limit orders and cancellations. Some empirical observations of the role of the queue size can be found in Huang et al. (2015) as well as in Muni Toke & Yoshida (2017). A basic equilibrium argument suggests that when the queue size at a given price is small, then providing liquidity with limit orders is interesting as it secures a relative priority for the trader. On the contrary, when a queue size is large, then cancellations should be more frequent than limit orders.
Furthermore, such an argument should be "more" valid inside the book, where only limit orders and cancellations are allowed, than closer to the best quotes, where market orders remove important portions of liquidity and other trading signals, such as the spread and the imbalance previously investigated, play a significant role.

We investigate these insights by building one simple ratio model for limit orders and cancellations occurring inside the book, from level 2 to 10. Level 1 (best quote) is left aside as its dynamics also involve market orders. For each level $\alpha \in \{2, \ldots, 10\}$, let $N^{L_\alpha}$ and $N^{C_\alpha}$ be the counting processes of limit orders and cancellations occurring at this level, with intensities:

$$
\begin{aligned}
\lambda^{L_\alpha}(t, \theta^{L_\alpha}) &= \lambda_0(t) \exp\big[ \theta^{L_\alpha}_0 + \theta^{L_\alpha}_1 \log q_\alpha(t) + \theta^{L_\alpha}_2 (\log q_\alpha(t))^2 \big], \\
\lambda^{C_\alpha}(t, \theta^{C_\alpha}) &= \lambda_0(t) \exp\big[ \theta^{C_\alpha}_0 + \theta^{C_\alpha}_1 \log q_\alpha(t) + \theta^{C_\alpha}_2 (\log q_\alpha(t))^2 \big],
\end{aligned} \tag{41}
$$

where $\lambda_0(t)$ is a potentially random baseline intensity and $q_\alpha(t)$ is the volume standing in the book at the level of submission in group $\alpha$ and time $t$. $q_\alpha(t)$ is expressed in integer multiples of the median trade size (see e.g., Huang et al. (2015); Muni Toke & Yoshida (2017) about this normalization). Let $\theta^\alpha_j = \theta^{L_\alpha}_j - \theta^{C_\alpha}_j$, $j = 0, 1, 2$, $\alpha \in \{2, \ldots, 10\}$, be the relative parameters of the ratio model, to be estimated by maximization of the quasi-log likelihood.

As in the previous examples, we estimate the model for each stock and each month, and then compute the intensity ratios to estimate the probabilities of each event given the observed volume at the level of submission. Each monthly fit gives 9 fits, one for each level, but for the sake of readability, we plot only three levels: level 2 at the head of the book, level 5 at mid-depth and level 8 at the back of the book. Figure 7 plots the probabilities of limit orders occurring inside the book at these three levels, given the observed volume of the level, and compares them to the empirical probabilities computed on the sample.

[Figure 6 panels: TECF.PA 20150324, STM.PA 20150520, LVMH.PA 20150908 and SAF.PA 20150603; vertical axis: probability; legend: market orders, limit orders, cancel orders.]

Figure 6: Empirical probability (dashed lines) and fitted probability (full lines) that the next order is a market order (red), a limit order (blue) or a cancellation (green), given the observed spread. The x-axis spans more than 99% of the empirical spread distribution, and the shaded areas on the right signal the 95% and 99% quantiles of this distribution. Selected samples represent the 20% (top left), 40% (top right), 60% (bottom left) and 80% (bottom right) quantiles of the mean L error distribution among the 8052 tested samples.
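For the two-type model of Equation (41), the probability that the next order at level $\alpha$ is a limit order rather than a cancellation is a logistic function of the quadratic $\theta^\alpha_0 + \theta^\alpha_1 \log q + \theta^\alpha_2 (\log q)^2$. The sketch below evaluates this probability and locates the queue size at which both order types are equally likely, taking that 50% point as a rough equilibrium size. This reading of the equilibrium, the function names and the parameter values are assumptions made for the example, not fitted values from the paper.

```python
import math


def limit_probability(theta, q):
    """Probability that the next order is a limit order, given queue size q >= 1."""
    t0, t1, t2 = theta
    x = math.log(q)
    return 1.0 / (1.0 + math.exp(-(t0 + t1 * x + t2 * x * x)))


def equilibrium_sizes(theta):
    """Queue sizes q >= 1 at which limit orders and cancellations are equally likely."""
    t0, t1, t2 = theta
    if t2 == 0.0:
        roots = [-t0 / t1] if t1 != 0.0 else []
    else:
        disc = t1 * t1 - 4.0 * t2 * t0
        if disc < 0.0:
            roots = []
        else:
            sq = math.sqrt(disc)
            roots = [(-t1 - sq) / (2.0 * t2), (-t1 + sq) / (2.0 * t2)]
    # Roots are values of log q; keep those corresponding to q >= 1.
    return sorted(math.exp(x) for x in roots if x >= 0.0)


# Hypothetical ratio parameters for one level (illustration only).
theta_level = (1.0, -0.5, -0.1)
q_star = equilibrium_sizes(theta_level)
print(q_star, limit_probability(theta_level, 1), limit_probability(theta_level, 50))
```

With these values the limit-order probability decreases as the queue grows, in line with the equilibrium argument: a short queue attracts limit orders, a long queue favors cancellations.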
We show four examples of fits (again selected by computing the quantiles of the mean L error of the fits), but all monthly fits are quite similar and available. It is interesting that the probabilities of occurrence of a limit order as a function of the observed volume show similar patterns across stocks. The model recovers the expected equilibrium property that the interest in (hence the probability of) a limit order decreases when liquidity is already present (i.e. $q_\alpha$ is large). Furthermore, the use of a common x-axis makes the probability curves reflect the general shape of a limit order book. Using a rule of thumb that the equilibrium size of each level should be roughly at the crossing of the […] et al. (2002)): the mid-book (level 5) is fatter than the head (level 2) and tail (level 8) of the book.

[Figure 7 panels: AIRP.PA 201511, SGOB.PA 201504, SAF.PA 201511 and BOUY.PA 201509; vertical axis: probability that next order is limit vs cancel; legend: levels.]

Figure 7: Empirical probability (dots and dashed lines) and fitted probability (lines) that the next order is a limit order, given the observed level of the queue of submission. The x-axis spans more than 99% of the empirical volume distribution (at all levels), and the shaded areas on the right signal the 95% and 99% quantiles of this distribution.

Empirical results from Section 5 are in-sample analyses aimed at providing new descriptions of some dependencies observed on financial markets. We now turn to the possible use of the ratio model as a prediction tool. We consider the problem of the sign of the next trade, previously introduced in Section 5.2. The ratio model has helped us model how this sign depends on the state of the order book, through e.g. the imbalance, the spread or the last trade sign. Time series of trade signs are known to be highly auto-correlated, and to be well described by Hawkes processes. In this setting, market orders can be described by a two-dimensional Hawkes process $Z = (Z^B, Z^A)$ with exponential kernels, its conditional intensity $\lambda^Z = (\lambda^{Z_B}, \lambda^{Z_A})$ being written in vector notation as

$$
\lambda^Z(t) = \mu^Z + \int_0^t K(t - s) \, dZ_s, \tag{42}
$$

where $\mu^Z = (\mu^{Z_B}, \mu^{Z_A})$ is the baseline intensity and the kernel matrix

$$
K(t) = \begin{pmatrix} \alpha_{BB} e^{-\beta_{BB} t} & \alpha_{BA} e^{-\beta_{BA} t} \\ \alpha_{AB} e^{-\beta_{AB} t} & \alpha_{AA} e^{-\beta_{AA} t} \end{pmatrix} \tag{43}
$$

describes the self- and cross-excitation parts.

A simple ratio model with only current observations of the state and not taking into account the history of the order flow (except for the last trade sign) can probably not compete with the Hawkes description. However, we can incorporate some history into the covariates of the ratio model. One may for example consider the following covariates:

$$
H_A(t) = \log\Big( \mu_A + \int_0^t \alpha_A e^{-\beta_A (t - s)} \, dN^A_s \Big), \tag{44}
$$
$$
H_B(t) = \log\Big( \mu_B + \int_0^t \alpha_B e^{-\beta_B (t - s)} \, dN^B_s \Big), \tag{45}
$$

where $(N^B, N^A)$ counts the number of bid and ask market orders. These covariates add some self-exciting history into the ratio model.

We can thus examine different methods to predict the trade sign, given that one trade is observed at a given time. This is of course a theoretical exercise, as we predict the trade sign with all the information available just before its occurrence, not taking into account latency, information delays, reaction times, etc., that would hinder the performance in a more realistic setting.
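The history covariates of Equations (44)-(45) can be evaluated in a single pass over the data, since an exponential kernel only requires a running, decayed count of past events. The sketch below is one possible way to do this; the function name, the convention of using only events strictly before the evaluation time, and all numerical values are assumptions made for the example.

```python
import math


def history_covariate(own_times, query_times, mu, alpha, beta):
    """Evaluate H(t) = log(mu + sum_{s < t} alpha * exp(-beta * (t - s))).

    own_times: sorted times of past market orders on one side (bid or ask).
    query_times: sorted times at which the covariate is needed.
    Only events strictly before each query time are used.
    """
    values = []
    excitation = 0.0  # decayed event count, expressed at time `last`
    last = None       # time of the most recent absorbed event
    k = 0
    for t in query_times:
        while k < len(own_times) and own_times[k] < t:
            s = own_times[k]
            if last is not None:
                excitation *= math.exp(-beta * (s - last))
            excitation += 1.0
            last = s
            k += 1
        decay = math.exp(-beta * (t - last)) if last is not None else 0.0
        values.append(math.log(mu + alpha * excitation * decay))
    return values


# Hypothetical event times and parameters (illustration only).
ask_times = [0.0, 1.0, 1.5]
trade_times = [0.5, 1.0, 2.0]
h_ask = history_covariate(ask_times, trade_times, mu=0.2, alpha=0.8, beta=1.5)
print(h_ask)
```

Because the covariate is evaluated just before each trade, it can be used alongside state covariates such as the imbalance in the same ratio model.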
We test seven methods for this exercise:

• Last: the trade sign is set to be the same as the last observed trade sign;
• Imbalance: the trade sign is set to $-1$ […];
• Hawkes Full: at each time we compute the intensity $\lambda^Z$ of the Hawkes process described at Equations (42)-(43), given the observed history, and set the sign to be $-1$ if $\lambda^{Z_B} > \lambda^{Z_A}$, $+1$ otherwise;
• Hawkes NoCross: same as the previous method, except that we independently fit two Hawkes processes for the bid and ask market orders, i.e. we set $\alpha_{BA} = \alpha_{AB} = 0$;
• Ratio $(i, \epsilon, \epsilon s)$: we use the probabilities given by the basic ratio model of Equation (36) and set the sign to be $-1$ […] $0.5$, and $+1$ otherwise;
• Ratio $(H_B, H_A)$: same as the previous method, but we use only the covariates $(H_B, H_A)$ in the ratio model, instead of the covariates describing the state of the book;
• Ratio $(H_B, H_A, i, \epsilon, \epsilon s)$: same as the previous method, but using both $(H_A, H_B)$ and the covariates $(i, \epsilon, \epsilon s)$ in the ratio model.

We test these methods on the sample described in Section 5.1. The "Last" and "Imbalance" methods do not require any calibration. For the other methods, calibration is carried out on the trading day preceding the day on which prediction performance is evaluated. Models "Hawkes Full" and "Hawkes NoCross" are fitted by maximum-likelihood estimation (see e.g. Ogata (1978), Ozaki (1979) and later Bowsher (2007), Bacry et al. (2013) or Muni Toke & Pomponio (2012) in a financial setting). The parameters $(\mu_A, \alpha_A, \beta_A)$ and $(\mu_B, \alpha_B, \beta_B)$ used to compute the covariates $H_A$ and $H_B$ are obtained by the same method, i.e. the estimated values of the parameters $(\mu_A, \alpha_A, \beta_A)$ used to compute $H_A$ correspond to the estimated values $(\mu^{Z_A}, \alpha_{AA}, \beta_{AA})$ of the "Hawkes NoCross" model. Note that the preceding day in the sample means Friday for Mondays, and possibly some other previous day in case of missing trading days in the data. Recall that this sample represents roughly 54 million trades.

Results are illustrated in Figure 8. For all stocks, the "Last" indicator correctly predicts more than 80% of the trade signs, i.e. less than one trade out of 5 has a sign different from the preceding trade. The "Imbalance" indicator performs less well, signing correctly about 70%-80% of the trades. The model "Ratio $(i, \epsilon, \epsilon s)$" of Section 5.2 builds on these two indicators to improve the signing performance to about 85%. Hawkes models perform similarly to the "Imbalance" indicator: the performance of the "Hawkes Full" method lies in the range 73%-[…] $(H_B, H_A)$, the "Ratio $(H_B, H_A)$" method actually mimics the Hawkes process, the two performance curves being very close. The advantage of the ratio model is the possibility of considering both the history covariates $(H_B, H_A)$ and the state covariates $(i, \epsilon, \epsilon s)$. The model "Ratio $(H_B, H_A, i, \epsilon, \epsilon s)$" turns out to deliver the best prediction of the trade sign, improving the performance of the "Ratio $(i, \epsilon, \epsilon s)$" model by a bit more than 2.5% on average over all the stock-days.

Figure 9: Fraction of correctly predicted trade sign per method and per stock, averaged on all available trading days.

We can analyse these performances a bit further. We now focus on the prediction of a sign change, i.e. we compute the performance using only the trades that have a sign different from the previous trade (roughly 20% of the sample, given the observation above). Results are averaged for each stock in Figure 9. The performance of the "Imbalance" signal is roughly similar. The "Last" indicator obviously has a performance equal to 0, as it never predicts a sign change. The ratio model without any history also performs poorly (around 40%) in this specific subsample, since its only history relies on the last trade sign. Both Hawkes models also perform less well, always even worse than the basic "Imbalance" signal. The interesting point is that in this specific subsample, the combination of the Hawkes covariates and the state covariates $(i, \epsilon, \epsilon s)$ described in Section 5.2 significantly improves on the Hawkes models and the "Ratio $(i, \epsilon, \epsilon s)$" model. The average performance increase on all the stock-days of the subsample for the "Ratio $(H_B, H_A, i, \epsilon, \epsilon s)$" model with respect to the best Hawkes model is more than 13%. In brief, trade signs are quite well represented by Hawkes processes, but the addition of a state dependency by means of a ratio model allows a better tracking of the sign changes, explaining an overall better performance.

We have presented a model based on ratios of Cox-type intensities sharing a common, possibly random, baseline intensity. Consistency and asymptotic normality of the estimators have been proved. Such a model may be very useful in cases where one tries to investigate the role of given covariates on point processes in a fluctuating environment, but in which (some of) these fluctuations are assumed to equally influence all the processes under scrutiny. This is therefore a plausible framework in finance, where global market activity varies wildly during the trading day.
The proposed setting removes this common baseline intensity from the estimation procedure, allowing one to focus on the role of covariates. Using this method we have in particular been able to highlight how signals such as market imbalance, bid-ask spread and sizes of limit order book queues influence trading activity. This framework may hopefully be helpful in many other studies in high-frequency finance. Several directions of research and applications may indeed be followed, among which model selection and the identification of significant trading signals.

Acknowledgements

This work was in part supported by Japan Science and Technology Agency CREST JPMJCR14D7; Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research No. 17H01702 (Scientific Research), No. 26540011 (Challenging Exploratory Research); and by a Cooperative Research Program of the Institute of Statistical Mathematics.

A Mathematical proofs

A.1 Proof of Theorem 3.1

We have

$$
r^i(t, \theta) = \rho_i(X(t), \theta) \qquad (i \in I,\ \theta \in \Theta) \tag{46}
$$

and

$$
\lambda^i(t, \vartheta^*) = \rho_i(X(t), \theta^*) \, \Lambda(\lambda_0(t), X(t)) \qquad (i \in I). \tag{47}
$$

Simple calculus yields

$$
\partial_{\theta_\alpha} \log \rho_i(x, \theta) = \big\{ \delta_{\alpha, i} - \rho_\alpha(x, \theta) \big\} \, x \qquad (i \in I,\ \alpha \in I,\ \theta \in \mathbb{R}^p) \tag{48}
$$

and

$$
\partial_{\theta_{\alpha'}} \partial_{\theta_\alpha} \log \rho_i(x, \theta) = - \big\{ \delta_{\alpha', \alpha} \rho_\alpha(x, \theta) - \rho_{\alpha'}(x, \theta) \rho_\alpha(x, \theta) \big\} \, x^{\otimes 2} \qquad (i \in I;\ \alpha', \alpha \in I,\ \theta \in \mathbb{R}^p) \tag{49}
$$

where $\theta_\alpha = (\theta_{\alpha j})_{j \in J}$. In other words,

$$
\partial_{\theta_{\alpha'}} \partial_{\theta_\alpha} \log \rho_i(x, \theta) = - V(x, \theta)_{\alpha', \alpha} \, x^{\otimes 2} \qquad (i \in I;\ \alpha', \alpha \in I,\ \theta \in \mathbb{R}^p). \tag{50}
$$

For the closed convex hull $C[\Theta]$ of $\Theta$, let

$$
Y(\theta) = E\Big[ \sum_{i \in I} \log \frac{\rho_i(X(0), \theta)}{\rho_i(X(0), \theta^*)} \, \rho_i(X(0), \theta^*) \, \Lambda(\lambda_0(0), X(0)) \Big] \qquad (\theta \in C[\Theta]). \tag{51}
$$

Then

$$
Y(\theta) \le 0 \qquad (\theta \in C[\Theta]) \tag{52}
$$

and the equality holds if and only if

$$
\rho_i(X(0), \theta) = \rho_i(X(0), \theta^*) \qquad \lambda_0(0)\,dP\text{-a.e.} \quad (\forall i \in I). \tag{53}
$$

Condition (53) implies that

$$
\exp\big( X(0)[\theta_i] - X(0)[\theta^*_i] \big) = \exp\big( X(0)[\theta_0] - X(0)[\theta^*_0] \big) = 1 \qquad \lambda_0(0)\,dP\text{-a.e.} \quad (\forall i \in I) \tag{54}
$$

due to $\theta_0 = \theta^*_0 = 0$. Therefore,

$$
E\Big[ V(X(0))_{i,i} \big( X(0)\big[ \theta_i - \theta^*_i \big] \big)^2 \Lambda(\lambda_0(0), X(0)) \Big] = 0 \qquad (\forall i \in I) \tag{55}
$$

and hence $\theta_i = \theta^*_i$ for all $i \in I$ due to Condition [A3] applied to $u = \big( \delta_{i,i'} (\theta_{ij} - \theta^*_{ij}) \big)_{i' \in I, j \in J}$.

We see $\Gamma = -\partial_\theta^2 Y(\theta^*)$. Since $\Gamma$ is positive definite and $Y(\theta) \neq 0$ for all $\theta \in \Theta \setminus \{\theta^*\}$, there exists a constant $\chi > 0$ such that

$$
Y(\theta) = Y(\theta) - Y(\theta^*) \le -\chi \, \big| \theta - \theta^* \big|^2 \tag{56}
$$

for all $\theta \in C[\Theta]$.

Let $U_T = \{ u \in \mathbb{R}^p ;\ \theta^\dagger_T(u) \in \Theta \}$, where $\theta^\dagger_T(u) = \theta^* + T^{-1/2} u$. The quasi-likelihood ratio random field is defined by

$$
Z_T(u) = \exp\big( H_T(\theta^* + T^{-1/2} u) - H_T(\theta^*) \big) \qquad (u \in U_T). \tag{57}
$$

We will also use

$$
\mathbb{Z}_T(u) = \exp\big( H_T(\theta^* + T^{-1/2} u) - H_T(\theta^*) \big) \qquad (u \in C[U_T]). \tag{58}
$$

For this extension, we notice that $r^i(t, \theta)$ and hence $H_T$ is naturally extended to $C[U_T]$. The random field $\mathbb{Z}_T$ will be used to obtain the so-called polynomial type large deviation inequality for $Z_T$. Define $Y_T$ by

$$
Y_T(\theta) = T^{-1} \big( H_T(\theta) - H_T(\theta^*) \big) \qquad (\theta \in C[\Theta]) \tag{59}
$$

with the naturally extended $H_T$. Define a $p$-dimensional random variable $\Delta_T$ and a $p \times p$ random matrix $\Gamma_T$ by

$$
\Delta_T = T^{-1/2} \, \partial_\theta H_T(\theta^*) \tag{60}
$$

and

$$
\Gamma_T = -T^{-1} \, \partial_\theta^2 H_T(\theta^*) \tag{61}
$$

respectively. Let $\Gamma_T(\theta) = -T^{-1} \partial_\theta^2 H_T(\theta)$ for $\theta \in C[\Theta]$. Then

$$
\Gamma_T(\theta)[u^{\otimes 2}] = \frac{1}{T} \int_0^T \Big( V(X(t), \theta) \otimes X(t)^{\otimes 2} \Big) [u^{\otimes 2}] \sum_{i \in I} dN^i_t. \tag{62}
$$

Take parameters $\alpha$, $\beta_1$, $\rho_1$ and $\rho_2$ so that

$$
0 < \beta_1 < \frac{1}{2}, \qquad 0 < \rho_1 < \min\Big\{ 1, \beta, \frac{2\beta_1}{1 - \alpha} \Big\}, \qquad 0 < 2\alpha < \rho_2, \qquad 1 - \rho_2 > 0, \tag{63}
$$

where $\beta = \alpha / (1 - \alpha)$. We have

Lemma A.1.

Suppose that [ A and [ A are satisﬁed. Let p be any positive number. Then (i) sup T > (cid:13)(cid:13) ∆ T (cid:13)(cid:13) p < ∞ . (ii) sup T > (cid:13)(cid:13)(cid:13)(cid:13) sup θ ∈C [Θ] T (cid:0) Y T ( θ ) − Y ( θ ) (cid:1)(cid:13)(cid:13)(cid:13)(cid:13) p < ∞ . (iii) sup T > (cid:13)(cid:13)(cid:13)(cid:13) T − sup θ ∈C [Θ] | ∂ θ H T ( θ ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13)(cid:13) p < ∞ . (iv) sup T > E (cid:13)(cid:13) T β | Γ T − Γ | (cid:13)(cid:13) p < ∞ .Proof. Let ρ ( x, θ ) = (cid:0) ρ i ( x, θ ) (cid:1) i ∈ I . Let e i = ( δ α,i ) α ∈ I for i ∈ I . Deﬁne g i ∈ R i ⊗ R j ( i ∈ I ) by g i ( x, θ ) = (cid:0) { i (cid:54) =0 } e i − ρ ( x, θ ) (cid:1) ⊗ x ( i ∈ I ) (64)29hen ∆ T = T − / (cid:88) i ∈ I (cid:90) T g i ( X ( t ) , θ ∗ ) d ˜ N it , (65)where ˜ N it = N it − (cid:90) t λ i ( s, ϑ ∗ ) ds ( i ∈ I ) (66)We have λ i ( t, θ ) ≤ λ ( t ) (cid:88) i ∈ I exp (cid:0) X ( t )[ ϑ ∗ i ] (cid:1) , | g i ( X ( t ) , θ ) | ≤ | X ( t ) | (67)and | log r i ( t, θ ) | = | log ρ i ( X ( t ) , θ ) | ≤ C i (1 + | X ( t ) || θ | ) (68)for some constant C i depending on i .By the Burkholder-Davis-Gundy inequality, we obtain E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) T − / (cid:90) T h ( X ( t )) d ˜ N it (cid:12)(cid:12)(cid:12)(cid:12) k (cid:21) ≤ C k E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) T − (cid:90) T h ( X ( t )) dN it (cid:12)(cid:12)(cid:12)(cid:12) k (cid:21) ≤ C k E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) T − (cid:90) T h ( X ( t )) d ˜ N it (cid:12)(cid:12)(cid:12)(cid:12) k (cid:21) + C k E (cid:20) T − (cid:90) T (cid:12)(cid:12) h ( X ( t )) (cid:12)(cid:12) k λ i ( t, ϑ ∗ ) k dt (cid:21) ≤ C k E (cid:20)(cid:12)(cid:12)(cid:12)(cid:12) T − (cid:90) T h ( X ( t )) d ˜ N it (cid:12)(cid:12)(cid:12)(cid:12) k (cid:21) + C k E (cid:20)(cid:12)(cid:12) h ( X (0)) (cid:12)(cid:12) k λ i (0 , ϑ ∗ ) k (cid:21) (69)for any k ∈ N and any measurable function h of at most polynomial growth. 
By Equation (65) and induction, we obtain (i).

Set

$$M_T(\theta) = \sum_{i\in I} T^{-1} \int_0^T \log \frac{\rho^i(X(t),\theta)}{\rho^i(X(t),\theta^*)}\, d\tilde{N}^i_t \tag{70}$$

and

$$K_T(\theta) = T^{-1} \int_0^T f(\lambda_0(t), X(t), \theta)\, dt, \tag{71}$$

where

$$f(w,x,\theta) = \sum_{i\in I} \log \frac{\rho^i(x,\theta)}{\rho^i(x,\theta^*)}\, \rho^i(x,\theta^*)\, \Lambda(w,x) - E\bigg[ \sum_{i\in I} \log \frac{\rho^i(X(0),\theta)}{\rho^i(X(0),\theta^*)}\, \rho^i(X(0),\theta^*)\, \Lambda(\lambda_0(0), X(0)) \bigg]. \tag{72}$$

Then

$$Y_T(\theta) - Y(\theta) = M_T(\theta) + K_T(\theta). \tag{73}$$

Similarly to the proof of (i), we obtain

$$\sum_{k=0,1} \sup_{\theta\in\mathcal{C}[\Theta]} \sup_{T>0} \big\| T^{1/2} \partial_\theta^k M_T(\theta) \big\|_p < \infty \tag{74}$$

for every $p \ge 2$. Moreover, applying Theorem 6.3 of Rio (2017) under [A1] and [A2], we obtain

$$\sum_{k=0,1} \sup_{\theta\in\mathcal{C}[\Theta]} \sup_{T>0} \big\| T^{1/2} \partial_\theta^k K_T(\theta) \big\|_p < \infty \tag{75}$$

for every $p \ge 2$. Now Sobolev's embedding inequality in $W^{1,p}(\mathcal{C}[\Theta]) \hookrightarrow C(\mathcal{C}[\Theta])$ for $p > \mathsf{p} \vee$ [...]

Lemma A.2.

Suppose that [A1]-[A3] are satisfied. Then

(i) For any $L > 0$,

$$\sup_{T>0} \sup_{r>0} r^L\, P\bigg[ \sup_{u\in V_T(r)} Z_T(u) \ge \exp\big( -2^{-1} r^{2-(\rho_1\vee\rho_2)} \big) \bigg] < \infty, \tag{76}$$

where $V_T(r) = \{ u\in U_T ;\ |u| \ge r \}$.

(ii) $Z_T$ admits a locally asymptotically normal representation

$$Z_T(u) = \exp\bigg( \Delta_T[u] - \frac{1}{2} \Gamma[u^{\otimes 2}] + r_T(u) \bigg) \tag{77}$$

with $r_T(u) \overset{p}{\to} 0$ as $T\to\infty$ for every $u\in\mathbb{R}^{\mathsf{p}}$ and $\Delta_T \overset{d}{\to} N_{\mathsf{p}}(0,\Gamma)$ as $T\to\infty$.

Proof. We will verify the conditions of Theorem 3 (c) of Yoshida (2011) for the naturally extended random field $H_T$ over $\mathcal{C}[\Theta]$. Condition [A1''] therein holds according to Lemma A.1 (iii) and (iv). Condition [A4'] therein is satisfied with $\beta_2 = 0$ under Equation (63). Lemma A.1 (i) and (ii) ensure Condition [A6] therein. Condition [B1] therein is obvious and Condition [B2] therein is (56). Therefore, by Theorem 3 of Yoshida (2011), we obtain

$$\sup_{T>0} \sup_{r>0} r^L\, P\bigg[ \sup_{u\in\mathcal{C}[U_T]:\ |u|\ge r} Z_T(u) \ge \exp\big( -2^{-1} r^{2-(\rho_1\wedge\rho_2)} \big) \bigg] < \infty, \tag{78}$$

which gives (i).

The term $r_T(u)$ is defined by (77) for $u\in U_T$. For each $u\in\mathbb{R}^{\mathsf{p}}$, for sufficiently large $T$, $r_T(u)$ admits the representation

$$r_T(u) = \int_0^1 (1-s) \Big\{ \Gamma[u^{\otimes 2}] - \Gamma_T\big(\theta^\dagger_T(su)\big)[u^{\otimes 2}] \Big\}\, ds. \tag{79}$$

Then Lemma A.1 (iii) and (iv) verify the convergence $r_T(u) \overset{p}{\to} 0$ as $T\to\infty$.

For $u = (u^i_j)_{i\in I_0, j\in J}$ and $x = (x_j)_{j\in J}$,

$$\begin{aligned}
&\sum_{i\in I} \bigg\{ \sum_{i'\in I_0,\, j\in J} \big( \delta_{i,i'} - \rho^{i'}(x,\theta) \big) x_j u^{i'}_j \bigg\}^2 \rho^i(x,\theta) \\
&\quad= \sum_{\substack{i_1,i_2\in I_0 \\ j_1,j_2\in J}} \bigg\{ \sum_{i\in I} \big( \delta_{i,i_1} - \rho^{i_1}(x,\theta) \big)\big( \delta_{i,i_2} - \rho^{i_2}(x,\theta) \big) \rho^i(x,\theta) \bigg\} x_{j_1} x_{j_2} u^{i_1}_{j_1} u^{i_2}_{j_2} \\
&\quad= \sum_{\substack{i_1,i_2\in I_0 \\ j_1,j_2\in J}} \Big\{ \delta_{i_1,i_2}\, \rho^{i_1}(x,\theta) - \rho^{i_1}(x,\theta)\rho^{i_2}(x,\theta) \Big\} x_{j_1} x_{j_2} u^{i_1}_{j_1} u^{i_2}_{j_2} \\
&\quad= \big( V(x,\theta) \otimes x^{\otimes 2} \big)[u^{\otimes 2}].
\end{aligned} \tag{80}$$

Since it is assumed that $N^i$ ($i\in I$) do not have common jumps,

$$\begin{aligned}
\bigg\langle T^{-1/2} \sum_{i\in I} \int_0^{\cdot} g_i(X(t),\theta^*)[u]\, d\tilde{N}^i_t \bigg\rangle_T
&= T^{-1} \sum_{i\in I} \int_0^T \big( g_i(X(t),\theta^*)[u] \big)^2 \rho^i(X(t),\theta^*)\, \Lambda(\lambda_0(t),X(t))\, dt \\
&= T^{-1} \int_0^T \Big( V(X(t),\theta^*) \otimes X(t)^{\otimes 2} \Big)[u^{\otimes 2}]\, \Lambda(\lambda_0(t),X(t))\, dt.
\end{aligned} \tag{81}$$

Under [A1] and [A2], this quantity converges in probability to $\Gamma[u^{\otimes 2}]$ as $T\to\infty$. The conditional type Lindeberg condition is easily verified by dividing the range $[0,T]$ of the integral (65) into $\lceil T \rceil$ subintervals, and as a result, we see $\Delta_T \overset{d}{\to} N_{\mathsf{p}}(0,\Gamma)$, which concludes the proof of (ii).

It is possible to extend $Z_T$ to $\mathbb{R}^{\mathsf{p}}$ so that the extension has a compact support and

$$\sup_{u\in\mathbb{R}^{\mathsf{p}}\setminus U_T} Z_T(u) \le \max_{u\in\partial U_T} Z_T(u). \tag{82}$$

We will denote this extended random field by the same $Z_T$. Then $Z_T$ is a random variable taking values in the Banach space $\hat{C} = \{ f\in C(\mathbb{R}^{\mathsf{p}}) ;\ \lim_{|u|\to\infty} f(u) = 0 \}$ equipped with the sup-norm. On some probability space, we prepare a random field

$$Z(u) = \exp\bigg( \Delta[u] - \frac{1}{2} \Gamma[u^{\otimes 2}] \bigg) \qquad (u\in\mathbb{R}^{\mathsf{p}}), \tag{83}$$

where $\Delta \sim N_{\mathsf{p}}(0,\Gamma)$. By Lemma A.2 (ii), we have a finite-dimensional convergence

$$Z_T \overset{d_f}{\to} Z \tag{84}$$

as $T\to\infty$.

For $\delta > 0$ and $c > 0$, define $w_T(\delta,c)$ by

$$w_T(\delta,c) = \sup\Big\{ \big| \log Z_T(u_2) - \log Z_T(u_1) \big| ;\ u_1, u_2 \in B_c,\ |u_2 - u_1| \le \delta \Big\} \tag{85}$$

for large $T$, where $B_c = \{ u\in\mathbb{R}^{\mathsf{p}} ;\ |u| \le c \}$. Then by Lemma A.1 (iii) and the definition of $Z_T$, we have

$$\lim_{\delta\downarrow 0} \limsup_{T\to\infty} P\big[ w_T(\delta,c) > \epsilon \big] = 0 \tag{86}$$

for every $\epsilon > 0$ and $c > 0$. [...] $\hat{u}^M_T$ since Conditions [C1] and [C3] therein are ensured by (86) and by (84), respectively, and Condition [C2] is trivial now.

The properties (86) and (84) give the functional convergence

$$Z_T|_{B_c} \overset{d}{\to} Z|_{B_c} \quad \text{in } C(B_c) \tag{87}$$

as $T\to\infty$ for each $c > 0$. Moreover, Lemma A.2 already provided the PLD inequality for $Z_T$. Thus e.g. Theorem 10 of Yoshida (2011) proves (13) for $\hat{u}^B_T$ once the estimate

$$\sup_{T>0} E\bigg[ \bigg( \int_{U_T} Z_T(u)\, du \bigg)^{-1} \bigg] < \infty \tag{88}$$

is established. However, (88) can be verified e.g. with Lemma 2 of Yoshida (2011). This completes the proof of Theorem 3.1.

A.2 Proof of Proposition 4.1

Suppose that $\theta^* \notin S_K$. Then

$$\begin{aligned}
\frac{1}{T} \big( C_T(S_K) - C_T(S_{K^*}) \big)
&= -\frac{2}{T} \big( H_T(\hat\theta_K) - H_T(\theta^*) \big) + \frac{2}{T} \big( H_T(\hat\theta_{K^*}) - H_T(\theta^*) \big) + \frac{a_T}{T} \big( d(S_K) - d(S_{K^*}) \big) \\
&\ge \chi_0 \inf_{\theta\in S_K} \big| \theta - \theta^* \big|^2 + o_p(1)
\end{aligned} \tag{89}$$

by Lemma A.1 (ii) and (56) applied to the sub-models $S_K$ and $S_{K^*}$ in place of $\Theta$. Then (i) holds since $\inf_{\theta\in S_K} | \theta - \theta^* | > 0$.

Next, suppose that $\theta^* \in S_K$ and $S_K \neq S_{K^*}$. Obviously, $d(S_K) > d(S_{K^*})$. Denote by $\Gamma_K$ the matrix consisting of the elements of $\Gamma$ with indices in $K$. Then [A3] implies $\det \Gamma_K > 0$. Thus, we can obtain the same results as in Theorem 3.1 for $\hat\theta_K$. In particular,

$$2 \big( H_T(\hat\theta_K) - H_T(\theta^*) \big) = \Gamma_K\big[ (\hat{u}_K)^{\otimes 2} \big] + o_p(1) = O_p(1) \tag{90}$$

for both QMLE and QBE for the model $S_K$ since $\theta^* \in S_K$, where $\hat{u}_K = \sqrt{T}(\hat\theta_K - \theta^*)$. This is also valid for the model $S_{K^*}$ with $\hat\theta_{K^*}$. Therefore,

$$C_T(S_K) - C_T(S_{K^*}) = O_p(1) + \big( d(S_K) - d(S_{K^*}) \big) a_T \overset{p}{\to} \infty \tag{91}$$

as $T\to\infty$, which completes the proof.

B List of stocks

Table 3 lists all the stocks investigated in the paper.

| RIC | Company | Sector | Number of trading days in sample |
|---|---|---|---|
| AIRP.PA | Air Liquide | Healthcare / Energy | 239 |
| BNPP.PA | BNP Paribas | Banking | 224 |
| EDF.PA | Electricite de France | Energy | 237 |
| LAGA.PA | Lagardère | Media | 143 |
| CARR.PA | Carrefour | Retail | 230 |
| BOUY.PA | Bouygues | Construction / Telecom | 229 |
| ALSO.PA | Alstom | Transport | 230 |
| ACCP.PA | Accor | Hotels | 228 |
| ALUA.PA | Alcatel | Networks / Telecom | 235 |
| AXAF.PA | Axa | Insurance | 237 |
| CAGR.PA | Crédit Agricole | Banking | 236 |
| CAPP.PA | Cap Gemini | Technology Consulting | 233 |
| DANO.PA | Danone | Food | 230 |
| ESSI.PA | Essilor | Optics | 229 |
| LOIM.PA | Klepierre | Finance | 222 |
| LVMH.PA | Louis Vuitton Moët Hennessy | Luxury | 234 |
| MICP.PA | Michelin | Tires | 230 |
| OREP.PA | L'Oréal | Cosmetics | 234 |
| PERP.PA | Pernod Ricard | Spirits | 225 |
| PEUP.PA | Peugeot | Automotive | 152 |
| PRTP.PA | Kering | Luxury | 228 |
| PUBP.PA | Publicis | Communication | 224 |
| RENA.PA | Renault | Automotive | 229 |
| SAF.PA | Safran | Aerospace / Defense | 233 |
| TECF.PA | Technip | Energy | 226 |
| TOTF.PA | Total | Energy | 233 |
| VIE.PA | Veolia | Energy / Environment | 235 |
| VIV.PA | Vivendi | Media | 235 |
| VLLP.PA | Vallourec | Materials | 229 |
| VLOF.PA | Valeo | Automotive | 222 |
| SASY.PA | Sanofi | Healthcare | 230 |
| SCHN.PA | Schneider Electric | Energy | 225 |
| SGEF.PA | Vinci | Construction | 230 |
| SGOB.PA | Saint Gobain | Materials | 235 |
| SOGN.PA | Société Générale | Banking | 230 |
| STM.PA | ST Microelectronics | Semiconductor | 228 |

Table 3: List of stocks investigated in this paper. The sample consists of the whole year 2015, representing roughly 230 trading days for all stocks except LAGA.PA and PEUP.PA, which are missing roughly 70 trading days.

References

Abergel, Frédéric, Anane, Marouane, Chakraborti, Anirban, Jedidi, Aymen, & Muni Toke, Ioane. 2016. Limit order books. Cambridge University Press, Delhi.

Bacry, Emmanuel, Dayri, Khalil, & Muzy, Jean-François. 2012. Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. The European Physical Journal B - Condensed Matter and Complex Systems, (5), 1–12.

Bacry, Emmanuel, Delattre, Sylvain, Hoffmann, Marc, & Muzy, Jean-François. 2013. Modelling microstructure noise with mutually exciting point processes. Quantitative Finance, (1), 65–77.

Bouchaud, Jean-Philippe, Mézard, Marc, Potters, Marc, et al. 2002. Statistical properties of stock order books: empirical results and models. Quantitative Finance, (4), 251–256.

Bouchaud, Jean-Philippe, Gefen, Yuval, Potters, Marc, & Wyart, Matthieu. 2004. Fluctuations and response in financial markets: the subtle nature of 'random' price changes. Quantitative Finance, (2), 176–190.

Bowsher, C. G. 2007. Modelling security market events in continuous time: Intensity based, multivariate point process models. Journal of Econometrics, 876–912.

Bozdogan, Hamparsum. 1987. Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, (3), 345–370.

Chakraborti, Anirban, Muni Toke, Ioane, Patriarca, Marco, & Abergel, Frédéric. 2011. Econophysics review: I. Empirical facts. Quantitative Finance, (7), 991–1012.

Cont, Rama, Stoikov, Sasha, & Talreja, Rishi. 2010. A stochastic model for order book dynamics. Operations Research, (3), 549–563.

De Gregorio, Alessandro, & Iacus, Stefano M. 2012. Adaptive LASSO-type estimation for multivariate diffusion processes. Econometric Theory, (4), 838–860.

Fan, Jianqing, & Li, Runze. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, (456), 1348–1360.

Fan, Jianqing, & Li, Runze. 2002. Variable selection for Cox's proportional hazards model and frailty model. Annals of Statistics, (1), 74–99.

Frank, Ildiko E., & Friedman, Jerome H. 1993. A statistical view of some chemometrics regression tools. Technometrics, (2), 109–135.

Gould, Martin D, Porter, Mason A, Williams, Stacy, McDonald, Mark, Fenn, Daniel J, & Howison, Sam D. 2013. Limit order books. Quantitative Finance, (11), 1709–1742.

Hansen, Niels Richard, Reynaud-Bouret, Patricia, Rivoirard, Vincent, et al. 2015. Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, (1), 83–143.

Huang, Weibing, Lehalle, Charles-Albert, & Rosenbaum, Mathieu. 2015. Simulating and analyzing order book data: The queue-reactive model. Journal of the American Statistical Association, (509), 107–122.

Kinoshita, Yoshiki, & Yoshida, Nakahiro. 2018. Penalized quasi-likelihood estimation for variable selection. Preprint.

Lallouache, Mehdi, & Challet, Damien. 2016. The limits of statistical significance of Hawkes processes fitted to financial data. Quantitative Finance, (1), 1–11.

Lehalle, Charles-Albert, & Mounjid, Othmane. 2017. Limit order strategic placement with adverse selection risk and the role of latency. Market Microstructure and Liquidity, (01), 1750009.

Lillo, Fabrizio, & Farmer, J Doyne. 2004. The long memory of the efficient market. Studies in Nonlinear Dynamics & Econometrics, (3), 1–33.

Lipton, A., Pesavento, U., & Sotiropoulos, M. G. 2013. Trade arrival dynamics and quote imbalance in a limit order book. arXiv preprint arXiv:1312.0514.

Muni Toke, Ioane. 2016. Reconstruction of Order Flows using Aggregated Data. Market Microstructure and Liquidity, (02), 1650007.

Muni Toke, Ioane, & Pomponio, Fabrizio. 2012. Modelling trades-through in a limited order book using Hawkes processes. Economics -journal, 22.

Muni Toke, Ioane, & Yoshida, Nakahiro. 2017. Modelling intensities of order flows in a limit order book. Quantitative Finance, (5), 683–701.

Ogata, Yosihiko. 1978. The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, (1), 243–261.

Ozaki, T. 1979. Maximum likelihood estimation of Hawkes' self-exciting point processes. Annals of the Institute of Statistical Mathematics, (1), 145–155.

Rio, Emmanuel. 2017. Asymptotic Theory of Weakly Dependent Random Processes. Probability Theory and Stochastic Modelling. Springer, Berlin.

Stoikov, Sasha. 2017. The Micro-Price: A High Frequency Estimator of Future Prices. SSRN preprint.

Suzuki, Takumi, & Yoshida, Nakahiro. 2018. Penalized least squares approximation methods and their applications to stochastic processes. arXiv:1811.09016.

Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), (1), 267–288.

Umezu, Yuta, Shimizu, Yusuke, Masuda, Hiroki, & Ninomiya, Yoshiyuki. 2015. AIC for non-concave penalized likelihood method. arXiv preprint arXiv:1509.01688.

Wang, Hansheng, & Leng, Chenlei. 2007. Unified LASSO estimation by least squares approximation. Journal of the American Statistical Association, (479), 1039–1048.

Yoshida, Nakahiro. 2011. Polynomial type large deviation inequalities and quasi-likelihood analysis for stochastic differential equations. Annals of the Institute of Statistical Mathematics, (3), 431–479.

Yue, Yu Ryan, & Loh, Ji Meng. 2015. Variable selection for inhomogeneous spatial point process models. Canadian Journal of Statistics, (2), 288–305.

Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101