Multivariate General Compound Point Processes in Limit Order Books

Abstract

In this paper, we focus on a new generalization of the multivariate general compound Hawkes process (MGCHP), which we refer to as the multivariate general compound point process (MGCPP). Namely, we apply a multivariate point process to model the order flow instead of the Hawkes process. A law of large numbers (LLN) and two functional central limit theorems (FCLTs) for the MGCPP are proved. Applications of the MGCPP to the limit order market are also considered. We provide numerical simulations and comparisons of the MGCPP and MGCHP using Google, Apple, Microsoft, Amazon, and Intel trading data.



A Preprint

Qi Guo

Department of Mathematics and Statistics, University of Calgary, University Drive NW, Calgary, Canada T2N 1N4

Bruno Remillard

Department of Decision Sciences, HEC Montréal, 3000, chemin de la Cote-Sainte-Catherine, Montréal, Canada H3T 2A7

Anatoliy Swishchuk

Department of Mathematics and Statistics, University of Calgary, University Drive NW, Calgary, Canada T2N 1N4

August 4, 2020

Keywords: Point process (PP); multivariate point processes (MPP); multivariate general compound point processes (MGCPP); limit order books (LOB); Functional Central Limit Theorems (FCLT); Law of Large Numbers (LLN)

In this paper we study multivariate general compound point processes as models of the price processes in limit order books (LOB). We prove a law of large numbers (LLN) and two functional central limit theorems (FCLTs) for these processes. The two FCLTs are applied to limit order books, where we use these asymptotic methods to study the link between price volatility and order flow in our two models by means of the diffusion limits of the price processes. The volatilities of price changes are expressed in terms of parameters describing the arrival rates and price changes.

Bacry et al. (2013) proved a LLN and an FCLT for multivariate Hawkes processes (HP) [1]. Bowsher (2007) was the first to apply HP and point processes to financial data modelling [2]. Bauwens and Hautsch (2009) used a 5-D HP to estimate multivariate volatility between five stocks, based on price intensities [5]. We note that Brémaud et al. (1996) generalized the HP to its nonlinear form [3]. Also, a functional central limit theorem for the nonlinear HP was obtained in [28]. Some applications of multivariate HP to financial data are given in [13]. Vinkovskaya (2014) considered a point process model for the dynamics of the LOB, and a regime-switching HP to model its dependency on the bid-ask spread in limit order books [24]. A semi-Markov process was applied to the LOB in [21] to model the mid-price. We note that level-1 limit order books with time-dependent arrival rates $\lambda(t)$ were studied in [8], including the asymptotic distribution of the price process. General semi-Markovian models for limit order books were considered in [22]. The book by Cartea et al. (2015) develops models for algorithmic trading in contexts such as executing large orders, market making, trading pairs or collections of assets, and executing in dark pools [7]. That book also contains a link to a website from which many datasets from several sources can be downloaded, and MATLAB code to assist in experimentation with the data. A detailed description of the mathematical theory of Hawkes processes is given in [16]. Zheng et al. (2014) introduced a multivariate point process describing the dynamics of the bid and ask prices of a financial asset [27]. The point process is similar to a Hawkes process, with additional constraints on its intensity corresponding to the natural ordering of the best bid and ask prices. Eichler et al. (2017) showed that the Granger causality structure of multivariate HP is fully encoded in the corresponding link functions of the model [12]. A new nonparametric estimator of the link functions, based on a time-discretized version of the point process, was introduced using an infinite-order autoregression, and its consistency was derived. The estimator was applied to simulated data and to neural spike train data from the spinal dorsal horn of a rat. Chen et al. (2019) developed a new approach for investigating the properties of the HP without the restriction to mutual excitation or linear link functions [9]. They employed a thinning process representation and a coupling construction to bound the dependence coefficient of the HP. Using recent developments on weakly dependent sequences, a concentration inequality for second-order statistics of the HP was established. This concentration inequality was applied to cross-covariance analysis in the high-dimensional regime, and the theoretical claims were verified with simulation studies [9]. Lemonnier et al. presented a framework for fitting multivariate HP for large-scale problems, both in the number of events in the observed history $n$ and the number of event types $d$ (i.e., dimensions) [15]. Liniger's (2009) thesis addresses theoretical and practical questions arising in connection with multivariate, marked, linear HP [16]. Yang et al. (2017) developed a nonparametric and online learning algorithm that estimates the triggering functions of a multivariate HP [26]. It was shown in [18] that multivariate Hawkes processes coupled with a nonparametric estimation procedure can be successfully used to study complex interactions between the time of arrival of orders and their size observed in a limit order book market. This methodology was applied to high-frequency order book data of futures traded at EUREX. An introduction to point processes from a martingale point of view may be found in Bjork's (2011) lecture notes [4].

Guo et al. (2020) constructed a multivariate general compound Hawkes process (MGCHP) [14], which extends the models in [10] and [20]. In [14], they applied the multivariate Hawkes process to model the order flow of several stocks in the limit order market and proved limit theorems for the MGCHP. In this paper, we propose a new mid-price model which generalizes the MGCHP, and we call it the multivariate general compound point process (MGCPP). For the MGCPP, we use a multi-dimensional simple point process, instead of the Hawkes process, to represent the order flow in the LOB. We also prove the corresponding LLN and FCLTs for the MGCPP. One of the reasons for considering the generalized model is that the parameters of a simple point process are much easier to estimate than those of a Hawkes process. We therefore provide numerical comparisons of the MGCPP and MGCHP on real high-frequency trading data, and we find that the results of the new generalized model are as good as those of the MGCHP.

This paper is organized as follows. The definition and assumptions of the multivariate general compound point process (MGCPP) can be found in Section 2. The functional central limit theorem (FCLT) I and the law of large numbers are proved in Section 3. We also provide numerical examples based on real data for the FCLT I in Section 3. In Section 4, we consider an FCLT II for the MGCPP and apply it to mid-price prediction. Section 5 concludes the paper.

In this Section, we propose a multivariate stochastic model for the mid-price in the limit order book. This is a generalization of the models in [10], [14], and [20]. Here, we assume the order flow is described by a multivariate simple point process with some good asymptotic properties.

Definition 2.1 (Counting Process) (see, e.g., [11]). A stochastic process $\{N(t), t \ge 0\}$ is called a counting process if it satisfies $N(t) \ge 0$, $N(0) = 0$, $N(t+s) \ge N(t)$ for all $t, s \ge 0$, and $N(t)$ is an integer.

Definition 2.2 (Point Process) (see, e.g., [11]). Let $(T_1, T_2, T_3, \cdots)$ be a sequence of non-negative random variables with $P(0 \le T_1 \le T_2 \le T_3 \le \cdots) = 1$ such that the number of points in a bounded region is almost surely finite. Then $(T_1, T_2, T_3, \cdots)$ is called a point process.

The point process is characterized by the conditional intensity function $\lambda(t)$ in the form of

$$\lambda(t) = \lim_{h \to 0} \frac{E[N(t+h) - N(t) \mid \mathcal{F}^N(t)]}{h}, \tag{1}$$

where $\lambda(t)$ is a non-negative function and $\mathcal{F}^N(t)$, $t > 0$, is the corresponding natural filtration.

Let $\vec{N}_t = (N_{1,t}, N_{2,t}, \cdots, N_{d,t})$ be a $d$-dimensional point process satisfying the following assumptions.

Assumption 2.0.1. We assume that $\vec{N}_t$ satisfies a law of large numbers (LLN) in the form of

$$\frac{\vec{N}(nt)}{n} \to \vec{\bar{\lambda}}\, t \tag{2}$$

as $n \to +\infty$ almost surely, where $\vec{\bar{\lambda}} = (\bar{\lambda}_1, \bar{\lambda}_2, \bar{\lambda}_3, \cdots, \bar{\lambda}_d)$.

Assumption 2.0.2. We also assume that $\vec{N}_t$ satisfies a functional central limit theorem (FCLT) in the form of

$$\frac{1}{\sqrt{n}}\left(\vec{N}_{nt} - E(\vec{N}_{nt})\right), \quad t \in [0,1], \tag{3}$$

converges in law for the Skorokhod topology to $\Sigma^{1/2}\vec{W}_t$ as $n \to \infty$, where $\vec{W}_t$ is a standard $d$-dimensional Brownian motion and $\Sigma$ is of the form $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \cdots, \sigma_d^2)$.

Here, $\vec{N}_t$ denotes the order flow in the limit order market for $d$ stocks. The liquidity of high-frequency trading data guarantees that there are enough price changes in one day, or even in a small window of size $nt$. So, it is reasonable to adopt the two limit assumptions above.

Remark 2.1. As a simple example, if the point process is a multivariate homogeneous Poisson process, then the two assumptions above are the LLN and FCLT for the multi-dimensional Poisson process. Let $\vec{P}_t$ be a $d$-dimensional Poisson process with intensity $\vec{\lambda}$. Here, we use the notation $\vec{P}_t$ to distinguish the Poisson example from the general case. Then, we have the LLN in the form of

$$\sup_{t \in [0,1]} \left\| n^{-1}\vec{P}_{nt} - t\vec{\lambda} \right\| \to 0 \tag{4}$$

as $n \to \infty$ almost surely, and the FCLT in the form that

$$\sqrt{n}\left(\frac{1}{n}\vec{P}_{nt} - t\vec{\lambda}\right)$$

converges in law for the Skorokhod topology to $\vec{W}_t \circ \vec{\lambda}^{1/2}$ as $n \to \infty$, where $\circ$ is the element-wise product.

Remark 2.2. Another interesting example is the multivariate Hawkes process (MHP). Let $\vec{H}_t = (H_{1,t}, H_{2,t}, \cdots, H_{d,t})$ be a $d$-dimensional Hawkes process with the intensity function for each $H_i$ in the form of

$$\lambda_i(t) = \lambda_i + \int_{(0,t)} \sum_{j=1}^{d} \mu_{ij}(t-s)\, dH_{j,s}. \tag{5}$$

Let $\mu = (\mu_{ij})_{1 \le i,j \le d}$, $\vec{\lambda} = (\lambda_1, \lambda_2, \cdots, \lambda_d)^T$, and $K = \int_0^{\infty} \mu(t)\, dt$. Then the LLN for the MHP is in the form of

$$\sup_{t \in [0,1]} \left\| n^{-1}\vec{H}_{nt} - t(I - K)^{-1}\vec{\lambda} \right\| \to 0 \tag{6}$$

as $n \to \infty$ almost surely, where $I$ is the $d$-dimensional identity matrix. We also have the FCLT for the MHP:

$$\frac{1}{\sqrt{n}}\left(\vec{H}_{nt} - E(\vec{H}_{nt})\right), \quad t \in [0,1],$$

converges in law for the Skorokhod topology to $(I - K)^{-1} D^{1/2}\vec{W}_t$ as $n \to \infty$, where $\vec{W}_t$ is a standard $d$-dimensional Brownian motion and $D$ is a diagonal matrix such that $D_{ii} = ((I - K)^{-1}\vec{\lambda})_i$. Details about the LLN and FCLT of the MHP can be found in [1].
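The Poisson example in Remark 2.1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the intensities in `rates`, the scaling `n`, and the number of simulated paths are hypothetical values chosen only for illustration. It uses the fact that, for a homogeneous Poisson process, $N_i(nt)$ is Poisson distributed with mean $\lambda_i n t$, so that $\sigma_i^2 = \lambda_i$ in Assumption 2.0.2.

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.14, 0.17, 2.3])  # hypothetical intensities (events per unit time)
n, t, n_paths = 10_000, 1.0, 20_000  # hypothetical scaling and number of sample paths

# For a homogeneous Poisson process, N_i(nt) ~ Poisson(lambda_i * n * t).
counts = rng.poisson(lam=rates * n * t, size=(n_paths, rates.size))

# LLN (Assumption 2.0.1): N(nt) / n should be close to lambda * t.
lln_estimate = counts.mean(axis=0) / n

# FCLT (Assumption 2.0.2): Var[(N(nt) - E N(nt)) / sqrt(n)] should be close to
# sigma_i^2 * t, with sigma_i^2 = lambda_i in the Poisson case.
scaled = (counts - rates * n * t) / np.sqrt(n)
sigma2_estimate = scaled.var(axis=0) / t

print("LLN estimate of lambda:  ", lln_estimate)
print("FCLT estimate of sigma^2:", sigma2_estimate)
```

Both printed vectors should be close to `rates`; for a Hawkes process the same check would instead target $(I - K)^{-1}\vec{\lambda}$ from (6).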

Next, we consider a price process $\vec{S}_t = (S_{1,t}, S_{2,t}, \cdots, S_{d,t})$ of the form

$$S_{i,t} = S_{i,0} + \sum_{k=1}^{N_{i,t}} a_i(X_{i,k}), \tag{7}$$

where $X_{i,k}$ are independent ergodic continuous-time Markov chains and $a_i(\cdot)$ are bounded continuous functions on $X$. We refer to $\vec{S}_t$ as a multivariate general compound point process (MGCPP).

Remark 2.3. In the one-dimensional case, let $N_t$ be a Poisson process, $a(x) = x$, and let $X_k$ be a sequence of independent random variables such that $P(X = \delta) = P(X = -\delta) = 1/2$. Then $S_t$ is the stochastic model for the dynamics of a limit order book discussed in [10].

Remark 2.4. When $\vec{N}_t$ is a multivariate Hawkes process, $\vec{S}_t$ is the multivariate general compound Hawkes process (MGCHP) proposed in [14].

In this Section, we consider the diffusion limit theorems for the MGCPP. They provide a link between the order flow $\vec{N}_t$ and the price process $\vec{S}_t$. The functional central limit theorem and the law of large numbers for the MGCPP are generalizations of the diffusion limit theorems for the MGCHP in [14].

Theorem 3.1 (LLN for MGCPP). Let $\vec{S}_{nt} = (S_{1,nt}, S_{2,nt}, S_{3,nt}, \cdots, S_{d,nt})$ be the $d$-dimensional general compound point process defined above. Then

$$\frac{\vec{S}_{nt}}{n} \to \tilde{a}^* \vec{\bar{\lambda}}\, t$$

as $n \to \infty$ almost surely.

Proof 3.1 (Proof of Theorem 3.1). From the definition of the MGCPP in equation (7), we have

$$\frac{S_{i,nt}}{n} = \frac{S_{i,0}}{n} + \sum_{k=1}^{N_{i,nt}} \frac{a_i(X_{i,k})}{n}.$$

Since $S_{i,0}$ is a constant, we have

$$\lim_{n \to \infty} \left(\frac{S_{i,nt}}{n}\right) = \lim_{n \to \infty} \left(\frac{S_{i,0}}{n}\right) + \lim_{n \to \infty} \frac{\sum_{k=1}^{N_{i,nt}} a_i(X_{i,k})}{n} = 0 + \lim_{n \to \infty} \frac{\sum_{k=1}^{N_{i,nt}} a_i(X_{i,k})}{n}. \tag{8}$$

By the strong LLN for Markov chains (see, e.g., [17]), we have

$$\frac{1}{n} \sum_{k=1}^{n} a_i(X_{i,k}) \to_{n \to +\infty} a_i^*, \quad \text{a.s.},$$

where $a_i^*$ is defined by $a_i^* = \sum_{k \in X_i} \pi^*_{i,k}\, a_i(X_{i,k})$. By the LLN for the MPP in Assumption 2.0.1, we have

$$\frac{N_{i,nt}}{n} \to \bar{\lambda}_i t$$

as $n \to \infty$ almost surely. Combining the two limits, we obtain

$$\frac{1}{n} \sum_{k=1}^{N_{i,nt}} a_i(X_{i,k}) = \frac{N_{i,nt}}{n} \cdot \frac{1}{N_{i,nt}} \sum_{k=1}^{N_{i,nt}} a_i(X_{i,k}) \to_{n \to +\infty} a_i^* \bar{\lambda}_i t, \quad \text{a.s.} \tag{9}$$

Rewriting (9) in the multivariate case, we derive the LLN for the MGCPP.

Theorem 3.2 (FCLT I: Stochastic Centralization). Let $X_{i,k}$, $i = 1, 2, \cdots, d$, be independent ergodic Markov chains with $n$ states $\{1, 2, \cdots, n\}$ and with ergodic probabilities $(\pi^*_{i,1}, \pi^*_{i,2}, \ldots, \pi^*_{i,n})$. Let $\vec{S}_{nt}$ be the $d$-dimensional general compound point process. Then

$$\frac{\vec{S}_{nt} - \tilde{a}^* \vec{N}_{nt}}{\sqrt{n}} \longrightarrow \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t), \quad \text{for all } t > 0, \tag{10}$$

as $n \to \infty$, where $\vec{W}(t)$ is a standard $d$-dimensional Brownian motion, $\Lambda$ is the diagonal matrix $\Lambda = \mathrm{diag}(\bar{\lambda}_1, \bar{\lambda}_2, \bar{\lambda}_3, \cdots, \bar{\lambda}_d)$, $\vec{N}_{nt}$ is a $d$-dimensional vector, and $\tilde{a}^*$ and $\tilde{\sigma}^*$ are diagonal matrices:

$$\tilde{a}^* = \begin{pmatrix} a_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_d^* \end{pmatrix}, \quad \vec{N}_{nt} = \begin{pmatrix} N_{1,nt} \\ \vdots \\ N_{d,nt} \end{pmatrix}, \quad \tilde{\sigma}^* = \begin{pmatrix} \sigma_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_d^* \end{pmatrix}.$$

Here, $a_i^* = \sum_{k \in X_i} \pi^*_{i,k}\, a_i(X_{i,k})$ and $(\sigma_i^*)^2 := \sum_{k \in X_i} \pi^*_{i,k}\, v_i(k)$ with

$$v_i(k) = b_i(k)^2 + \sum_{j \in X_i} (g_i(j) - g_i(k))^2 P_i(k,j) - 2 b_i(k) \sum_{j \in X_i} (g_i(j) - g_i(k)) P_i(k,j),$$

$$b_i = (b_i(1), b_i(2), \ldots, b_i(n))', \quad b_i(k) := a_i(k) - a_i^*, \quad g_i := (P_i + \Pi_i^* - I)^{-1} b_i,$$

where $P_i$ is the transition probability matrix of the Markov chain $X_i$, $\Pi_i^*$ is the matrix of stationary distributions of $P_i$, and $g_i(j)$ is the $j$th entry of $g_i$.

Proof 3.2 (Proof of Theorem 3.2). From the definition of the MGCPP, we have

$$S_{i,nt} = S_{i,0} + \sum_{k=1}^{N_{i,nt}} a_i(X_{i,k}), \tag{11}$$

and

$$S_{i,nt} = S_{i,0} + \sum_{k=1}^{N_{i,nt}} (a_i(X_{i,k}) - a_i^*) + a_i^* N_{i,nt}, \tag{12}$$

where $a_i^*$ is defined by $a_i^* = \sum_{k \in X_i} \pi^*_{i,k}\, a_i(X_{i,k})$. Then, for some $n$, we have

$$\frac{S_{i,nt} - a_i^* N_{i,nt}}{\sqrt{n}} = \frac{S_{i,0} + \sum_{k=1}^{N_{i,nt}} (a_i(X_{i,k}) - a_i^*)}{\sqrt{n}}. \tag{13}$$

Since $S_{i,0}$ is a constant, when $n \to \infty$, we have

$$\lim_{n \to \infty} \left(\frac{S_{i,nt} - a_i^* N_{i,nt}}{\sqrt{n}}\right) = \lim_{n \to \infty} \left(\frac{S_{i,0}}{\sqrt{n}}\right) + \lim_{n \to \infty} \left(\frac{\sum_{k=1}^{N_{i,nt}} (a_i(X_{i,k}) - a_i^*)}{\sqrt{n}}\right) = 0 + \lim_{n \to \infty} \left(\frac{\sum_{k=1}^{N_{i,nt}} (a_i(X_{i,k}) - a_i^*)}{\sqrt{n}}\right). \tag{14}$$

Consider the following sums:

$$R^*_{i,n} := \sum_{k=1}^{n} (a_i(X_{i,k}) - a_i^*),$$

and

$$U^*_{i,n}(t) := n^{-1/2} \left[ (1 - (nt - \lfloor nt \rfloor)) R^*_{i,\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor) R^*_{i,\lfloor nt \rfloor + 1} \right],$$

where $\lfloor \cdot \rfloor$ is the floor function. By a martingale method similar to that in [21] and [25], we have the following weak convergence in the Skorokhod topology:

$$U^*_{i,n}(t) \to_{n \to +\infty} \sigma_i^* W_i(t). \tag{15}$$

From Assumption 2.0.1, we have the LLN for the MPP in the form of

$$\frac{N_i(nt)}{n} \to_{n \to \infty} \bar{\lambda}_i t.$$

Using a change of time in (15) and letting $t \to N_i(nt)/n$, we have

$$U^*_{i,n}(N_i(nt)/n) \to_{n \to +\infty} \sigma_i^* \sqrt{\bar{\lambda}_i}\, W_i(t). \tag{16}$$

Rewriting (16) in the multivariate form, we derive the weak convergence for the MGCPP:

$$\frac{\vec{S}_{nt} - \tilde{a}^* \vec{N}_{nt}}{\sqrt{n}} \longrightarrow_{n \to \infty} \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t), \quad \text{for all } t > 0. \tag{17}$$

Next, we consider a simple special case. Let $X_{i,k}$ be a Markov chain with two dependent states $(+\delta, -\delta)$ and ergodic probabilities $(\pi_i^*, 1 - \pi_i^*)$. In the limit order market, $\delta$ is the fixed tick size and the $d$-dimensional point process $\vec{N}_{nt}$ represents the order flow for $d$ stocks. Here, we set $a_i(x) = x$ in equation (7). In this way, we can derive the corresponding limit theorems for the $d$-dimensional price process $\vec{S}_{nt}$.

Corollary 3.2.1 (FCLT I for two-state MGCPP: Stochastic Centralization).

$$\frac{\vec{S}_{nt} - \tilde{a}^* \vec{N}_{nt}}{\sqrt{n}} \longrightarrow_{n \to \infty} \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t), \quad \text{for all } t > 0, \tag{18}$$

where $\vec{W}(t)$ is a standard $d$-dimensional Brownian motion, $\Lambda$ is the diagonal matrix $\Lambda = \mathrm{diag}(\bar{\lambda}_1, \bar{\lambda}_2, \bar{\lambda}_3, \cdots, \bar{\lambda}_d)$, and $\tilde{a}^*$ and $\tilde{\sigma}^*$ are diagonal matrices defined as

$$\tilde{a}^* = \begin{pmatrix} a_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_d^* \end{pmatrix}, \quad \vec{N}_{nt} = \begin{pmatrix} N_{1,nt} \\ \vdots \\ N_{d,nt} \end{pmatrix}, \quad \tilde{\sigma}^* = \begin{pmatrix} \sigma_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_d^* \end{pmatrix},$$

where $a_i^* = \delta(2\pi_i^* - 1)$ and

$$(\sigma_i^*)^2 := 4\delta^2 \left( \frac{1 - p_i' + \pi_i^*(p_i' - p_i)}{(p_i + p_i' - 2)^2} - \pi_i^*(1 - \pi_i^*) \right), \tag{19}$$

and $(p_i, p_i')$ are the transition probabilities of the Markov chain $X_{i,k}$.

Corollary 3.2.2 (LLN for two-state MGCPP). Let $\vec{S}_{nt}$ be the $d$-dimensional general compound point process with two-state Markov chain $X_{i,k}$. Then

$$\frac{\vec{S}_{nt}}{n} \to \tilde{a}^* \vec{\bar{\lambda}}\, t, \quad \text{a.s.}$$

Here, $\tilde{a}^*$ and $\vec{\bar{\lambda}}$ are the constants defined in Corollary 3.2.1.

Proof 3.3 (Proof of Corollaries 3.2.1 and 3.2.2). Taking the Markov chain $X_{i,k}$ with two states $(+\delta, -\delta)$ and $a_i(x) = x$ in Theorem 3.2 and Theorem 3.1, we obtain Corollaries 3.2.1 and 3.2.2 directly.

Remark 3.3. From the FCLT I for the MGCPP, we can derive an approximation for the mid-price $\vec{S}_{nt}$:

$$\vec{S}_{nt} \sim \sqrt{n}\, \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t) + \tilde{a}^* \vec{N}_{nt}, \tag{20}$$

for all $t > 0$ and some large enough $n$. Since $\vec{S}_{nt}$ is the price process in high-frequency trading, time is always measured in very short units (e.g., milliseconds). So, even if the window size is $nt = 10$ seconds with $t = [\dots]$, $n$ equals $[\dots]$, which is a very large number. In this way, it is reasonable to consider this kind of approximation in the LOB.
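The constants $a_i^*$ and $\sigma_i^*$ in Theorem 3.2 and Corollary 3.2.1 can be computed directly from a transition matrix. The sketch below is an illustration only, assuming NumPy; the function names are hypothetical, and it assumes that in (19) $p_i$ is the probability of staying in state $+\delta$, $p_i'$ is the probability of staying in state $-\delta$, and $\pi_i^*$ is the ergodic probability of $+\delta$. The value `delta = 0.005` is likewise an assumed half-tick size, not a value taken from the text.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def drift_and_volatility(P, a):
    """(a_star, sigma_star) from the general n-state formulas of Theorem 3.2."""
    P = np.asarray(P, dtype=float)
    a = np.asarray(a, dtype=float)
    n = a.size
    pi = stationary_distribution(P)
    a_star = pi @ a
    b = a - a_star
    Pi = np.tile(pi, (n, 1))                      # every row is the stationary law
    g = np.linalg.solve(P + Pi - np.eye(n), b)
    diff = g[None, :] - g[:, None]                # diff[k, j] = g(j) - g(k)
    v = b**2 + (diff**2 * P).sum(axis=1) - 2 * b * (diff * P).sum(axis=1)
    return a_star, np.sqrt(pi @ v)

def two_state_closed_form(p, p_prime, delta):
    """(a_star, sigma_star) from Corollary 3.2.1 and equation (19)."""
    pi = (1 - p_prime) / (2 - p - p_prime)
    a_star = delta * (2 * pi - 1)
    var = 4 * delta**2 * (
        (1 - p_prime + pi * (p_prime - p)) / (p + p_prime - 2) ** 2 - pi * (1 - pi)
    )
    return a_star, np.sqrt(var)

delta = 0.005                      # assumed half-tick size (hypothetical)
p, p_prime = 0.5711, 0.6044        # example transition probabilities
P = np.array([[p, 1 - p], [1 - p_prime, p_prime]])
general = drift_and_volatility(P, [delta, -delta])
closed = two_state_closed_form(p, p_prime, delta)
print("general formula:", general)
print("closed form:    ", closed)
```

Under these assumptions the two routes should agree up to floating-point error, which is a convenient sanity check before moving to chains with more states.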

Remark 3.4. When $\vec{N}_t$ is a multivariate Hawkes process, the corresponding FCLTs and LLNs for $\vec{S}_{nt}$ were considered in [14]. In the one-dimensional case, if $N_t$ is a renewal process, the corresponding limit theorems for the semi-Markovian model $S_t$ were discussed in [21] and [22].

In this Section, we test the FCLT I for the MGCPP model with the LOBSTER data and compare our results with those simulated by the MGCHP in [14].

The level-one LOBSTER data for June 21st, 2012 is considered in this paper. In this data, time is measured in milliseconds and the tick size is one cent, which determines the corresponding value of $\delta$ ($[\dots]$). The basic data description and the liquidity of each stock can be found in Table 1.

Table 1: Data description and stock liquidity of Microsoft, Intel, Apple, Amazon, and Google for June 21st, 2012.

| Ticker | # of Orders | Avg # of Orders per Second | # of Price Changes | Avg # of Price Changes per Second |
|---|---|---|---|---|
| INTC | 404986 | 17.3071 | 3218 | 0.1375 |
| MSFT | 411409 | 5.0640 | 4016 | 0.1716 |
| AAPL | 118497 | 5.0640 | 64351 | 2.7500 |
| AMZN | 57515 | 2.4579 | 27558 | 1.1777 |
| GOOG | 49482 | 2.1146 | 24085 | 1.0293 |

Next, we estimate the parameters $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \cdots, \sigma_d^2)$ and $\vec{\bar{\lambda}} = (\bar{\lambda}_1, \bar{\lambda}_2, \bar{\lambda}_3, \cdots, \bar{\lambda}_d)$ via the LLN and FCLT assumptions on $\vec{N}_t$. From Assumptions 2.0.1 and 2.0.2, when $n$ is large enough, we can derive the approximations

$$\frac{\vec{N}(nt)}{nt} \sim \vec{\bar{\lambda}}, \quad t \in [0,1], \tag{21}$$

and

$$\frac{1}{\sqrt{n}}\left(\vec{N}_{nt} - E(\vec{N}_{nt})\right) \sim \Sigma^{1/2} \vec{W}_t, \quad t \in [0,1]. \tag{22}$$

Taking the expectation in (21) and the variance in (22), we have

$$\frac{E(\vec{N}(nt))}{nt} \sim \vec{\bar{\lambda}}, \quad t \in [0,1], \tag{23}$$

and

$$\frac{1}{nt}\,\mathrm{Var}(\vec{N}_{nt}) \sim \Sigma, \quad t \in [0,1]. \tag{24}$$

In this way, we derive the estimated parameters in Table 2.

Table 2: Estimated parameters of 5 stocks via the LLN and FCLT assumptions.

| Ticker | $\sigma$ | $\sigma^2$ | $\bar{\lambda}$ |
|---|---|---|---|
| INTC | 1.4380 | 2.0680 | 0.1366 |
| MSFT | 1.1390 | 1.2973 | 0.1729 |
| AAPL | 7.8981 | 62.38 | 2.2938 |
| AMZN | 4.3919 | 19.2883 | 1.0374 |
| GOOG | 4.7747 | 22.7980 | 0.8178 |

Next, we estimate the parameters of the Markov chain by applying the two-state MGCPP model in Corollary 3.2.1. The transition matrix $P$ of the two-state dependent Markov chain $X_k$ is denoted by

$$P = \begin{bmatrix} p_{uu} & 1 - p_{uu} \\ 1 - p_{dd} & p_{dd} \end{bmatrix}.$$

We use the empirical frequencies in our data to estimate $p_{uu}$ and $p_{dd}$ in $P$ by

$$p_{uu} = \frac{q_{uu}}{q_{uu} + q_{ud}}, \quad p_{dd} = \frac{q_{dd}}{q_{dd} + q_{du}},$$

where $q_{uu}$, $q_{dd}$, $q_{ud}$, and $q_{du}$ are the numbers of times the price goes up twice, goes down twice, goes up and then down, and goes down and then up, respectively. The results are given in Table 3.

Table 3: Transition matrix and constant parameters for the two-state MGCPP. $a^*$ and $\sigma^*$ were calculated by equation (19).

| Ticker | $p_{uu}$ | $p_{dd}$ | $\sigma^*$ | $a^*$ |
|---|---|---|---|---|
| INTC | 0.5373 | 0.5814 | 0.0057 | $-2.5023 \times 10^{[\dots]}$ |
| MSFT | 0.5711 | 0.6044 | 0.0060 | $-2.0145 \times 10^{[\dots]}$ |
| AAPL | 0.4954 | 0.4955 | 0.0050 | $-2.1529 \times 10^{[\dots]}$ |
| AMZN | 0.4511 | 0.4590 | 0.0046 | $-3.6077 \times 10^{[\dots]}$ |
| GOOG | 0.4536 | 0.4886 | 0.0047 | $-1.6584 \times 10^{[\dots]}$ |

In this Section, we compare the simulation results of the MGCPP with those of the multivariate general compound Hawkes process (MGCHP) model, to show that the simpler generalized model can reach an accuracy comparable to that of the MGCHP, which has a sophisticated intensity function. In [14], the MGCHP with two dependent states was simulated for Microsoft's and Intel's data. So here we also conduct simulations for Microsoft's and Intel's data with the two-state MGCPP, which means the Markov chain has two dependent states $(+\delta, -\delta)$.

We test the MGCPP model by comparing the standard deviations of the left-hand side and the right-hand side in the FCLT:

$$\frac{\vec{S}_{nt} - \tilde{N}_{nt}\vec{a}^*}{\sqrt{n}} \longrightarrow_{n \to \infty} \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t).$$

That is to say, we first cut our data into disjoint windows of size $nt$, specifically $[int, (i+1)nt]$ with $t = [\dots]$, and, taking the left bound as the starting time, we calculate

$$\vec{S}_i^* = \vec{S}_{(i+1)nt} - \vec{S}_{int} - \left(\tilde{N}((i+1)nt) - \tilde{N}(int)\right)\vec{a}^*.$$

The standard deviation is then given by

$$\mathrm{std}\left\{\vec{S}^*\right\} \approx \sqrt{n}\, \tilde{\sigma}^* \Lambda^{1/2} \sqrt{\vec{t}}. \tag{25}$$

Figure 1 gives a standard deviation comparison of the MGCPP, the MGCHP, and the raw data for 2 stocks, for window sizes from 0.1 second to 12 seconds in steps of 0.1 second. First, we find that the MGCPP parameters make the standard deviation of the LHS very similar to that of the RHS for each stock when $n$ is large. So, generally speaking, our MGCPP model fits the data well. Second, the MGCPP curve is very close to the MGCHP curve; that is, the simulation results for the Intel and Microsoft stock data are nearly the same. This shows that even without a sophisticated intensity function such as that of the Hawkes process, we can still reach a relatively good result with a simple point process model. This can help with the computing efficiency problem that arises when using the MGCHP model. We give a more quantitative error analysis later.
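The estimators behind Tables 2 and 3 can be sketched as follows. This is a minimal illustration assuming NumPy, not the procedure actually used for the LOBSTER data: the function names, the window-count implementation of (23) and (24), and the synthetic inputs are all hypothetical.

```python
import numpy as np

def estimate_rate_and_variance(event_times, horizon, window):
    """Window-count estimates of lambda_bar, as in (23), and sigma^2, as in (24)."""
    n_windows = int(horizon // window)
    edges = window * np.arange(n_windows + 1)
    counts, _ = np.histogram(event_times, bins=edges)
    return counts.mean() / window, counts.var() / window

def estimate_transition_probs(price_changes):
    """Frequency estimates of p_uu and p_dd from consecutive price-change signs."""
    signs = np.sign(np.asarray(price_changes, dtype=float))
    signs = signs[signs != 0]                 # ignore zero moves
    prev, nxt = signs[:-1], signs[1:]
    q_uu = np.sum((prev > 0) & (nxt > 0))     # up then up
    q_ud = np.sum((prev > 0) & (nxt < 0))     # up then down
    q_dd = np.sum((prev < 0) & (nxt < 0))     # down then down
    q_du = np.sum((prev < 0) & (nxt > 0))     # down then up
    return q_uu / (q_uu + q_ud), q_dd / (q_dd + q_du)

# Synthetic check with a Poisson flow, for which sigma^2 equals lambda_bar.
rng = np.random.default_rng(2)
horizon, true_rate = 10_000.0, 2.0
event_times = np.sort(rng.uniform(0.0, horizon, rng.poisson(true_rate * horizon)))
lam_hat, sigma2_hat = estimate_rate_and_variance(event_times, horizon, window=10.0)
p_uu_hat, p_dd_hat = estimate_transition_probs([1, 1, -1, -1, -1, 1])
print(lam_hat, sigma2_hat, p_uu_hat, p_dd_hat)
```

For real order-flow data the ratio `sigma2_hat / lam_hat` is typically not 1, as Table 2 shows, which is exactly the information the Poisson assumption would discard.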

[Figure 1 panels: INTC and MSFT (FCLT I); x-axis: window size (sec); y-axis: standard deviation; curves: compound Hawkes std, empirical std, compound point std]

Figure 1: Standard deviation comparisons for 2 stocks by FCLT I for MGCHP and MGCPP
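The windowed comparison behind equation (25) can be reproduced on synthetic data. The sketch below assumes NumPy and simulates a one-dimensional two-state MGCPP driven by a Poisson order flow; the Poisson flow, the rate `lam`, the half-tick `delta`, the horizon, and all function names are assumptions made for illustration, and no LOBSTER data is used. With time measured in seconds, the right-hand side of (25) becomes $\sigma^* \sqrt{\bar{\lambda}\, w}$ for a window of $w$ seconds.

```python
import numpy as np

def two_state_constants(p_uu, p_dd, delta):
    """Ergodic probability of +delta, a_star, and sigma_star from (19)."""
    pi_up = (1 - p_dd) / (2 - p_uu - p_dd)
    a_star = delta * (2 * pi_up - 1)
    var = 4 * delta**2 * (
        (1 - p_dd + pi_up * (p_dd - p_uu)) / (p_uu + p_dd - 2) ** 2
        - pi_up * (1 - pi_up)
    )
    return pi_up, a_star, np.sqrt(var)

def simulate_jumps(n_events, p_uu, p_dd, pi_up, delta, rng):
    """Two-state Markov chain of price moves (+delta or -delta)."""
    u = rng.random(n_events)
    up = np.empty(n_events, dtype=bool)
    up[0] = u[0] < pi_up
    for k in range(1, n_events):
        stay = p_uu if up[k - 1] else p_dd
        up[k] = up[k - 1] if u[k] < stay else not up[k - 1]
    return np.where(up, delta, -delta)

def centered_window_std(times, jumps, a_star, window, horizon):
    """Std over disjoint windows of S* = (price change) - a_star * (event count)."""
    n_windows = int(horizon // window)
    idx = (times // window).astype(int)
    keep = idx < n_windows
    sums = np.bincount(idx[keep], weights=(jumps - a_star)[keep], minlength=n_windows)
    return sums.std()

rng = np.random.default_rng(1)
lam, p_uu, p_dd, delta, horizon = 0.5, 0.5373, 0.5814, 0.005, 200_000.0
pi_up, a_star, sigma_star = two_state_constants(p_uu, p_dd, delta)
n_events = rng.poisson(lam * horizon)
times = np.sort(rng.uniform(0.0, horizon, n_events))
jumps = simulate_jumps(n_events, p_uu, p_dd, pi_up, delta, rng)

for window in (10.0, 60.0, 300.0):
    empirical = centered_window_std(times, jumps, a_star, window, horizon)
    theoretical = sigma_star * np.sqrt(lam * window)   # right-hand side of (25)
    print(f"window={window:6.1f}s  empirical={empirical:.5f}  theoretical={theoretical:.5f}")
```

The empirical and theoretical values should track each other along a square-root curve in the window size, with the agreement improving as the number of events per window grows.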

Remark 3.5. Since the number of windows decreases as the window size $nt$ increases, the spread of the data in Figure 1 increases with the window size. For example, when $nt = 0.1$ second, the number of windows is 234,000, whereas a 12-second window size yields only 1,950 windows, which makes the estimated standard deviation more dispersed.

Intuitively, Figure 1 shows that the standard deviations of the MGCHP and the MGCPP are very close, and both of them fit the real standard deviation very well. Next, we analyze the MGCHP and MGCPP models quantitatively.

We computed the mean square error (MSE) between the real standard deviation and the theoretical standard deviations in Table 4. As can be seen from Table 4, the MGCHP model performs better than the MGCPP model for both the Intel and the Microsoft data. For the Intel stock data, the MSE of the MGCHP is $[\dots]$ better than that of the MGCPP, and it is nearly $[\dots]$ better than that of the MGCPP model for the Microsoft stock data. However, when we compare the order of magnitude of the MSE ($[\dots]$) with that of the real standard deviation ($[\dots]$ and $[\dots]$), we can still conclude that the MGCPP is good enough for the mid-price modeling task.

Table 4: The MSE of the real standard deviation and theoretical standard deviations from MGCHP and MGCPP.

| Ticker | MGCHP MSE | MGCPP MSE |
|---|---|---|
| INTC | $[\dots]$ | $[\dots]$ |
| MSFT | $[\dots]$ | $[\dots]$ |

Recalling equation (25), we see that the standard deviation is linear in the square root of the time step. So, we can fit the real standard deviation data with a square-root curve by least-squares regression, and then compare the coefficient from the regression with the coefficients from the two stochastic models.

Table 5: Coefficients calculated by MGCHP and MGCPP models.

| Ticker | MGCHP Coefficient | MGCPP Coefficient | Regression Coefficient | MGCHP % Error | MGCPP % Error |
|---|---|---|---|---|---|
| INTC | 0.002086 | 0.002089 | 0.002162 | $[\dots]$ | $[\dots]$ |
| MSFT | 0.002494 | 0.002487 | 0.002609 | $[\dots]$ | $[\dots]$ |

From Table 5, we find that the percentage errors of both stochastic models are smaller than $[\dots]$, and there is no significant difference between the MGCPP coefficient and the MGCHP coefficient.

In this Section, we give more simulation examples using the Google, Apple, and Amazon data with the MGCPP model with $n$-state dependent orders. Thanks to [23], we can conclude that the accuracy of the general compound Hawkes process model increases when the number of states increases, and that for Google, Apple, and Amazon in the LOBSTER data set, the best number of states is $[\dots]$ to $[\dots]$. In the previous Section, we also showed that the simulation results of the MGCPP are nearly the same as those of the MGCHP. So, it is reasonable to consider an MGCPP model with a 7-state Markov chain here.

[Figure 2 panels: AAPL (FCLT I), MGCPP with 2-state Markov chain $(+\delta, -\delta)$ and MGCPP with 7-state Markov chain; x-axis: window size (sec); y-axis: standard deviation; curves: empirical std, compound point std]

Figure 2: Standard deviation comparisons for MGCPP with 2-state Markov chain and 7-state Markov chain simulated by Apple's stock data

[Figure 3 panels: GOOG (FCLT I), MGCPP with 2-state Markov chain $(+\delta, -\delta)$ and MGCPP with 7-state Markov chain; x-axis: window size (sec); y-axis: standard deviation; curves: empirical std, compound point std]

Figure 3: Standard deviation comparisons for MGCPP with 2-state Markov chain and 7-state Markov chain simulated by Google's stock data

[Figure 4 panels: AMZN (FCLT I), MGCPP with 2-state Markov chain $(+\delta, -\delta)$ and MGCPP with 7-state Markov chain; x-axis: window size (sec); y-axis: standard deviation; curves: empirical std, compound point std]

Figure 4: Standard deviation comparisons for MGCPP with 2-state Markov chain and 7-state Markov chain simulated by Amazon's stock data
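The regression coefficients compared with the model coefficients come from fitting the empirical standard deviations with a square-root curve. A minimal sketch of such a fit is given below, assuming NumPy and a no-intercept model $\mathrm{std}(w) = c\sqrt{w}$, for which the least-squares solution is $c = \sum_i y_i \sqrt{w_i} / \sum_i w_i$. The function names are hypothetical, and the percentage-error convention (relative to the regression coefficient) is an assumption, since the text does not state which one was used.

```python
import numpy as np

def fit_sqrt_coefficient(window_sizes, stds):
    """Least-squares c in std = c * sqrt(window), with no intercept."""
    root = np.sqrt(np.asarray(window_sizes, dtype=float))
    stds = np.asarray(stds, dtype=float)
    return float(root @ stds / (root @ root))

def percentage_error(model_coeff, regression_coeff):
    """Relative gap in percent, measured against the regression coefficient (assumed convention)."""
    return 100.0 * abs(model_coeff - regression_coeff) / regression_coeff

# Synthetic example: a noisy square-root curve on windows from 0.1 s to 12 s.
rng = np.random.default_rng(3)
windows = np.arange(1, 121) * 0.1
true_coeff = 0.0021                      # hypothetical coefficient
observed = true_coeff * np.sqrt(windows) * (1 + 0.02 * rng.standard_normal(windows.size))
fitted = fit_sqrt_coefficient(windows, observed)
print("fitted coefficient:", fitted)
print("percentage error of a model coefficient of 0.0020:", percentage_error(0.0020, fitted))
```

In the model, the coefficient being compared is $\sigma^* \sqrt{\bar{\lambda}}$ from equation (25), so the fit provides a single-number summary of how well the theoretical curve matches the data.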

Figures 2, 3, and 4 give standard deviation comparisons for the MGCPP with a 2-state Markov chain and a 7-state Markov chain, simulated with different tickers' data. Since the 2-state simulation results here are not as good as the results simulated with Intel's and Microsoft's data, we take bigger time steps and window sizes (from 10 seconds to 20 minutes with a 10-second time step) to capture more dynamics. From the figures we find that the 7-state model is a significant improvement over the 2-state model. The 7-state curves for AAPL and GOOG are very close to the real standard deviation, although the theoretical curve for AMZN underestimates it even with the 7-state model.

Table 6: The MSE and coefficients computed by MGCPP with 2-state and 7-state Markov chain for different tickers. The regression coefficients were derived by fitting the real standard deviations with a square-root curve, and the MGCPP coefficients were computed by equation (25).

| Ticker | MSE | Regression Coefficient | MGCPP Coefficient | Percentage Error |
|---|---|---|---|---|
| AAPL 2-state | 0.2467 | 0.0278 | 0.0076 | $[\dots]$ |
| AAPL 7-state | 0.0064 | 0.0311 | 0.0288 | $[\dots]$ |
| GOOG 2-state | 0.4161 | 0.0307 | 0.0044 | $[\dots]$ |
| GOOG 7-state | 0.0081 | 0.0307 | 0.0287 | $[\dots]$ |
| AMZN 2-state | 0.1233 | 0.0189 | 0.0048 | $[\dots]$ |
| AMZN 7-state | 0.0225 | 0.0205 | 0.0147 | $[\dots]$ |

Table 6 lists the MSE and coefficients of the 2-state and 7-state models for different tickers, and shows the improvement of the 7-state model quantitatively. The results for AAPL and GOOG are good enough for mid-price modeling. As for AMZN, although we obtain a remarkable improvement from the 2-state model ($[\dots]$ error) to the 7-state model ($[\dots]$ error), we cannot make the error smaller than $[\dots]$ or $[\dots]$. That is to say, the MGCPP model may not be able to capture the full dynamics of the AMZN data, but it can still be a strong candidate for modeling the mid-price, which is consistent with the conclusion for the MGCHP model in [23].

In general, we can conclude that, as a generalization of the MGCHP, the MGCPP model also performs very well in mid-price dynamics modeling. If we consider the MGCPP with a Markov chain with more states, we obtain a better result.

Remark 3.6. The MGCPP is not only a generalization of the MGCHP, but also a generalization of all multivariate compound models whose point processes $\vec{N}_t$ satisfy Assumptions 2.0.1 and 2.0.2. We use the Hawkes process for comparison because we want to take advantage of the numerical examples in the references.

We proved a LLN and an FCLT for the MGCPP in the previous Section, and the limit theorems provide an approximation for mid-price modeling in the LOB. Recalling the approximation in Remark 3.3, we have

$$\vec{S}_{nt} \sim \sqrt{n}\, \tilde{\sigma}^* \Lambda^{1/2} \vec{W}(t) + \tilde{a}^* \vec{N}_{nt}, \tag{26}$$

where $\vec{S}_{nt}$ is the price process and $\vec{N}_{nt}$ is the order flow. However, in real-world problems, equation (26) cannot help with the forecasting task directly, because the order flow $\vec{N}_{nt}$ is not known in advance. This motivates us to consider an FCLT II for the MGCPP in this Section.

Theorem 4.1 (FCLT II: Deterministic Centralization). Let $X_{i,k}$, $i = 1, 2, \cdots, d$, be independent ergodic Markov chains with $n$ states $\{1, 2, \cdots, n\}$ and with ergodic probabilities $(\pi^*_{i,1}, \pi^*_{i,2}, \ldots, \pi^*_{i,n})$. Let $\vec{S}_{nt}$ be the $d$-dimensional compound point process. Then

$$\frac{\vec{S}_{nt} - \tilde{a}^* E(\vec{N}_{nt})}{\sqrt{n}} \longrightarrow \tilde{\sigma}^* \Lambda^{1/2} \vec{W}_1(t) + \tilde{a}^* \Sigma^{1/2} \vec{W}_2(t), \quad \text{for all } t > 0, \tag{27}$$

as $n \to \infty$, where $\vec{W}_1(t)$ and $\vec{W}_2(t)$ are independent standard $d$-dimensional Brownian motions. The parameters $\tilde{\sigma}^*$, $\tilde{a}^*$, $\Lambda$, and $\Sigma$ are defined in Theorem 3.2 and Assumption 2.0.2.

UGUST

4, 2020

Proof 4.1 (Proof of Theorem 4.1). Recall the FCLT for the MPP (Assumption 2.0.2):

$$\frac{1}{\sqrt{n}} \vec{N}_{nt} - \frac{1}{\sqrt{n}} E(\vec{N}_{nt}) \longrightarrow \Sigma^{1/2} \vec{W}_t \tag{28}$$

in law for the Skorokhod topology, as $n \to \infty$. From Theorem 3.2, we have the FCLT for the MGCPP:

$$\frac{\vec{S}_{nt} - \tilde{a}^{*} \vec{N}_{nt}}{\sqrt{n}} \longrightarrow \tilde{\sigma}^{*} \Lambda^{1/2} \vec{W}_t, \quad \text{for all } t > 0, \tag{29}$$

as $n \to \infty$, weakly in the Skorokhod topology. Here, we assume that the two multivariate Brownian motions in (28) and (29) are mutually independent; we denote the one in (29) by $\vec{W}_1(t)$ and the one in (28) by $\vec{W}_2(t)$. Next, consider

$$\frac{\vec{S}_{nt}}{\sqrt{n}} - \frac{\tilde{a}^{*} E(\vec{N}_{nt})}{\sqrt{n}} = \frac{\vec{S}_{nt} - \tilde{a}^{*} \vec{N}_{nt}}{\sqrt{n}} + \tilde{a}^{*} \left( \frac{1}{\sqrt{n}} \vec{N}_{nt} - \frac{1}{\sqrt{n}} E(\vec{N}_{nt}) \right). \tag{30}$$

With (28) and (29) we obtain

$$\frac{\vec{S}_{nt} - \tilde{a}^{*} \vec{N}_{nt}}{\sqrt{n}} + \tilde{a}^{*} \left( \frac{1}{\sqrt{n}} \vec{N}_{nt} - \frac{1}{\sqrt{n}} E(\vec{N}_{nt}) \right) \longrightarrow \tilde{\sigma}^{*} \Lambda^{1/2} \vec{W}_1(t) + \tilde{a}^{*} \Sigma^{1/2} \vec{W}_2(t) \tag{31}$$

as $n \to \infty$, which gives (27).

Remark 4.2. We can also consider a special case, as for the FCLT I. Let $X_{i,k}$ be a Markov chain with two dependent states $(+\delta, -\delta)$ and ergodic probabilities $(\pi^{*}_{i}, 1 - \pi^{*}_{i})$. Set $a_i(x) = x$ in Definition 7. Then we can derive a similar result for the FCLT II. The parameters $\tilde{a}^{*}$ and $\tilde{\sigma}^{*}$ can be computed by equation (19).

Remark 4.3. For the FCLT II, we can consider an approximation similar to the one for the FCLT I. For sufficiently large $n$, we have

$$\vec{S}_{nt} \sim \sqrt{n}\, \tilde{\sigma}^{*} \Lambda^{1/2} \vec{W}_1(t) + \sqrt{n}\, \tilde{a}^{*} \Sigma^{1/2} \vec{W}_2(t) + \tilde{a}^{*} E(\vec{N}_{nt}), \quad \text{for all } t > 0. \tag{32}$$

To deal with the $E(\vec{N}_{nt})$ term, we use the approximation derived from Assumption 2.0.1 in equation (23):

$$E(\vec{N}_{nt}) \sim nt\, \vec{\bar{\lambda}}. \tag{33}$$

Substituting (33) into (32) gives the new approximation

$$\vec{S}_{nt} \sim \sqrt{n}\, \tilde{\sigma}^{*} \Lambda^{1/2} \vec{W}_1(t) + \sqrt{n}\, \tilde{a}^{*} \Sigma^{1/2} \vec{W}_2(t) + \tilde{a}^{*} nt\, \vec{\bar{\lambda}}. \tag{34}$$

In this Section, we applied the LOBSTER data to test the FCLT II. Following the numerical examples for the FCLT I, we consider the standard deviation of the approximation in Remark 4.3, namely

$$\operatorname{std}\left\{ \vec{S}_{(i+1)nt} - \vec{S}_{int} \right\} \approx \sqrt{(\tilde{\sigma}^{*})^{2} \Lambda\, n\vec{t} + (\tilde{a}^{*})^{2} \Sigma\, n\vec{t}}. \tag{35}$$

The comparison of the empirical and theoretical standard deviations is shown in Figure 5. Since the results for INTC and MSFT are good enough with the 2-state Markov chain $(+\delta, -\delta)$ in the FCLT I, we again applied the 2-state Markov chain to INTC and MSFT. For AAPL, GOOG, and AMZN, we used the MGCPP model with a 7-state Markov chain. Window sizes start at 1 second and increase to 20 minutes in steps of 10 seconds. As Figure 5 shows, the results for the FCLT II are as good as the FCLT I results in Figures 1, 2, 3, and 4. The MSE and coefficients are reported in Table 7.
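As an illustration of approximations (34) and (35), the following minimal sketch simulates price increments in the one-dimensional case ($d = 1$) and compares their sample standard deviation with the square-root formula. It is not the authors' simulation code: the function names and all parameter values are hypothetical, and `cov` stands in for the scalar version of $\Sigma$.

```python
import math
import random
import statistics


def theoretical_std(sigma, lam, a, cov, n, t):
    """Right-hand side of (35) in one dimension (d = 1)."""
    return math.sqrt((sigma ** 2 * lam + a ** 2 * cov) * n * t)


def simulate_increments(sigma, lam, a, cov, lam_bar, n, t, size, seed=0):
    """Draw price increments over windows of length t from approximation (34)."""
    rng = random.Random(seed)
    drift = a * n * t * lam_bar  # deterministic centering term a* n t lambda_bar
    scale1 = math.sqrt(n) * sigma * math.sqrt(lam)
    scale2 = math.sqrt(n) * a * math.sqrt(cov)
    increments = []
    for _ in range(size):
        w1 = rng.gauss(0.0, math.sqrt(t))  # increment of W_1 over the window
        w2 = rng.gauss(0.0, math.sqrt(t))  # independent increment of W_2
        increments.append(scale1 * w1 + scale2 * w2 + drift)
    return increments


# Hypothetical parameter values, chosen only for illustration.
params = dict(sigma=0.01, lam=2.0, a=0.001, cov=3.0, n=100, t=1.0)
sample = simulate_increments(lam_bar=2.0, size=100_000, **params)
empirical = statistics.pstdev(sample)
theory = theoretical_std(**params)
print(f"empirical std = {empirical:.5f}, theoretical std = {theory:.5f}")
```

Because the two Brownian terms are independent, their variances add, which is why the two contributions appear as a sum under the square root in (35); the drift term shifts the mean but does not affect the standard deviation.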

[Figure 5 panels: AAPL, GOOG, and AMZN (thm2 MGCPP with 7-state Markov chain); INTC and MSFT (thm2 MGCPP with 2-state Markov chain (+δ, −δ)). Each panel plots standard deviation against window size (sec) for two series: empirical std and compound point std.]

Figure 5: Standard deviation comparisons for 5 stocks by FCLT II for the MGCPP. INTC and MSFT are simulated with a 2-state Markov chain, while AAPL, AMZN, and GOOG use a 7-state Markov chain.

Table 7: The MSE and coefficients computed by MGCPP FCLT II.

| Ticker | MSE | Regression Coefficient | MGCPP Coefficient | Percentage Error |
|---|---|---|---|---|
| INTC 2-state | 1.5820 × 10^(−[...]) | [...] | [...] | [...] |
| MSFT 2-state | 6.5788 × 10^(−[...]) | [...] | [...] | [...] |
| AAPL 7-state | 0.0060 | 0.0278 | 0.0288 | [...] |
| GOOG 7-state | 0.0081 | 0.0307 | 0.0287 | [...] |
| AMZN 7-state | 0.0121 | 0.0189 | 0.0147 | [...] |
| Overall Percentage Error | | | | [...] |

We see that the percentage errors of MSFT and AAPL are very small (less than [...]), and the results for INTC and GOOG are also good (less than [...]). The percentage error of AMZN is large, but it is still smaller than the error derived from FCLT I in Table 6. In general, the simulation results of FCLT II are as good as those of FCLT I, and we can apply FCLT II to model a mid-price.
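The quantities reported in Table 7 can be illustrated with a short sketch. This section does not spell out how the regression coefficient or the percentage error are computed, so the code below rests on two assumptions: that the regression coefficient is the no-intercept least-squares slope of the empirical standard deviation against the square root of the window size, and that the percentage error is the absolute difference between the two coefficients relative to the regression coefficient. All function names are hypothetical, and the standard deviations are synthetic.

```python
import math


def fit_sqrt_coefficient(windows, stds):
    """Least-squares slope c in the model std = c * sqrt(window), no intercept."""
    numerator = sum(math.sqrt(w) * s for w, s in zip(windows, stds))
    denominator = sum(windows)  # equals the sum of sqrt(w) ** 2
    return numerator / denominator


def mean_squared_error(empirical, model):
    """Average squared gap between empirical and model standard deviations."""
    return sum((e - m) ** 2 for e, m in zip(empirical, model)) / len(empirical)


def percentage_error(regression_coef, model_coef):
    """Assumed definition: relative gap to the regression coefficient, in percent."""
    return abs(regression_coef - model_coef) / abs(regression_coef) * 100.0


# Window sizes from 1 second up to about 20 minutes in steps of 10 seconds.
windows = list(range(1, 1201, 10))
# Synthetic standard deviations that follow the square-root law exactly.
synthetic_stds = [0.03 * math.sqrt(w) for w in windows]

fitted = fit_sqrt_coefficient(windows, synthetic_stds)
fit_mse = mean_squared_error(
    synthetic_stds, [fitted * math.sqrt(w) for w in windows]
)
# AAPL coefficients taken from Table 7 (regression 0.0278, MGCPP 0.0288).
aapl_error = percentage_error(0.0278, 0.0288)
print(fitted, fit_mse, aapl_error)
```

Under these assumptions, the MGCPP coefficient plays the role of the constant multiplying the square root of the window size in (35), so comparing it with a fitted slope is a direct check of the square-root scaling.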

In this Section, we tested the forecasting ability of the MGCPP model. Since we did not assume that the multivariate point process $\vec{N}_t$ is stationary or independent, we cannot apply $K$-fold cross-validation directly. Instead, we used the rolling $K$-fold cross-validation method proposed in [6]. For each stock, we divided the last [...] minutes of data into 5 disjoint [...]-min windows. In fold 1, we take the first [...] minutes of data as the training set to estimate the parameters, and then use the data in the next [...]-min window to calculate the percentage error. Next, we merge the test set of the current fold into its training set to form the training set of the following fold, and use the next [...]-min window as the new test set. Repeating this procedure 5 times gives 5 percentage errors, and their mean is the test error $E$ for this stock. The overall test error for our multivariate model is the average of the test errors over all stocks. Figure 6 gives an example diagram for the rolling cross-validation.

Figure 6: Diagram for the rolling cross-validation.

Table 8: Test errors for different tickers by applying 5-fold cross-validation. The errors are percentage errors between the regression coefficients and the MGCPP coefficients.

| Ticker | fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | Mean Error |
|---|---|---|---|---|---|---|
| INTC | [...].75% | 0.39% | 3.16% | 14.32% | 16.60% | 8.[...]% |
| MSFT | [...].33% | 31.35% | 16.96% | 8.33% | 22.61% | 19.[...]% |
| AAPL | [...].22% | 0.51% | 22.53% | 21.34% | 23.33% | 15.[...]% |
| GOOG | [...].60% | 20.41% | 16.41% | 6.13% | 12.51% | 15.[...]% |
| AMZN | [...].78% | 4.87% | 7.98% | 18.81% | 42.15% | 18.[...]% |
| Overall Test Error | | | | | | $E_{test}$ = 15.[...]% |

Table 8 lists the test errors for each ticker and the overall test error for the MGCPP model. The test error for each stock is relatively large, and the overall test error (15.[...]%) is nearly double the overall percentage error (7.[...]%) in Table 7. This is because the results in Table 7 are fitting errors, whereas the test errors in Table 8 are forecast errors: no future information is used in the forecasting task. So, although the overall test error is not as good as the fitting error, it is still a good prediction in the LOB and can provide useful insight for the forecasting task.
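The rolling cross-validation procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: observations are indexed by position instead of by clock time, and `rolling_origin_splits`, `rolling_test_error`, and the toy `fit` and `error` callables are hypothetical names introduced here.

```python
def rolling_origin_splits(n_obs, n_folds, test_size):
    """Yield (train, test) index ranges; each fold absorbs the previous test set."""
    first_train_end = n_obs - n_folds * test_size
    if first_train_end <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = first_train_end + k * test_size
        yield range(0, train_end), range(train_end, train_end + test_size)


def rolling_test_error(series, n_folds, test_size, fit, error):
    """Return the mean of the per-fold errors together with the errors themselves."""
    fold_errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), n_folds, test_size):
        params = fit([series[i] for i in train_idx])  # estimate on past data only
        fold_errors.append(error(params, [series[i] for i in test_idx]))
    return sum(fold_errors) / len(fold_errors), fold_errors


def toy_fit(train):
    """Toy 'model': the training mean."""
    return sum(train) / len(train)


def toy_error(params, test):
    """Toy percentage error between the fitted mean and the test-window mean."""
    test_mean = sum(test) / len(test)
    return abs(test_mean - params) / abs(test_mean) * 100.0


splits = list(rolling_origin_splits(n_obs=100, n_folds=5, test_size=10))
series = [float(i + 1) for i in range(100)]
mean_error, fold_errors = rolling_test_error(series, 5, 10, toy_fit, toy_error)
print([len(train) for train, _ in splits], mean_error)
```

Each fold trains only on observations that precede its test window, so no future information enters the parameter estimates, which is the property that makes the resulting errors forecast errors rather than fitting errors.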


In this paper, we proposed a multivariate general compound point process for mid-price modeling in the limit order book. This process generalizes several stochastic models of the limit order market. We used LOBSTER data to conduct simulations and found that the multivariate generalized model performs as well as the general compound Hawkes process model. We also tested the predictive ability of this process. In general, the MGCPP performs very well in LOB modeling and can serve as a meaningful reference for mid-price prediction. In the future, we will explore more applications of the MGCPP and consider related option pricing problems within this framework.

References

[1] Bacry, E., Delattre, S., Hoffman, M. and Muzy, J.-F. Some limit theorems for Hawkes processes and application to financial statistics. Stochastic Processes and their Applications, v. 123, No. 7, pp. 2475-2499.
[2] Bowsher, C. Modelling security market events in continuous time: intensity based, multivariate point process models. J. Econometrica, 141 (2), pp. 876-912.
[3] Brémaud, P., and Massoulié, L. Stability of nonlinear Hawkes processes. The Annals of Probability, 1563-1588.
[4] Bjork, T. Introduction to Point Processes from a Martingale Point of View; KTH, 2011.
[5] Bauwens, L. and Hautsch, N. Modelling Financial High Frequency Data Using Point Processes. Springer, 2009.
[6] Bergmeir, C., Hyndman, R. J., and Koo, B. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics and Data Analysis, 120, 70-83.
[7] Cartea, Á., Jaimungal, S. and Penalva, J. Algorithmic and High-Frequency Trading. Cambridge University Press, 2015.
[8] Chávez-Casillas, J. A., Elliott, R. J., Rémillard, B., and Swishchuk, A. V. A level-1 limit order book with time dependent arrival rates. Methodology and Computing in Applied Probability. arXiv:1707.04928v2. June 18, 2019.
[10] Cont, R. and de Larrard, A. A Markovian modelling of limit order books. SIAM J. Finan. Math.
[11] [...] An introduction to the theory of point processes: volume II: general theory and structure. Springer Science and Business Media, 2007.
[12] Eichler, M., Dahlhaus, R., and Dueck, J. Graphical modeling for multivariate Hawkes processes with nonparametric link functions. Journal of Time Series Analysis, 38(2), 225-242.
[13] Embrechts, P., Liniger, T. and Lin, L. Multivariate Hawkes processes: an application to financial data. J. Appl. Prob., 48, A, pp. 367-378.
[14] Guo, Q., and Swishchuk, A. Multivariate general compound Hawkes processes and their applications in limit order books. Wilmott magazine, 107, 42-51.
[15] Lemonnier, R., Scaman, K. and Kalogeratos, A. Multivariate Hawkes Processes for Large-Scale Inference. In Proceedings of Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[16] Liniger, T. Multivariate Hawkes Processes. PhD thesis, Swiss Fed. Inst. Tech., Zurich, 2009.
[17] Norris, J. R. Markov Chains. Cambridge University Press, 1998.
[18] Rambaldi, M., Bacry, M. and Lillo, F. The role of volume in order book dynamics: a multivariate Hawkes process analysis. Quant. Finance, v17, 2017, issue 7.
[19] Skorokhod, A. Studies in the theory of random processes (Vol. 7021), Courier Dover Publications, 1982.
[20] Swishchuk, A. Risk model based on compound Hawkes process. Abstract, IME 2017, Vienna.
[21] Swishchuk, A. and Vadori, N. A semi-Markovian modelling of limit order markets. SIAM J. Finan. Math., v.8, pp. 240-273.
[22] Swishchuk, A., Cera, K., Hofmeister, T. and Schmidt, J. General semi-Markov model for limit order books. Intern. J. Theoret. Applied Finance, v. 20, 1750019.
[23] Swishchuk, A., and Huffman, A. General compound Hawkes processes in limit order books. Risks, 8(1), 28.
[24] Vinkovskaya, E. A point process model for the dynamics of LOB. PhD thesis, Columbia University, 2014.
[25] Vadori, N., and Swishchuk, A. Strong law of large numbers and central limit theorems for functionals of inhomogeneous Semi-Markov processes. Stochastic Analysis and Applications, 33(2), 213-243.
[26] Yang Y., Etesami J., He N. and Kiyavash N. Online Learning for Multivariate Hawkes Processes. In Proceedings of 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017.
[27] Zheng, B., Roueff, F. and Abergel, F. Ergodicity and scaling limit of a constrained multivariate Hawkes process. SIAM J. Finan. Math., 5.
[28] Zhu, L. Central limit theorem for nonlinear Hawkes processes, J. Appl. Prob. 2013.