A reduced-form model for level-1 limit order books

Abstract

One popular approach to model the limit order books dynamics of the best bid and ask at level-1 is to use the reduced-form diffusion approximations. It is well known that the biggest contributing factor to the price movement is the imbalance of the best bid and ask. We investigate the data of the level-1 limit order books of a basket of stocks and study the numerical evidence of drift, correlation, volatility and their dependence on the imbalance. Based on the numerical discoveries, we develop a nonparametric discrete model for the dynamics of the best bid and ask, which can be approximated by a reduced-form model with analytical tractability that can fit the empirical data of correlation, volatilities and probability of price movement simultaneously.


A REDUCED-FORM MODEL FOR LEVEL-1 LIMIT ORDER BOOKS

TZU-WEI YANG AND LINGJIONG ZHU

Abstract.

One popular approach to modeling the dynamics of the best bid and ask at level-1 of limit order books is to use reduced-form diffusion approximations. It is well known that the biggest contributing factor to the price movement is the imbalance of the best bid and ask. We investigate the data of the level-1 limit order books of a basket of stocks and study the numerical evidence of drift, correlation, volatility and their dependence on the imbalance. Based on the numerical discoveries, we develop a nonparametric discrete model for the dynamics of the best bid and ask, which can be approximated by a reduced-form model that incorporates the empirical data of correlation and volatilities with analytical tractability that can fit the empirical data of the probability of price movement.

Key words and phrases. Limit order books, data analysis, reduced form models, diffusion approximations.

arXiv preprint, [q-fin.TR].

1. Introduction

Traditional human traders have largely been replaced by automatic and electronic traders in today's financial world. The role of high-frequency traders, and the controversy around them, have caught the public's attention ever since the infamous flash crash of May 6, 2010. The long-standing debate over the fairness of equity markets briefly became salient with the recent publication of the book “Flash Boys” by Michael Lewis, who argued that trading has become unfair and skewed by high-frequency trading, which has dampened the opportunities of regular investors.

In automatic and electronic order-driven trading platforms, orders arrive at the exchange and wait in the limit order book. There are two types of orders in the limit order book: market orders and limit orders. Cancellation is also allowed. One of the key research areas in limit order books has centered on modeling the limit order book dynamics. In this paper, we only consider the limit order book model at level-1, that is, we only study the dynamics of the volumes at the best bid and the best ask.

The limit order book is a discrete queuing system, and many works in the literature study the dynamics of the limit order book directly in a discrete setting, see e.g. Cont et al. [5], Abergel and Jedidi [1]. Another popular approach is to study the reduced form of the discrete model. In the sense of heavy traffic limits, various authors, see e.g. Cont and de Larrard [3, 4], Avellaneda et al. [2], Guo et al. [8], considered the diffusion limit as an approximation of the discrete model. The diffusion approximation is valid if the average queue sizes are much larger than the typical quantity of shares traded and the frequency of orders per unit time is high, see e.g. the discussions in Avellaneda et al. [2]. With the empirical finding of approximate scale invariance, Bouchaud et al. [7] derive a two-dimensional Fokker-Planck equation describing the statistical behavior of the queue dynamics. In the recent work by Huang et al. [10], the authors introduced a model that accommodates the empirical properties of the full order book and the stylized facts of lower-frequency financial data. In their model, the order flows have state-dependent intensities. We refer to the recent book [12] for more details.

One research area of great interest is the dynamics of the limit order books and how it influences the stock price movement, see e.g. Avellaneda et al. [2], Cont et al. [5], Huang and Kercheval [9]. There is strong empirical evidence to suggest that the biggest factor driving the movement of the stock price to the next level is the imbalance of the best bid and the best ask, see e.g. Avellaneda et al. [2], which is defined as the ratio of the volume at the best bid to the total volume at the best bid and ask:

(1.1)  $\text{Imbalance} = \dfrac{\text{Volume at Best Bid}}{\text{Volume at Best Bid} + \text{Volume at Best Ask}}.$

In the limit order book, the stock price will move up when the best ask queue is depleted and the price will move down when the best bid queue is depleted. The empirical data suggests that the probability that the stock price will move up increases as the imbalance increases, so one can think of the probability of the price moving up as a monotonically increasing function of the current imbalance. One may expect this probability to increase from 0 to 1 as the imbalance increases from 0 to 1, but empirical evidence suggests otherwise. Avellaneda et al. [2] discovered that even though the probability of the stock price moving up is indeed an increasing function of the imbalance, it increases from a positive value to a value less than one.
One explanation is the hidden liquidity, that is, the sizes that are not shown in the limit order book, see [2]. As hypothesized in [2], there can be two explanations for hidden liquidity. First, markets are fragmented, and it can happen that once the best ask on an exchange is depleted, the price will not necessarily go up, since an ask order at that price may still be available on another market and a new bid cannot arrive until that price is cleared on all markets. Second, there exist so-called iceberg orders, the trading algorithms that split large orders into smaller ones that refill the best quotes as soon as they are depleted.

Indeed, we also discover the hidden liquidity in our numerical analysis, and we use the idea of the hidden liquidity to better fit the model. The numerical evidence suggests that the empirical probability of the price moving up depends linearly on the imbalance, with hidden liquidity at the very small and very large imbalance levels, see Figures 10, 11, 12 and 13. In [2], correlated Brownian motions are used as a reduced-form model to describe the dynamics of the best bid and ask queues. The linear dependence of the empirical probability of the price moving up on the imbalance level suggests that, in the correlated Brownian motions model, the correlation should be exactly −1. However, we carried out numerical analysis to study the correlation and volatilities of the best bid and ask sizes and their dependence on the imbalance, and found that the correlation is negative, but far away from −1. [...] data of the empirical correlation, empirical volatilities of the best bid and ask sizes and the empirical probability of price moving up simultaneously.

In this paper, we will use data to study the level-1 limit order books for a basket of stocks, to further understand the dependence on the imbalance of the best bid and ask sizes. We will look at a basket of stocks and also compare the results amongst different exchanges, in particular NASDAQ and NYSE, because our empirical data suggest that the stocks we selected have the largest trading volumes on these two stock exchanges. We discover that the microstructure and the dynamics of the limit order books depend on the exchange, in the sense that key statistics, such as the correlation between the best bid and ask and the drift effect at the best bid and ask queues, can differ across the exchanges. The discrepancy amongst the exchanges has caught a lot of attention lately. As noted in a recent article in the Wall Street Journal [13]:

    “There is no question that U.S. equity markets are fragmented. The New York Stock Exchange's share of trading in its listed stocks has dropped to 32% of its volume from 77% a decade ago... This fragmentation... also creates arbitrage opportunities that did not exist when trading markets were unified.”

In our empirical study, we discover evidence of discrepancy amongst different exchanges and also across different stocks. This has two possible implications. First, the discrepancy can possibly be explained by the different trading patterns of different algorithmic traders. Say we have two high frequency traders A and B, who are trading two different baskets of stocks.
Then the different behavior of the dynamics of the limit order books of the stocks may be due to the different trading strategies and patterns of these two players. As we will see later from our empirical studies, different stocks are concentrated in different exchanges. Hence the different trading strategies of the traders behind different stocks can result in the discrepancy across the different exchanges. Second, the fragmentation of the stock exchanges may intrinsically cause the difference of the dynamics of the limit order books on different exchanges, especially when the imbalance of the best bid and ask is either small or large, that is, when the queues at the best bid and ask are near depletion. In these cases, new orders may be directed to a different exchange where liquidity is still available. Thus the fragmentation of the stock exchanges and the discrepancy of the limit order book dynamics across different exchanges may create arbitrage opportunities. As the same WSJ article pointed out,

    “Transparency disappears behind a shroud of complex order types executed on vaguely sinister dark pools, trading venues that sometimes are used to disadvantage long-term investors... The remedy is to create multiple trading venues and then limit trading in a particular security to one of them.”

The paper is organized as follows. In Section 2, we carry out the data analysis to study the empirical evidence of the dependence of the dynamics at the best bid and ask on the imbalance. Based on the empirical evidence, we build a nonparametric reduced-form model with analytical tractability in Section 3, hence extending the existing reduced-form and diffusion approximation approach to the level-1 limit order book dynamics in the literature. The conclusion of the paper is in Section 4. Finally, the technical proofs are in Appendix A and the tables are in Appendix B.


Table 1. An Example of the Raw Data. Source: Wharton Research Data Services (WRDS).

Ticker  Date      Time      Bid    Ask    Bid Size  Ask Size  Exchange
GM      20140102  10:12:44  40.55  40.57  9         33        N
GM      20140102  10:12:44  40.55  40.57  7         33        N
GM      20140102  10:12:44  40.56  40.57  4         4         T
GM      20140102  10:12:44  40.56  40.57  4         6         T
GM      20140102  10:12:44  40.56  40.57  1         2         P
GM      20140102  10:12:44  40.56  40.57  3         6         T

2. Data Analysis

For our data analysis, we use the consolidated quotes of the NYSE-TAQ data set from the Wharton Research Data Services (WRDS). We look at the level-1 data, that is, the best bid price, best ask price, best bid size and best ask size for a basket of stocks traded on different exchanges. The time window of the data set we selected is the first five trading days of 2014, that is, January 2nd, 3rd, 6th, 7th and 8th. We only consider data recorded between 10am and 4pm on the trading days. We exclude the pre-market and after-market data as well as the data from the first half hour of each trading day, i.e., 9:30-10am, since these data are usually quite noisy. We concentrate on the following stocks: Bank of America (BAC), General Electric (GE), General Motors (GM) and JP Morgan & Chase (JPM). These blue-chip stocks have large market capitalization, are highly liquid and have large trading volumes. Moreover, the average prices of these stocks are reasonably small and the bid-ask spreads are narrow (one tick most of the time), so that the data sets are not too noisy. A sample of the raw data we are using is given in Table 1 as an illustration. In Table 1, the best bid and ask prices are in US Dollars and the best bid and ask sizes are in units of 100 shares.

Our empirical observation from the raw data shows that, most of the time, only one of the bid and ask queue sizes changes while the other remains the same. Such a pattern is not well approximated by the diffusion process that we consider in Section 3, and the resulting empirical correlation between the bid and ask queue sizes is almost zero. We therefore average the consecutive data when only one queue size varies while the other one remains the same. In this way, the data are better approximated by a diffusion process.

We found that the largest volumes for the stocks we studied are traded on NASDAQ (T) and NYSE (N), see Figure 1.
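As an illustration of how the level-1 records are used, the following minimal sketch takes the six sample quotes of Table 1, keeps the records from NYSE (N) and NASDAQ (T) inside the 10am-4pm window, and evaluates the imbalance (1.1) for each of them. The tuple layout and the names `TABLE_1`, `imbalance` and `select` are assumptions made for this illustration only; they are not part of the WRDS data format or of the procedure used in the paper.

```python
# Sample quotes copied from Table 1, stored as
# (ticker, date, time, bid, ask, bid_size, ask_size, exchange).
# Sizes are in units of 100 shares.
TABLE_1 = [
    ("GM", "20140102", "10:12:44", 40.55, 40.57, 9, 33, "N"),
    ("GM", "20140102", "10:12:44", 40.55, 40.57, 7, 33, "N"),
    ("GM", "20140102", "10:12:44", 40.56, 40.57, 4, 4, "T"),
    ("GM", "20140102", "10:12:44", 40.56, 40.57, 4, 6, "T"),
    ("GM", "20140102", "10:12:44", 40.56, 40.57, 1, 2, "P"),
    ("GM", "20140102", "10:12:44", 40.56, 40.57, 3, 6, "T"),
]


def imbalance(bid_size, ask_size):
    """Imbalance (1.1): best-bid volume over total volume at the best quotes."""
    total = bid_size + ask_size
    if total <= 0:
        raise ValueError("both queues are empty")
    return bid_size / total


def select(rows, exchanges=("N", "T"), start="10:00:00", end="16:00:00"):
    """Hypothetical helper: keep quotes from the given exchanges in the time window.

    Zero-padded HH:MM:SS strings compare correctly as plain strings.
    """
    return [r for r in rows if r[7] in exchanges and start <= r[2] <= end]


if __name__ == "__main__":
    for row in select(TABLE_1):
        print(row[7], row[2], round(imbalance(row[5], row[6]), 4))
```

For the first row of Table 1, for instance, the imbalance is 9 / (9 + 33), so the best bid queue holds well under half of the displayed volume.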
The symbols in Figure 1 stand for the exchanges on which the stocks are traded; the details are given in Table 2.

Footnote. That the largest volumes are traded on NASDAQ and NYSE is not always the case. For example, for the first five trading days of 2014, the exchanges that trade the largest volumes of Walmart (WMT) are NYSE (N) and BATS (Z), and the exchanges that trade the largest volumes of Microsoft (MSFT) are NASDAQ (T) and BATS (Z). For comparison purposes, we only study those stocks whose top two exchanges are NASDAQ and NYSE.

Table 2. Primary Listed Exchange Codes

Code  Exchange
B     NASDAQ OMX BX
C     National Stock Exchange
J     Direct Edge A Stock Exchange
K     Direct Edge X Stock Exchange
M     Chicago Stock Exchange
N     New York Stock Exchange
P     NYSE Arca SM
T     NASDAQ OMX
W     CBOE Stock Exchange
X     NASDAQ OMX PSX
Y     BATS Y-Exchange
Z     BATS Exchange

Figure 1. Pie Charts of Bank of America, General Electric, General Motors, and JP Morgan & Chase. [Four pie charts, each titled "Percentages of Number of Orders of (stock) in Different Exchanges". The percentages shown are:]

Code  BAC      GE        GM        JPM
B     10.1%    8.6%      7.46%     4.54%
C     3.54%    2.24%     2.82%     2.34%
J     8.25%    8.2%      6.89%     8.52%
K     6.41%    5.23%     3.14%     2.05%
M     0.0151%  0.00124%  0.00121%  0.00222%
N     26.1%    32.1%     16.6%     24.7%
P     6.57%    5.49%     9.05%     9.49%
T     18.1%    20.2%     28.9%     27.9%
W     0.293%   0.782%    2.33%     1.05%
X     4.52%    3.05%     1.68%     1.41%
Y     7.61%    7.01%     8.4%      4.46%
Z     8.51%    7.04%     12.8%     13.6%

First, we investigate the drift effect of the best bid and ask queue lengths. Both driftless diffusions, see e.g. [2], and diffusions with constant drift, see e.g. [4, 8], have been used to model the dynamics of the best bid and ask queues. Therefore, we are interested in seeing from the data set whether there is indeed any evidence of drift in the dynamics of the best bid and ask queues. If there is any evidence of drift, is the drift a constant, or a function depending on the queue lengths and the imbalance of the best bid and ask? Our studies are summarized in Figure 2 for Bank of America, Figure 3 for General Electric, Figure 4 for General Motors and Figure 5 for JP Morgan & Chase. For example, in Figure 2, we study the total volumes of the positive changes and the negative changes at the best bid queues and the best ask queues, for both NASDAQ and NYSE, for the Bank of America stock. In all these plots, the purple bars stand for the total volume of the negative changes at a particular imbalance level and the yellow bars stand for the total volume of the positive changes at a particular imbalance level. The red lines denote the ratio of the total volume of the positive changes to the total volume of all the changes.

We can see that when the imbalance is neither too small nor too large, there is little evidence of drift in the best bid and ask queues. On the other hand, when the imbalance is very small or very large, we do observe drifts. From the top left picture in Figure 2, it is clear that there is evidence of negative drift at the best bid queue when the imbalance is small (and hence the queue length is short) and little evidence of drift otherwise. From the top right picture in Figure 2, it is clear that there is evidence of negative drift at the best ask queue when the imbalance is large (and hence the queue length is short) and little evidence of drift otherwise. On the other hand, from the bottom two pictures in Figure 2, we observe that exactly the opposite is true for the Bank of America stock traded on NYSE, that is, there is positive drift at the best bid queue when the imbalance is small and at the best ask queue when the imbalance is large, although the drift effect is weak. One possible explanation is that there can be different scenarios when the queue lengths are short. For example, it can happen that the queue length is short when the traded stock is about to move to the next price level. In that case, the clustering of market orders and cancellations of limit orders can explain the negative drift we observed in the top two pictures in Figure 2.
On the other hand, it is also possible that the queue length is short because it is a new queue, and there is a clustering effect of the arrivals of new limit orders at the new queue, which results in the positive drift we observed in the bottom two pictures in Figure 2. Similar patterns are also observed for the General Electric stock, see Figure 3. On the other hand, we see in Figure 4 that for the General Motors stock, for both NASDAQ and NYSE, there is positive drift at the best bid queue when the imbalance is small and at the best ask queue when the imbalance is large, and there is little drift otherwise. Similar patterns also hold for the JP Morgan & Chase stock, see Figure 5.

The statistics are summarized in Table 3 and Table 4. BAC_b and BAC_a stand for the best bid and the best ask queues for BAC respectively. The number in each cell is obtained by computing

$\dfrac{\text{Volume of Positive Change}}{\text{Volume of Positive Change} + \text{Volume of Negative Change}}.$

From Table 3 and Table 4, we can see that except when the imbalance is small or large, there is little evidence of drift in the best bid and ask queues. In terms of modeling, this suggests that we can build a model with no drift effect when the imbalance is neither too small nor too large. For small and large imbalance, instead of modeling the drift effect by adding a drift term in the dynamics, we use the idea of the hidden liquidity from [2] to better fit the model and explain the dynamics at the best bid and ask queues.

Figure 2. Positive and Negative Changes of the Volumes at the Best Bid and the Best Ask of Bank of America at NASDAQ and NYSE. The curve is the ratio of the positive changes to the total changes. [Four panels: Best Bid Queue and Best Ask Queue at NASDAQ (top), Best Bid Queue and Best Ask Queue at NYSE (bottom). Horizontal axis: Imbalance. Left axis: Total Volumes of Positive/Negative Changes. Right axis: Ratio of Positive Volume to Total Volume. Legend: Negative Changes, Positive Changes.]

Figure 3. Positive and Negative Changes of the Volumes at the Best Bid and the Best Ask of General Electric at NASDAQ and NYSE. The curve is the ratio of the positive changes to the total changes. [Panels and axes as in Figure 2.]

Figure 4. Positive and Negative Changes of the Volumes at the Best Bid and the Best Ask of General Motors at NASDAQ and NYSE. The curve is the ratio of the positive changes to the total changes. [Panels and axes as in Figure 2.]

Figure 5. Positive and Negative Changes of the Volumes at the Best Bid and the Best Ask of JP Morgan & Chase at NASDAQ and NYSE. The curve is the ratio of the positive changes to the total changes. [Panels and axes as in Figure 2.]

Next, we investigate the correlation between the best bid and ask dynamics, the volatilities at the best bid and ask queues, and their dependence on the imbalance. We summarize our observations for the Bank of America stock in Figure 6, the General Electric stock in Figure 7, the General Motors stock in Figure 8, and the JP Morgan & Chase stock in Figure 9. For example, let us take a look at the summary for the Bank of America stock in Figure 6. The top row in Figure 6 shows the correlation between the size changes at the best bid and the best ask (top left), the standard deviation of size changes at the best bid (top middle), and the standard deviation of size changes at the best ask (top right) for the Bank of America stock traded on NASDAQ. Similar statistics for the Bank of America stock traded on NYSE are summarized in the bottom row in Figure 6. As we can see from the top left picture, the correlation as a function of the imbalance is a W-shaped curve for the Bank of America stock traded on NASDAQ, and from the bottom left picture it is a U-shaped curve for the Bank of America stock traded on NYSE. A similar pattern is also observed for the General Electric stock traded on NASDAQ and NYSE, see Figure 7. The U-shaped curve is observed for General Motors and JP Morgan & Chase traded on both NASDAQ and NYSE, see Figure 8 and Figure 9. Indeed, we studied some other stocks in the WRDS database as well, and the empirical studies suggest that U-shaped and W-shaped curves are universal for the correlation between the size changes at the best bid and ask for most stocks. It also holds that the correlation in general is negative but is far away from −1.

It is curious why a typical relation between the correlation of the size changes at the best bid and ask and the imbalance of the best bid and ask can be represented by either a U-shaped curve or a W-shaped curve. It is also worth noting that sometimes we get differently shaped curves for different exchanges (Figure 6, Figure 7) and sometimes we get the same shaped curves for different exchanges (Figure 8, Figure 9). That can probably be explained by the fact that some high frequency and algorithmic trading firms apply their trading strategies to a particular stock exchange only, and the different trading strategies result in the different patterns of the best bid and ask dynamics we observed from the data.

Figures 6, 7, 8 and 9 also contain the information about the standard deviations of the size changes at the best bid and best ask queues on NASDAQ and NYSE. The general observation is that, most of the time, the standard deviation increases as the imbalance increases at the best bid queues and decreases as the imbalance increases at the best ask queues. Note that the best bid size increases as the imbalance increases and the best ask size decreases as the imbalance increases. Hence, what we observe is that the standard deviation increases as the queue length increases. This is not surprising at all. What is interesting is that in many cases the relation is not exactly monotone, and we see a sudden increase of the standard deviation when the imbalance is small for the best bid and large for the best ask, that is, when the queue length is short. That suggests that when the queue length is short, that is, when the queue is about to be depleted or when a new queue has just been created, the volatilities tend to be large. In general, the volatilities of the empirical data tend to be noisier than the correlations, which follow either a U-shaped or a W-shaped curve. Nevertheless, skewed U-shaped curves are observed quite often. For example, in the top middle and top right pictures in Figure 6, Figure 7, Figure 8 and Figure 9, we have skewed U-shaped curves. For the best bid queues, the curve is skewed towards the left and for the best ask queues, it is skewed towards the right. It is curious that for the stocks traded on NASDAQ, we have these universal skewed U-shapes for the volatilities, whereas the data for the NYSE tend to be noisier and the pattern is not very clear. This once again indicates the very different natures of the level-1 limit order dynamics across different exchanges.

We summarize the statistics of the correlations in Table 5. As we can see, the correlation is almost always negative. In terms of the numbers, the largest correlation is 0.02, achieved by the Bank of America stock traded on NYSE with imbalance between 0.05 and 0.10. The most negative correlation is achieved by JP Morgan traded on NYSE, which is −0.34, that is, far away from −1. One interesting observation is that when the imbalance is between 0.[...] and 0.8, from Table 5, we can see that the correlation of the stock traded on NYSE is always more negative than the correlation of the same stock traded on NASDAQ (with the exception of JP Morgan when the imbalance is between 0.55 and 0.60). As we mentioned earlier, the fragmentation and discrepancy of the stock exchanges is well documented in the literature. For example, we can ask why the correlation of stocks traded on NYSE is more negative than that of stocks traded on NASDAQ.

3. A Reduced-Form Model

The simplest continuous-time diffusion model to describe the dynamics of the level-1 limit order book is that of correlated Brownian motions, where $Q_b(t)$ and $Q_a(t)$ are the queue lengths at the best bid and the best ask normalized by the median size of the queues, see e.g. Avellaneda et al. [2]:

(3.1)  $dQ_b(t) = \sigma\, dW_b(t), \quad Q_b(0) = x,$

(3.2)  $dQ_a(t) = \sigma\, dW_a(t), \quad Q_a(0) = y,$

where $W_b(t)$ and $W_a(t)$ are two correlated standard Brownian motions with correlation $-1 \le \rho \le 1$. The probabilities of the price moving up and down are

(3.3)  $P_{\mathrm{up}} = P(\tau_a < \tau_b), \quad P_{\mathrm{down}} = P(\tau_b < \tau_a),$

where

(3.4)  $\tau_a := \inf\{t > 0 : Q_a(t) \le 0\}, \quad \tau_b := \inf\{t > 0 : Q_b(t) \le 0\}.$
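The model (3.1)-(3.4) can also be explored numerically. The sketch below is a crude Euler-type Monte Carlo estimate of $P_{\mathrm{up}}$, included only as an illustration and not as the method used in this paper. The function name `estimate_p_up`, the time step `dt`, the number of paths, the step cap `max_steps`, the fixed seed and the convention of counting a simultaneous depletion within one step as half an up-move are all assumptions of the sketch. Because the barrier is only monitored at discrete times, the estimate carries a small discretization bias in addition to the sampling error.

```python
import math
import random


def estimate_p_up(x, y, rho, sigma=1.0, dt=0.01, n_paths=2000,
                  max_steps=200_000, seed=0):
    """Monte Carlo estimate of P(tau_a < tau_b) for the driftless model (3.1)-(3.2).

    x, y are the initial (normalized) bid and ask queue lengths.
    """
    rng = random.Random(seed)
    step = sigma * math.sqrt(dt)               # standard deviation of one increment
    comp = math.sqrt(max(0.0, 1.0 - rho * rho))
    ups = 0.0
    resolved = 0
    for _ in range(n_paths):
        qb, qa = x, y
        for _ in range(max_steps):
            # Two standard normal increments with correlation rho.
            zb = rng.gauss(0.0, 1.0)
            za = rho * zb + comp * rng.gauss(0.0, 1.0)
            qb += step * zb
            qa += step * za
            if qa <= 0.0 or qb <= 0.0:
                if qa <= 0.0 and qb <= 0.0:
                    ups += 0.5                  # tie within one step (assumption)
                elif qa <= 0.0:
                    ups += 1.0                  # ask queue depleted first: price up
                resolved += 1
                break
    if resolved == 0:
        raise RuntimeError("no path reached a barrier; increase max_steps")
    return ups / resolved


if __name__ == "__main__":
    print(estimate_p_up(1.0, 1.0, rho=-0.3))
    print(estimate_p_up(3.0, 1.0, rho=-0.3))
```

With equal initial queues the two processes are exchangeable, so the estimate should be close to one half; with a longer bid queue it should be larger. Such estimates can be compared against the closed-form expression (3.6).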

Figure 6. Correlations and Standard Deviations of the Volumes at the Best Bid and the Best Ask of Bank of America at NASDAQ and NYSE. [Six panels: Correlation of Bid and Ask Queues, Standard Deviation of Bid Queue, and Standard Deviation of Ask Queue, each plotted against Imbalance, at NASDAQ and at NYSE.]

Figure 7. Correlations and Standard Deviations of the Volumes at the Best Bid and the Best Ask of General Electric at NASDAQ and NYSE. [Panels and axes as in Figure 6.]

Figure 8. Correlations and Standard Deviations of the Volumes at the Best Bid and the Best Ask of General Motors at NASDAQ and NYSE. [Panels and axes as in Figure 6.]

Figure 9. Correlations and Standard Deviations of the Volumes at the Best Bid and the Best Ask of JP Morgan & Chase at NASDAQ and NYSE. [Panels and axes as in Figure 6.]

Let the probability of price moving up be:

(3.5)  $u(x, y) := P(\tau_a < \tau_b \mid Q_b(0) = x,\ Q_a(0) = y).$

It is known that, see e.g. Avellaneda et al. [2]:

(3.6)  $u(x, y) = \dfrac{1}{2}\left[1 - \dfrac{\arctan\left(\sqrt{\frac{1+\rho}{1-\rho}}\,\frac{y-x}{y+x}\right)}{\arctan\left(\sqrt{\frac{1+\rho}{1-\rho}}\right)}\right].$
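Formula (3.6) is straightforward to evaluate. The following minimal sketch implements it and can be used to check numerically that it gives one half for equal queues, reduces to $\frac{2}{\pi}\arctan(x/y)$ at zero correlation, and tends to $x/(x+y)$ as the correlation approaches −1. The function name `prob_up` and the decision to treat $\rho = -1$ by its limit and to exclude $\rho = 1$ are choices made for this illustration.

```python
import math


def prob_up(x, y, rho):
    """Closed-form probability (3.6) that the ask queue is depleted before the bid queue."""
    if x <= 0 or y <= 0:
        raise ValueError("queue sizes must be positive")
    if not -1.0 <= rho < 1.0:
        raise ValueError("this sketch covers -1 <= rho < 1 only")
    if rho == -1.0:
        # Limit of (3.6) as rho -> -1: the ratio of arctangents tends to (y - x) / (y + x).
        return x / (x + y)
    k = math.sqrt((1.0 + rho) / (1.0 - rho))
    return 0.5 * (1.0 - math.atan(k * (y - x) / (y + x)) / math.atan(k))


if __name__ == "__main__":
    for rho in (-1.0, -0.3, 0.0):
        print(rho, prob_up(3.0, 1.0, rho))
```

Since (3.6) depends on $x$ and $y$ only through $(y - x)/(y + x)$, the same function can be called with any pair of queue sizes that share a given imbalance.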

When there is no correlation, i.e., $\rho = 0$:

(3.7)  $u(x, y) = \dfrac{2}{\pi}\arctan\left(\dfrac{x}{y}\right).$

When the correlation is perfectly negative, i.e., $\rho = -1$:

(3.8)  $u(x, y) = \dfrac{x}{x+y}.$

From (3.6), we can see that the probability of price moving up can be written as a function depending only on the imbalance:

(3.9)  $P_{\mathrm{up}}(z) = \dfrac{1}{2}\left[1 - \dfrac{\arctan\left(\sqrt{\frac{1+\rho}{1-\rho}}\,(1 - 2z)\right)}{\arctan\left(\sqrt{\frac{1+\rho}{1-\rho}}\right)}\right],$

where $z = \frac{x}{x+y}$. Moreover, $P_{\mathrm{up}}(z)$ is monotonically increasing in the imbalance $z$.

Figure 10. Empirical Probability (Dotted Lines) and Model Prediction (Solid Lines) of Bank of America. [Two panels, Probability of Price Up against Imbalance: NASDAQ, Hidden Liquidity = 6.7808e−08; NYSE, Hidden Liquidity = 0.16745.]

Figure 11. Empirical Probability (Dotted Lines) and Model Prediction (Solid Lines) of General Electric. [Two panels, Probability of Price Up against Imbalance: NASDAQ, Hidden Liquidity = 0.066516; NYSE, Hidden Liquidity = 0.23378.]

Figure 12. Empirical Probability (Dotted Lines) and Model Prediction (Solid Lines) of General Motors. [Two panels, Probability of Price Up against Imbalance: NASDAQ, Hidden Liquidity = 0.1443; NYSE, Hidden Liquidity = 0.19202.]

Figure 13. Empirical Probability (Dotted Lines) and Model Prediction (Solid Lines) of JP Morgan Chase & Co. [Two panels, Probability of Price Up against Imbalance: NASDAQ, Hidden Liquidity = 0.12198; NYSE, Hidden Liquidity = 0.15094.]

Remark 1.

More generally, we can assume that the diffusion processes are correlated Brownian motions with constant drifts:
\[
dQ_b(t) = \mu_b\,dt + \sigma_b\,dW_b(t), \qquad Q_b(0) = x, \tag{3.10}
\]
\[
dQ_a(t) = \mu_a\,dt + \sigma_a\,dW_a(t), \qquad Q_a(0) = y. \tag{3.11}
\]
Based on the results in Iyengar [11] and Metzler [14], we have
\[
u(x, y) = \int_0^\infty \int_0^\infty e^{\gamma_a (r\cos\alpha - z_a) + \gamma_b (r\sin\alpha - z_b) - \frac{(\gamma_a)^2 + (\gamma_b)^2}{2} t}\, g(t, r)\,dr\,dt, \tag{3.12}
\]
where
\[
\begin{bmatrix} \gamma_a \\ \gamma_b \end{bmatrix} = \begin{bmatrix} \sigma_a\sqrt{1-\rho^2} & \sigma_a\rho \\ 0 & \sigma_b \end{bmatrix}^{-1} \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}, \qquad \begin{bmatrix} z_a \\ z_b \end{bmatrix} = \begin{bmatrix} \sigma_a\sqrt{1-\rho^2} & \sigma_a\rho \\ 0 & \sigma_b \end{bmatrix}^{-1} \begin{bmatrix} y \\ x \end{bmatrix}, \tag{3.13}
\]
and
\[
g(t, r) = \frac{\pi}{\alpha^2 t r} \exp\left(-\frac{r^2 + r_0^2}{2t}\right) \sum_{n=1}^{\infty} n \sin\left(\frac{n\pi(\alpha - \theta_0)}{\alpha}\right) I_{n\pi/\alpha}\left(\frac{r r_0}{t}\right),
\]
where $I_\nu$ is the modified Bessel function of the first kind and
\[
\alpha := \begin{cases} \pi + \arctan\left(-\frac{\sqrt{1-\rho^2}}{\rho}\right), & \rho > 0, \\ \frac{\pi}{2}, & \rho = 0, \\ \arctan\left(-\frac{\sqrt{1-\rho^2}}{\rho}\right), & \rho < 0, \end{cases}
\]
\[
r_0 := \frac{\sqrt{(x/\sigma_b)^2 + (y/\sigma_a)^2 - 2\rho\,(x/\sigma_b)(y/\sigma_a)}}{\sqrt{1-\rho^2}},
\]
\[
\theta_0 := \begin{cases} \pi + \arctan\left(\frac{(x/\sigma_b)\sqrt{1-\rho^2}}{y/\sigma_a - \rho x/\sigma_b}\right), & y/\sigma_a < \rho x/\sigma_b, \\ \frac{\pi}{2}, & y/\sigma_a = \rho x/\sigma_b, \\ \arctan\left(\frac{(x/\sigma_b)\sqrt{1-\rho^2}}{y/\sigma_a - \rho x/\sigma_b}\right), & y/\sigma_a > \rho x/\sigma_b. \end{cases}
\]
In particular, when $\mu_a = \mu_b = 0$,
\[
u(x, y) = \frac{\theta_0}{\alpha}. \tag{3.14}
\]

In Avellaneda et al. [2], the authors fitted the empirical probability of mid-price moving up by the correlated Brownian motion model when the correlation is $\rho = -1$, that is, (3.8). From Figures 10-13 and Tables 6-7, the empirical probability of mid-price moving up is indeed linearly dependent on the imbalance. However, as we have already seen in Figures 6, 7, 8, 9, the correlation is negative but far away from $-1$, and it also depends on the level of the imbalance. Therefore, a perfectly negatively correlated Brownian motion model might not fit both the empirical probability and the empirical correlation. We will propose a non-parametric diffusion model that can fit the empirical correlation, empirical volatilities, and empirical probability of price movement simultaneously.

The correlated Brownian motion is simple yet still captures the phenomenon that the price movement is mainly driven by the imbalance at the best bid and ask level. We are interested in investigating further the relation between the dynamics of the volumes at the best bid and ask level and the imbalance. The assumption in the model (3.6) that the correlation and volatility of the volumes at the best bid and ask levels are constant might be oversimplified and not consistent with the real data. Indeed, the empirical studies we did in Section 2 suggest that the correlation of the movements of the volumes at the best bid and ask level is non-trivially dependent on the imbalance. Two universal shapes for the correlation as a function of the imbalance are the U-shaped curve and the W-shaped curve. For a U-shaped correlation function of the imbalance, the correlation is negative; it is close to zero when the imbalance is close to 0 and 1, and it is the most negative when the imbalance is close to $\frac{1}{2}$. Similarly we also observe W-shaped correlation curves. The correlations are consistently negative though far away from $-1$. We also observe that the volatilities of the volumes at the best bid and ask levels depend non-trivially on the imbalance. The volatility is in general large when the imbalance is small or large, and small when the imbalance is moderate. The difference here is that instead of a symmetric U-shaped or W-shaped curve, we often get two skewed U-shaped curves, depending on whether we consider the best bid or the best ask. Therefore, our goal is to improve the model (3.6) to allow the correlation and volatilities to be non-constant and depend on the level of the imbalance.

In a very loose analogy, in the literature of the pricing of derivative securities, it is well known that the stock price has the so-called leverage effect, that is, the volatility of a stock tends to increase when the stock price drops, which is one of the key reasons that people have used the CEV models and other local volatility models as an alternative to the Black-Scholes model, in which the volatility is always constant.

We are interested in building a model for the dynamics of the level-1 limit order books that can capture the empirical evidence that we observed from the data. Let us build a discrete model and find its diffusion approximation. Let $X(t)$, $Y(t)$ be the queue lengths at the best bid and the best ask at time $t$ and $Z_t = \frac{X(t)}{X(t)+Y(t)}$ be the imbalance.
Let us assume that

• The limit orders that arrive at the best bid form a simple point process $N^1(t)$ with intensity $\lambda_1(Z_{t-})$ at time $t$;
• The market orders or cancellations that arrive at the best bid form a simple point process $N^2(t)$ with intensity $\lambda_2(Z_{t-})$ at time $t$;
• The limit orders that arrive at the best ask form a simple point process $N^3(t)$ with intensity $\lambda_3(Z_{t-})$ at time $t$;
• The market orders or cancellations that arrive at the best ask form a simple point process $N^4(t)$ with intensity $\lambda_4(Z_{t-})$ at time $t$;
• There are simultaneous cancellations at the best ask and limit orders at the best bid that form a simple point process $N^5(t)$ with intensity $\lambda_5(Z_{t-})$ at time $t$;
• There are simultaneous cancellations at the best bid and limit orders at the best ask that form a simple point process $N^6(t)$ with intensity $\lambda_6(Z_{t-})$ at time $t$.

The last two assumptions above are made due to the observation that the empirical correlation between the best bid and ask queues is always negative. Note that $\lambda_1$ is the arrival rate for the idiosyncratic limit orders at the best bid, so the total arrival rate for the limit orders at the best bid is $\lambda_1 + \lambda_5$. Similarly, the total arrival rate for the limit orders at the best ask is $\lambda_3 + \lambda_6$. For $1 \leq j \leq 6$, we assume that $\lambda_j(z) = \lambda_j\left(\frac{x}{x+y}\right): \mathbb{R} \to \mathbb{R}_+$ are continuous and bounded (there is a singularity when $x + y = 0$ and we assume analytic continuation of $\lambda_j$ at the singularity). Finally, for simplicity, we assume that the order size has unit size 1. Note that all the following arguments work if we assume constant order sizes for different types of orders. Therefore, the dynamics at the best bid and ask are given by:
\[
\begin{aligned}
dX(t) &= dN^1(t) - dN^2(t) + dN^5(t) - dN^6(t), \\
dY(t) &= dN^3(t) - dN^4(t) + dN^6(t) - dN^5(t).
\end{aligned} \tag{3.15}
\]
Since empirically we do not observe strong evidence for the drift effect, we assume the driftless condition:
\[
\lambda_1(z) - \lambda_2(z) = \lambda_4(z) - \lambda_3(z) = \lambda_6(z) - \lambda_5(z), \tag{3.16}
\]
so that $X(t)$ and $Y(t)$ are driftless, in the sense that
\[
\begin{aligned}
dX(t) &= dM^1(t) - dM^2(t) + dM^5(t) - dM^6(t), \\
dY(t) &= dM^3(t) - dM^4(t) + dM^6(t) - dM^5(t),
\end{aligned}
\]
where for any $1 \leq j \leq 6$, $M^j(t) := N^j(t) - \int_0^t \lambda_j(Z_{s-})\,ds$ is a martingale. In high frequency trading, the number of orders is large and the trading frequency is high, so we can rescale time and space to get a diffusion approximation to the discrete model. Let us define the rescaled processes, for $1 \leq j \leq 6$:
\[
X_n(t) := \frac{1}{\sqrt{n}} X(nt), \qquad Y_n(t) := \frac{1}{\sqrt{n}} Y(nt), \qquad M^j_n(t) := \frac{1}{\sqrt{n}} M^j(nt). \tag{3.17}
\]
The discrete model (3.15) describes the dynamics of the best bid and ask queues at the micro level, but may not be easy to work with when we want to compute the probability of mid-price movement. So, next, let us find a diffusion approximation to the discrete model (3.15).

Let us assume that
\[
\begin{aligned}
dQ_b(t) &= \sigma_b(Z(t))\,dW_b(t), \qquad Q_b(0) = x > 0, \\
dQ_a(t) &= \sigma_a(Z(t))\,dW_a(t), \qquad Q_a(0) = y > 0, \\
Z(t) &= \frac{Q_b(t)}{Q_b(t) + Q_a(t)},
\end{aligned} \tag{3.18}
\]
where $W_b(t)$ and $W_a(t)$ are two standard Brownian motions with correlation $\rho(Z(t))$ at time $t$.
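The discrete model (3.15) can be sketched as a small Monte Carlo simulation. The sketch below is only illustrative: the intensities in `example_rates` are hypothetical constants chosen to satisfy the driftless condition, all function names are assumptions of this example, and only the embedded jump chain is simulated, since the holding times between events do not affect which queue is depleted first.

```python
import random

# Queue increments (dX, dY) for the six event types, in the order listed above.
JUMPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def is_driftless(rates, tol=1e-12):
    """Check that the drifts of X and Y implied by the six rates both vanish."""
    l1, l2, l3, l4, l5, l6 = rates
    return abs(l1 - l2 + l5 - l6) <= tol and abs(l3 - l4 + l6 - l5) <= tol


def ask_depletes_first(x0, y0, intensities, rng):
    """Run the jump chain from (x0, y0) until one queue is empty.

    intensities(z) returns (lambda_1, ..., lambda_6) at imbalance z.
    Returns True if the ask queue Y reaches zero first (price moves up).
    """
    x, y = x0, y0
    while x > 0 and y > 0:
        rates = intensities(x / (x + y))
        # The next event is of type j with probability proportional to lambda_j.
        dx, dy = rng.choices(JUMPS, weights=rates, k=1)[0]
        x, y = x + dx, y + dy
    return y == 0


def estimate_prob_up(x0, y0, intensities, n_runs=2000, seed=0):
    """Monte Carlo estimate of the probability that the price moves up."""
    rng = random.Random(seed)
    hits = sum(ask_depletes_first(x0, y0, intensities, rng) for _ in range(n_runs))
    return hits / n_runs


def example_rates(z):
    """Hypothetical constant intensities, symmetric in bid and ask."""
    return (1.0, 1.0, 1.0, 1.0, 0.5, 0.5)


if __name__ == "__main__":
    print(is_driftless(example_rates(0.5)))
    print(estimate_prob_up(3, 3, example_rates))
```

With symmetric intensities and equal starting queues the estimate should be close to 1/2 by symmetry. When only the two simultaneous event types are active, the sum of the queues is conserved and the problem reduces to a classical gambler's ruin.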
We assume that $(x, y) \mapsto \sigma_b\left(\frac{x}{x+y}\right)$ and $(x, y) \mapsto \sigma_a\left(\frac{x}{x+y}\right)$ are bounded and continuous from $\mathbb{R}^2$ to $\mathbb{R}_+$, and that $(x, y) \mapsto \rho\left(\frac{x}{x+y}\right)$ is bounded and continuous from $\mathbb{R}^2$ to $[-1, 1]$. If, in addition, $\sigma_b$, $\sigma_a$, $\rho$ are Lipschitz, then the solution of (3.18) is guaranteed to be strong, see e.g. [15].

Note that the discrete process $(X_n(t), Y_n(t))$ and $(Q_b(t), Q_a(t))$ should both live in the first quadrant. But to avoid questions of well-definedness after the process hits the boundary of the first quadrant, we make the processes well-defined on $\mathbb{R}^2$. Since our goal is to compute the probability of mid-price movement, which is about the first hitting time of the boundary of the first quadrant, the extension from the first quadrant to $\mathbb{R}^2$ will not alter the results, and it is just for the sake of convenience.

The discrete model (3.15) can be approximated by the diffusion model (3.18) as follows.

Theorem 2.

Given that $(X(t), Y(t))$ is the discrete model of the best bid and ask queues in (3.15), assume that for $1 \leq j \leq 6$, $\lambda_j(z): \mathbb{R} \to \mathbb{R}_+$ are continuous and bounded functions and the driftless condition (3.16) holds. Also assume that $(X_n(0), Y_n(0)) = (x, y) \in \mathbb{R}_+ \times \mathbb{R}_+$. Then the rescaled process $(X_n(t), Y_n(t))$ in (3.17) converges weakly in $D[0, T]$ as $n \to \infty$ to $(Q_b(t), Q_a(t))$ in (3.18), where $D[0, T]$ is the space of càdlàg processes equipped with the Skorohod topology. In addition, the diffusion and correlation coefficients are explicit functions of the intensities $\lambda_j(z)$:
\[
\begin{aligned}
\sigma_b(z) &= \left[\lambda_1(z) + \lambda_2(z) + \lambda_5(z) + \lambda_6(z)\right]^{1/2}, \\
\sigma_a(z) &= \left[\lambda_3(z) + \lambda_4(z) + \lambda_5(z) + \lambda_6(z)\right]^{1/2}, \\
\rho(z) &= -\frac{\lambda_5(z) + \lambda_6(z)}{\sigma_b(z)\,\sigma_a(z)}.
\end{aligned}
\]
Note that $\frac{x}{x+y}$ can be singular, and $\sigma_b$, $\sigma_a$, $\rho$ are defined as the analytic continuation at the singular points $\pm\infty$.

The probability of mid-price movement for the diffusion model (3.18) can be computed in closed form as follows.

Theorem 3.

Given the model (3.18), $P_{up}(z)$, the probability of the price moving up, defined in (3.3) and (3.4), is explicitly given by
\[
P_{up}(z) = \frac{\int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}{\int_0^1 e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}, \tag{3.19}
\]
where $z$ is the imbalance and
\[
\begin{aligned}
\mu(z) &= -2(1 - z)\,\sigma_b^2(z) + 2(2z - 1)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + 2z\,\sigma_a^2(z), \\
\nu(z) &= (1 - z)^2\,\sigma_b^2(z) - 2z(1 - z)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + z^2\,\sigma_a^2(z).
\end{aligned} \tag{3.20}
\]

Remark 4.

What we are really interested in computing is the probability of mid-price movement for the discrete model, and this can be approximated by the probability of mid-price movement for the diffusion model, which has the closed-form formula given in Theorem 3. For any $n > 0$, we have
\[
P\left(X(t) \text{ hits zero before } Y(t) \text{ does}\right) = P\left(\tfrac{1}{\sqrt{n}} X(nt) \text{ hits zero before } \tfrac{1}{\sqrt{n}} Y(nt) \text{ does}\right) \simeq P\left(Q_b(t) \text{ hits zero before } Q_a(t) \text{ does}\right),
\]
as $n \to \infty$. Note that the approximation requires that $Q_b(0) = \frac{1}{\sqrt{n}} X(0)$ and $Q_a(0) = \frac{1}{\sqrt{n}} Y(0)$. This is still reasonable since the formula in Theorem 3 only depends on the ratio of $Q_b(0)$ and $Q_a(0)$, so we can rescale the initial condition.

Remark 5.

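We can recover the results in [2] from Theorem 3:

(1) When $\sigma_b = \sigma_a = \sigma$ and $\rho = -1$, we have $\mu(z) \equiv 0$ and thus $P_{up}(z) = z = \frac{x}{x+y}$, which recovers (3.8).

(2) When $\sigma_b = \sigma_a = \sigma$ and $\rho = 0$, we have $\mu(z) = -2(1 - 2z)\sigma^2$ and $\nu(z) = [(1 - z)^2 + z^2]\sigma^2$. Therefore,
\[
\begin{aligned}
P_{up}(z) &= \frac{\int_0^z e^{\int_0^y \frac{2 - 4x}{1 - 2x + 2x^2}\,dx}\,dy}{\int_0^1 e^{\int_0^y \frac{2 - 4x}{1 - 2x + 2x^2}\,dx}\,dy} = \frac{\int_0^z e^{-\log(2(y-1)y + 1)}\,dy}{\int_0^1 e^{-\log(2(y-1)y + 1)}\,dy} = \frac{\int_0^z \frac{1}{1 - 2y + 2y^2}\,dy}{\int_0^1 \frac{1}{1 - 2y + 2y^2}\,dy} \\
&= \frac{\arctan(1 - 2z) - \frac{\pi}{4}}{-\frac{\pi}{4} - \frac{\pi}{4}} = \frac{\frac{\pi}{4} - \arctan(1 - 2z)}{\frac{\pi}{2}} = \frac{2}{\pi}\arctan\left(\frac{z}{1 - z}\right) = \frac{2}{\pi}\arctan\left(\frac{x}{y}\right),
\end{aligned}
\]
which recovers (3.7).

(3) In the special case that $\sigma_b(\cdot) = \sigma_a(\cdot)$ and $\rho(\cdot) \equiv \rho$, by (3.14), we have
\[
\begin{array}{c|ccc}
u(x, y) = \theta_0/\alpha & y < \rho x & y = \rho x & y > \rho x \\ \hline
\rho > 0 & \dfrac{\pi - \arctan\left(\frac{\lambda\rho x}{\rho x - y}\right)}{\pi - \arctan\lambda} & \dfrac{\pi/2}{\pi - \arctan\lambda} & \dfrac{\arctan\left(\frac{\lambda\rho x}{y - \rho x}\right)}{\pi - \arctan\lambda} \\[2ex]
\rho = 0 & \text{N/A} & 1 & \dfrac{2}{\pi}\arctan\left(\frac{\lambda\rho x}{y - \rho x}\right) \\[2ex]
\rho < 0 & \text{N/A} & \dfrac{\pi/2}{-\arctan\lambda} & \dfrac{\arctan\left(\frac{\lambda\rho x}{y - \rho x}\right)}{-\arctan\lambda}
\end{array}
\]
where $\lambda = \sqrt{1 - \rho^2}/\rho$; for $\rho = 0$ the product $\lambda\rho$ is read as $\sqrt{1 - \rho^2} = 1$, so the last entry of the middle row is $\frac{2}{\pi}\arctan\left(\frac{x}{y}\right)$.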
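The closed-form expression of Theorem 3 can also be evaluated numerically when $\sigma_b$, $\sigma_a$ and $\rho$ are arbitrary functions of the imbalance. The following is a minimal sketch using the trapezoidal rule on a uniform grid; the function name `prob_up_curve`, the grid size and the quadrature rule are assumptions of this example and are not taken from the paper.

```python
import math


def prob_up_curve(sigma_b, sigma_a, rho, n=2000):
    """Approximate P_up on the grid z_k = k / n, k = 0..n, from (3.19)-(3.20).

    sigma_b, sigma_a, rho are callables of the imbalance z.
    Returns (zs, p) with p[k] approximating P_up(zs[k]).
    """
    h = 1.0 / n
    zs = [k * h for k in range(n + 1)]

    ratio = []
    for z in zs:
        sb, sa, r = sigma_b(z), sigma_a(z), rho(z)
        mu = -2.0 * (1.0 - z) * sb**2 + 2.0 * (2.0 * z - 1.0) * r * sb * sa + 2.0 * z * sa**2
        nu = (1.0 - z) ** 2 * sb**2 - 2.0 * z * (1.0 - z) * r * sb * sa + z**2 * sa**2
        ratio.append(mu / nu)

    # Inner integral I(y) = int_0^y mu / nu dx, accumulated by the trapezoidal rule.
    inner = [0.0]
    for k in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * h * (ratio[k - 1] + ratio[k]))

    # Outer integral of exp(-I(y)), again by the trapezoidal rule.
    weight = [math.exp(-v) for v in inner]
    outer = [0.0]
    for k in range(1, n + 1):
        outer.append(outer[-1] + 0.5 * h * (weight[k - 1] + weight[k]))

    total = outer[-1]
    return zs, [v / total for v in outer]


if __name__ == "__main__":
    one = lambda z: 1.0
    # Case (1) of Remark 5: rho = -1 should give P_up(z) = z.
    zs, p = prob_up_curve(one, one, lambda z: -1.0)
    print(zs[500], p[500])
    # Case (2) of Remark 5: rho = 0 should give (2 / pi) * arctan(z / (1 - z)).
    zs, p = prob_up_curve(one, one, lambda z: 0.0)
    print(p[500], 2.0 / math.pi * math.atan(1.0 / 3.0))
```

The sketch assumes $\nu(z) > 0$ on $[0, 1]$, which holds whenever the volatilities are positive and $\rho(z) < 1$; it does not guard against other inputs.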

Remark 6.

When $\rho(\cdot) \equiv -1$, we have $\lambda_1 \equiv \lambda_2 \equiv \lambda_3 \equiv \lambda_4 \equiv 0$ and $\lambda_5(\cdot) = \lambda_6(\cdot)$, and thus $\sigma_b(\cdot) = \sigma_a(\cdot)$. By using the assumption (3.16), we can check that the probability of mid-price moving up $u(x, y) = \frac{x}{x+y}$ satisfies (A.4), and thus this is the probability of mid-price movement for the reduced-form model. Indeed, for the original unscaled discrete model (3.15), $u(x, y)$ satisfies the equation
\[
\lambda_5\left(\frac{x}{x+y}\right)\left[u(x + 1, y - 1) - u(x, y)\right] + \lambda_6\left(\frac{x}{x+y}\right)\left[u(x - 1, y + 1) - u(x, y)\right] = 0, \tag{3.21}
\]
for $(x, y) \in \mathbb{Z}_{\geq 1} \times \mathbb{Z}_{\geq 1}$ with boundary condition $u(0, y) = 0$ and $u(x, 0) = 1$. It is easy to see that $u(x, y) = \frac{x}{x+y}$. Indeed, this result is true for perfectly negatively correlated queues in a model-free way. To see this, notice that $X(t)$ and $Y(t)$ are perfectly negatively correlated martingales and we can write $X(t) = x + M(t)$ and $Y(t) = y - M(t)$, where $M(t)$ is a martingale. Therefore, the probability that $Y(t)$ hits 0 before $X(t)$ does is the same as the probability that $M(t)$ hits $y$ before $-x$. By the optional stopping theorem for martingales, this probability can be easily computed as $\frac{x}{x+y}$.

As we can see from Figures 10, 11, 12, 13, the empirical probability of mid-price moving up (dotted lines) is linearly dependent on the imbalance. Except for BAC at NASDAQ and GE at NASDAQ, there is also strong numerical evidence of hidden liquidity, that is, the probability of moving up is bigger than zero when the imbalance is near zero and less than one when the imbalance is near one. To better fit the data, we introduce the hidden liquidity $H \in (0, \frac{1}{2})$, so that the probability of the price moving up is $H$ for imbalance at zero and $1 - H$ for imbalance at one. That is, $P_{up}(z)$ satisfies the boundary condition $P_{up}(0) = H$ and $P_{up}(1) = 1 - H$. Following the proof of Theorem 3, we get the following result (the proof will be given in Appendix A).

Theorem 7.

Given the model (3.18), $P_{up}(z)$, the probability of the price moving up, defined in (3.3) and (3.4) with the boundary conditions $P_{up}(0) = H$, $P_{up}(1) = 1 - H$, is explicitly given by
\[
P_{up}(z) = H + (1 - 2H)\,\frac{\int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}{\int_0^1 e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}, \tag{3.22}
\]
where $z = \frac{x}{x+y}$ is the imbalance and $\mu$, $\nu$ are the functions in (3.20).

To fit the data, we plug the empirical data for $\sigma_b(\cdot)$, $\sigma_a(\cdot)$ and $\rho(\cdot)$ into the formula (3.22) and obtain $P_{theoretical}(z, H)$, the theoretical probability of mid-price moving up at imbalance level $z$, and then use the least square method
\[
\min_H \sum_z \left(P_{empirical}(z) - P_{theoretical}(z, H)\right)^2, \tag{3.23}
\]
to find the best fitting hidden liquidity $H$. The solid lines in Figures 10, 11, 12, 13 are the model predictions. Remarkably, the solid lines are almost linear in imbalance even though the correlation is a complicated function of imbalance and far away from being $-1$. Hence, we have built up a model with the empirical correlation and volatilities as the input that can produce an analytical formula for the price movement probability that can be used to fit the empirical probability data.

Footnote. Our definition of hidden liquidity is slightly different from the definition in Avellaneda et al. [2]. Our $H$ can be interpreted as a probability, a free parameter that helps get $P_{up}$ closer to the empirical data in the least-square sense, rather than a real liquidity as in Avellaneda et al. [2]. We choose this definition to keep the analytical tractability of the model. Our definition is also simpler in the sense that we only need to bucket the data and discretize our analytical formula according to the imbalance level, rather than the best bid and ask queue sizes as in [2].
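Because the boundary conditions $P_{up}(0) = H$ and $P_{up}(1) = 1 - H$ make the fitted curve $H + (1 - 2H)R(z)$ linear in $H$, where $R(z)$ denotes the $H = 0$ formula of Theorem 3, the least-squares problem has a one-line solution. The sketch below illustrates this; the function name `fit_hidden_liquidity` and the synthetic inputs are assumptions of this example, and the paper does not state which minimizer the authors used.

```python
def fit_hidden_liquidity(p_empirical, ratio):
    """Least-squares H for the model H + (1 - 2H) * ratio.

    p_empirical[k]: empirical probability of an up move in imbalance bucket k.
    ratio[k]: value of the H = 0 formula (Theorem 3) in the same bucket.
    """
    if len(p_empirical) != len(ratio) or not ratio:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual in bucket k is (p_k - r_k) - H * (1 - 2 r_k), linear in H.
    a = [1.0 - 2.0 * r for r in ratio]
    d = [p - r for p, r in zip(p_empirical, ratio)]
    denom = sum(ak * ak for ak in a)
    if denom == 0.0:
        raise ValueError("H is not identified when every ratio equals 1/2")
    return sum(ak * dk for ak, dk in zip(a, d)) / denom


if __name__ == "__main__":
    # Synthetic data only: bucket midpoints, R(z) = z, and a curve built with H = 0.1.
    mids = [0.05 + 0.1 * k for k in range(10)]
    p_synth = [0.1 + 0.8 * z for z in mids]
    print(fit_hidden_liquidity(p_synth, mids))
```

The estimate is unconstrained; a fit that falls outside the admissible range for $H$ would need to be handled separately.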

Conclusions

We did numerical studies of the drift effect, correlation and volatility of the best bid and ask queues and how they depend on the imbalance of the volumes at the best bid and ask queues, using the level-1 limit order books data from the WRDS. We discovered that there is little evidence for the drift except when the imbalance is small or large. The correlation as a function of the imbalance exhibits universal behaviors: it is either a U-shaped or a W-shaped curve, and it is almost always negative though far away from $-1$. The volatility is much noisier and in general lacks a clear pattern, though it very often exhibits skewed U-shapes. All the empirical results are highly stock and exchange dependent, which suggests that the dynamics of the limit order books are very sensitive to the particular stock and exchange. Based on our empirical discoveries, we built up a discrete model for the dynamics of the best bid and ask queues and showed that it can be approximated by a reduced-form diffusion model with functional dependence of the drift, correlation and volatility on the imbalance, which therefore generalizes the correlated Brownian motion model that is commonly used in the limit order books literature. Our reduced-form model still keeps analytical tractability, and it is self-consistent when it is fit to the data of both the empirical probability of mid-price movement and the empirical correlation/volatility.

Acknowledgments

The authors are very grateful to two anonymous referees and the editor whose suggestions have greatly improved the presentation and the quality of the paper. The authors would also like to thank Jean-Philippe Bouchaud, Yuanda Chen, Arash Fahim, Xuefeng Gao and Alec Kercheval for their invaluable comments. Lingjiong Zhu is partially supported by NSF Grant DMS-1613164.

Appendix A. Proofs

Proof of Theorem 2.

Notice that $M^j(t)$ are martingales with predictable quadratic variation $\int_0^t \lambda_j(Z_{s-})\,ds$, where $\lambda_j$ is bounded, i.e. $\|\lambda_j\|_\infty < \infty$. For any $t \in [0, T - \delta]$, $\delta > 0$, we have
\[
E\left[\left\|(X_n(t + \delta), Y_n(t + \delta)) - (X_n(t), Y_n(t))\right\|^4\right] \leq C_1 \sum_{j=1}^{6} E\left[\left(M^j_n(t + \delta) - M^j_n(t)\right)^4\right], \tag{A.1}
\]
for some constant $C_1 > 0$. By the Burkholder-Davis-Gundy inequality, for any $1 \leq j \leq 6$,
\[
E\left[\left(M^j_n(t + \delta) - M^j_n(t)\right)^4\right] \leq \frac{C_2}{n^2}\, E\left[\left(\int_{tn}^{(t+\delta)n} \lambda_j(Z_s)\,ds\right)^2\right] \leq C_2 \|\lambda_j\|_\infty^2\, \delta^2, \tag{A.2}
\]
for some constant $C_2 > 0$. Therefore, by applying Kolmogorov's tightness criterion, we can show that $(X_n(t), Y_n(t))$ is tight.

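The infinitesimal generator for the rescaled process $(X_n(t), Y_n(t))$ is given by
\[
\begin{aligned}
\mathcal{L}_n f(x, y) := {} & n\lambda_1(z)\left[f\left(x + \tfrac{1}{\sqrt{n}}, y\right) - f(x, y)\right] + n\lambda_2(z)\left[f\left(x - \tfrac{1}{\sqrt{n}}, y\right) - f(x, y)\right] \\
& + n\lambda_3(z)\left[f\left(x, y + \tfrac{1}{\sqrt{n}}\right) - f(x, y)\right] + n\lambda_4(z)\left[f\left(x, y - \tfrac{1}{\sqrt{n}}\right) - f(x, y)\right] \\
& + n\lambda_5(z)\left[f\left(x + \tfrac{1}{\sqrt{n}}, y - \tfrac{1}{\sqrt{n}}\right) - f(x, y)\right] + n\lambda_6(z)\left[f\left(x - \tfrac{1}{\sqrt{n}}, y + \tfrac{1}{\sqrt{n}}\right) - f(x, y)\right],
\end{aligned}
\]
where $z = \frac{x}{x+y}$ and $f$ is a twice continuously differentiable test function.

By using the driftless assumption (3.16), we get
\[
\begin{aligned}
\mathcal{L}_n f(x, y) = {} & n\lambda_1(z)\left[\frac{1}{\sqrt{n}}\frac{\partial f}{\partial x} + \frac{1}{2n}\frac{\partial^2 f}{\partial x^2} + O(n^{-3/2})\right] + n\lambda_2(z)\left[-\frac{1}{\sqrt{n}}\frac{\partial f}{\partial x} + \frac{1}{2n}\frac{\partial^2 f}{\partial x^2} + O(n^{-3/2})\right] \\
& + n\lambda_3(z)\left[\frac{1}{\sqrt{n}}\frac{\partial f}{\partial y} + \frac{1}{2n}\frac{\partial^2 f}{\partial y^2} + O(n^{-3/2})\right] + n\lambda_4(z)\left[-\frac{1}{\sqrt{n}}\frac{\partial f}{\partial y} + \frac{1}{2n}\frac{\partial^2 f}{\partial y^2} + O(n^{-3/2})\right] \\
& + n\lambda_5(z)\left[\frac{1}{\sqrt{n}}\frac{\partial f}{\partial x} - \frac{1}{\sqrt{n}}\frac{\partial f}{\partial y} + \frac{1}{2n}\frac{\partial^2 f}{\partial x^2} + \frac{1}{2n}\frac{\partial^2 f}{\partial y^2} - \frac{1}{n}\frac{\partial^2 f}{\partial x \partial y} + O(n^{-3/2})\right] \\
& + n\lambda_6(z)\left[-\frac{1}{\sqrt{n}}\frac{\partial f}{\partial x} + \frac{1}{\sqrt{n}}\frac{\partial f}{\partial y} + \frac{1}{2n}\frac{\partial^2 f}{\partial x^2} + \frac{1}{2n}\frac{\partial^2 f}{\partial y^2} - \frac{1}{n}\frac{\partial^2 f}{\partial x \partial y} + O(n^{-3/2})\right] \\
= {} & \frac{1}{2}\left[\lambda_1(z) + \lambda_2(z) + \lambda_5(z) + \lambda_6(z)\right]\frac{\partial^2 f}{\partial x^2} + \frac{1}{2}\left[\lambda_3(z) + \lambda_4(z) + \lambda_5(z) + \lambda_6(z)\right]\frac{\partial^2 f}{\partial y^2} \\
& - \left[\lambda_5(z) + \lambda_6(z)\right]\frac{\partial^2 f}{\partial x \partial y} + O(n^{-1/2}).
\end{aligned}
\]
As $n \to \infty$, $\mathcal{L}_n f(x, y) \to \mathcal{L} f(x, y)$, where
\[
\mathcal{L} f(x, y) = \frac{1}{2}(\sigma_b(z))^2 \frac{\partial^2 f}{\partial x^2} + \frac{1}{2}(\sigma_a(z))^2 \frac{\partial^2 f}{\partial y^2} + \sigma_b(z)\,\sigma_a(z)\,\rho(z)\,\frac{\partial^2 f}{\partial x \partial y},
\]
\[
\begin{aligned}
\sigma_b(z) &:= \left[\lambda_1(z) + \lambda_2(z) + \lambda_5(z) + \lambda_6(z)\right]^{1/2}, \\
\sigma_a(z) &:= \left[\lambda_3(z) + \lambda_4(z) + \lambda_5(z) + \lambda_6(z)\right]^{1/2}, \\
\rho(z) &:= -\frac{\lambda_5(z) + \lambda_6(z)}{\sigma_b(z)\,\sigma_a(z)}.
\end{aligned}
\]
Also notice that by our assumption, the initial condition satisfies $(X_n(0), Y_n(0)) = (x, y) \in \mathbb{R}_+ \times \mathbb{R}_+$.

The tightness gives the relative compactness of the sequence, and the convergence of infinitesimal generators gives convergence in distribution at each fixed time point, which guarantees the weak convergence on $D[0, T]$, see e.g. Theorem 7.8(b) of Chapter 3 in Ethier and Kurtz [6]. Hence $(X_n(t), Y_n(t)) \Rightarrow (Q_b(t), Q_a(t))$ on $D[0, T]$. □

Proof of Theorem 3.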

Recall that the probability that the price moves up is:
\[
P_{up}(x, y) = u(x, y) = P\left(\tau_a < \tau_b \mid Q_b(0) = x,\ Q_a(0) = y\right). \tag{A.3}
\]
Then, $u(x, y)$ satisfies the PDE:
\[
\sigma_b^2(z)\,\frac{\partial^2 u}{\partial x^2} + 2\rho(z)\,\sigma_b(z)\,\sigma_a(z)\,\frac{\partial^2 u}{\partial x \partial y} + \sigma_a^2(z)\,\frac{\partial^2 u}{\partial y^2} = 0, \tag{A.4}
\]
with the boundary condition $u(0, y) = 0$ and $u(x, 0) = 1$, where $z = \frac{x}{x+y}$ is the imbalance.

Assuming that $u(x, y)$ is a function of $z$ so that $u(x, y) = u(z)$, by the chain rule,
\[
\frac{\partial u}{\partial x} = \frac{y}{(x+y)^2}\,u'(z), \qquad \frac{\partial u}{\partial y} = -\frac{x}{(x+y)^2}\,u'(z), \tag{A.5}
\]
\[
\frac{\partial^2 u}{\partial x^2} = -\frac{2y}{(x+y)^3}\,u'(z) + \frac{y^2}{(x+y)^4}\,u''(z), \tag{A.6}
\]
\[
\frac{\partial^2 u}{\partial y^2} = \frac{2x}{(x+y)^3}\,u'(z) + \frac{x^2}{(x+y)^4}\,u''(z), \tag{A.7}
\]
\[
\frac{\partial^2 u}{\partial x \partial y} = \frac{x - y}{(x+y)^3}\,u'(z) - \frac{xy}{(x+y)^4}\,u''(z). \tag{A.8}
\]
Hence, the PDE reduces to:
\[
\begin{aligned}
& \sigma_b^2(z)\left[-\frac{2y}{(x+y)^3}\,u'(z) + \frac{y^2}{(x+y)^4}\,u''(z)\right] + 2\rho(z)\,\sigma_b(z)\,\sigma_a(z)\left[\frac{x - y}{(x+y)^3}\,u'(z) - \frac{xy}{(x+y)^4}\,u''(z)\right] \\
& \quad + \sigma_a^2(z)\left[\frac{2x}{(x+y)^3}\,u'(z) + \frac{x^2}{(x+y)^4}\,u''(z)\right] = 0,
\end{aligned}
\]
which, after multiplying by $x(x+y)$, can be further reduced to the ODE:
\[
\begin{aligned}
& \sigma_b^2(z)\left[-2z(1 - z)\,u'(z) + (1 - z)^2 z\,u''(z)\right] + 2\rho(z)\,\sigma_b(z)\,\sigma_a(z)\left[z(2z - 1)\,u'(z) - z^2(1 - z)\,u''(z)\right] \\
& \quad + \sigma_a^2(z)\left[2z^2\,u'(z) + z^3\,u''(z)\right] = 0,
\end{aligned}
\]
with the boundary condition $u(0) = 0$ and $u(1) = 1$. Dividing by $z$, this can be rewritten as
\[
\mu(z) f(z) + \nu(z) f'(z) = 0, \tag{A.9}
\]
where
\[
\begin{aligned}
\mu(z) &= -2(1 - z)\,\sigma_b^2(z) + 2(2z - 1)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + 2z\,\sigma_a^2(z), \\
\nu(z) &= (1 - z)^2\,\sigma_b^2(z) - 2z(1 - z)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + z^2\,\sigma_a^2(z),
\end{aligned}
\]
and $f(z) = u'(z)$. This is a first-order linear equation with solution
\[
f(z) = C_1 e^{-\int_0^z \frac{\mu(x)}{\nu(x)}\,dx}, \tag{A.10}
\]
and hence
\[
u(z) = C_1 \int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy + C_2, \tag{A.11}
\]
where $C_1$, $C_2$ are two constants to be determined. By using the boundary conditions $u(0) = 0$ and $u(1) = 1$, we conclude that
\[
P_{up}(x, y) = P_{up}(z) = u(z) = \frac{\int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}{\int_0^1 e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}. \tag{A.12}
\]
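The chain-rule identities (A.6)-(A.8) used in the proof above can be checked numerically against central finite differences. The sketch below does this for a test profile $u(z) = z^3$; the function names, the test profile and the step size are assumptions chosen for this illustration.

```python
def partials_via_chain_rule(x, y, du, d2u):
    """Second partials of U(x, y) = u(x / (x + y)) from (A.6)-(A.8).

    du and d2u are u'(z) and u''(z) evaluated at z = x / (x + y).
    Returns (U_xx, U_xy, U_yy).
    """
    s = x + y
    u_xx = -2.0 * y / s**3 * du + y**2 / s**4 * d2u
    u_yy = 2.0 * x / s**3 * du + x**2 / s**4 * d2u
    u_xy = (x - y) / s**3 * du - x * y / s**4 * d2u
    return u_xx, u_xy, u_yy


def partials_via_finite_differences(U, x, y, h=1e-4):
    """Central finite-difference approximations of (U_xx, U_xy, U_yy)."""
    u_xx = (U(x + h, y) - 2.0 * U(x, y) + U(x - h, y)) / h**2
    u_yy = (U(x, y + h) - 2.0 * U(x, y) + U(x, y - h)) / h**2
    u_xy = (U(x + h, y + h) - U(x + h, y - h) - U(x - h, y + h) + U(x - h, y - h)) / (4.0 * h**2)
    return u_xx, u_xy, u_yy


if __name__ == "__main__":
    x, y = 2.0, 3.0
    z = x / (x + y)
    U = lambda a, b: (a / (a + b)) ** 3  # test profile u(z) = z**3
    exact = partials_via_chain_rule(x, y, 3.0 * z**2, 6.0 * z)
    approx = partials_via_finite_differences(U, x, y)
    for e, a in zip(exact, approx):
        print(e, a)
```

Agreement of the two sets of values to several digits supports the coefficients of $u'$ and $u''$ that enter $\mu$ and $\nu$.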
□

Proof of Theorem 7. As in Theorem 3, $u(z)$ satisfies the ODE:
\[
\mu(z)\,u'(z) + \nu(z)\,u''(z) = 0, \tag{A.13}
\]
now with the boundary conditions $u(0) = H$ and $u(1) = 1 - H$, where
\[
\begin{aligned}
\mu(z) &= -2(1 - z)\,\sigma_b^2(z) + 2(2z - 1)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + 2z\,\sigma_a^2(z), \\
\nu(z) &= (1 - z)^2\,\sigma_b^2(z) - 2z(1 - z)\,\rho(z)\,\sigma_b(z)\,\sigma_a(z) + z^2\,\sigma_a^2(z).
\end{aligned}
\]
As in the proof of Theorem 3, this ODE has a solution of the form
\[
u(z) = C_1 \int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy + C_2, \tag{A.14}
\]
where $C_1$, $C_2$ are two constants to be determined. By using the boundary conditions $u(0) = H$ and $u(1) = 1 - H$, we conclude that
\[
P_{up}(x, y) = P_{up}(z) = u(z) = H + (1 - 2H)\,\frac{\int_0^z e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}{\int_0^1 e^{-\int_0^y \frac{\mu(x)}{\nu(x)}\,dx}\,dy}. \tag{A.15}
\]
□

Appendix B. Tables

Table 3.

Summary of Volume Changes (NASDAQ)

Imbalance   BAC b   BAC a   GE b    GE a    GM b    GM a    JPM b   JPM a
0.0-0.1     0.349   0.499   0.367   0.508   0.640   0.462   0.628   0.496
0.1-0.2     0.449   0.519   0.461   0.503   0.508   0.488   0.580   0.487
0.2-0.3     0.458   0.542   0.439   0.513   0.480   0.517   0.498   0.507
0.3-0.4     0.472   0.556   0.478   0.536   0.482   0.532   0.450   0.542
0.4-0.5     0.503   0.537   0.507   0.539   0.508   0.531   0.472   0.529
0.5-0.6     0.533   0.508   0.529   0.518   0.532   0.510   0.514   0.448
0.6-0.7     0.540   0.488   0.537   0.490   0.530   0.489   0.518   0.477
0.7-0.8     0.540   0.467   0.528   0.469   0.514   0.471   0.527   0.514
0.8-0.9     0.522   0.479   0.523   0.472   0.494   0.498   0.516   0.527
0.9-1.0     0.486   0.310   0.496   0.378   0.463   0.643   0.508   0.696

Table 4.

Summary of Volume Changes (NYSE)

Imbalance   BAC b   BAC a   GE b    GE a    GM b    GM a    JPM b   JPM a
0.0-0.1     0.561   0.540   0.586   0.596   0.730   0.598   0.723   0.602
0.1-0.2     0.524   0.531   0.502   0.552   0.557   0.506   0.532   0.531
0.2-0.3     0.514   0.529   0.490   0.531   0.502   0.497   0.491   0.512
0.3-0.4     0.518   0.541   0.505   0.522   0.494   0.497   0.480   0.511
0.4-0.5     0.517   0.553   0.518   0.529   0.485   0.486   0.492   0.507
0.5-0.6     0.539   0.511   0.539   0.518   0.501   0.491   0.510   0.494
0.6-0.7     0.551   0.506   0.528   0.513   0.496   0.496   0.499   0.485
0.7-0.8     0.534   0.516   0.541   0.503   0.498   0.503   0.506   0.491
0.8-0.9     0.518   0.535   0.553   0.512   0.510   0.550   0.527   0.539
0.9-1.0     0.554   0.582   0.575   0.551   0.586   0.718   0.598   0.742


Table 5.

Summary of Correlation of the Best Bid and Ask

[Table body: rows indexed by imbalance bucket; columns BAC (T), BAC (N), GE (T), GE (N), GM (T), GM (N), JPM (T), JPM (N). The numerical entries are not recoverable from the source text.]

Table 6.

Summary of Empirical Probability (E) and Model Prediction (P) for Stocks Traded on NASDAQ

[Table body: rows indexed by imbalance bucket; columns BAC (E), BAC (P), GE (E), GE (P), GM (E), GM (P), JPM (E), JPM (P). The numerical entries are not recoverable from the source text.]

Table 7.

Summary of Empirical Probability (E) and Model Prediction (P) for Stocks Traded on NYSE

[Table body: rows indexed by imbalance bucket; columns BAC (E), BAC (P), GE (E), GE (P), GM (E), GM (P), JPM (E), JPM (P). The numerical entries are not recoverable from the source text.]

References

[1] Frédéric Abergel and Aymen Jedidi. A mathematical approach to order book modeling. International Journal of Theoretical and Applied Finance, 16(05):1350025, 2013.
[2] Marco Avellaneda, Josh Reed, and Sasha Stoikov. Forecasting prices from level-I quotes in the presence of hidden liquidity. Algorithmic Finance, 1(1), 2011.
[3] Rama Cont and Adrien de Larrard. Order book dynamics in liquid markets: limit theorems and diffusion approximations. Available at SSRN 1757861, 2012.
[4] Rama Cont and Adrien de Larrard. Price dynamics in a Markovian limit order market. SIAM Journal on Financial Mathematics, 4(1):1–25, 2013.
[5] Rama Cont, Sasha Stoikov, and Rishi Talreja. A stochastic model for order book dynamics. Operations Research, 58(3):549–563, 2010.
[6] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. Wiley-Interscience, New Jersey, second edition, 2005.
[7] A. Garèche, G. Disdier, J. Kockelkoren, and J.-P. Bouchaud. Fokker-Planck description for the queue dynamics of large tick stocks. Phys. Rev. E, 88:032809, Sep 2013.
[8] Xin Guo, Zhao Ruan, and Lingjiong Zhu. Dynamics of order positions and related queues in a limit order book. arXiv preprint arXiv:1505.04810, 2015.
[9] He Huang and Alec N. Kercheval. A generalized birth-death stochastic model for high-frequency order book dynamics. Quantitative Finance, 12(4):547–557, 2012.
[10] Weibing Huang, Charles-Albert Lehalle, and Mathieu Rosenbaum. Simulating and analyzing order book data: The queue-reactive model. Journal of the American Statistical Association, 110:107–122, 2015.
[11] Satish Iyengar. Hitting lines with two-dimensional Brownian motion. SIAM Journal on Applied Mathematics, 45(6):983–989, 1985.
[12] Charles-Albert Lehalle and Sophie Laruelle. Market Microstructure in Practice. World Scientific, Singapore, first edition, 2013.
[13] Jonathan Macey and David Swensen. The cure for stock-market fragmentation: More exchanges. The Wall Street Journal, May 31 2015.
[14] Adam Metzler. On the first passage problem for correlated Brownian motion. Statistics & Probability Letters, 80(5-6):277–284, 2010.
[15] Bernt Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An introduction with applications.
[16] D. W. Stroock and S. R. S. Varadhan. Multidimensional diffusion processes. Springer-Verlag, Berlin, 1979.

School of Mathematics, University of Minnesota
E-mail address: [email protected]

Department of Mathematics, Florida State University