A Fokker-Planck description for the queue dynamics of large tick stocks

Abstract

Motivated by empirical data, we develop a statistical description of the queue dynamics for large tick assets based on a two-dimensional Fokker-Planck (diffusion) equation that explicitly includes state dependence, i.e. the fact that the drift and diffusion depend on the volume present on both sides of the spread. "Jump" events, corresponding to sudden changes of the best limit price, must also be included as birth-death terms in the Fokker-Planck equation. All quantities involved in the equation can be calibrated using high-frequency data on best quotes. One of our central findings is that the dynamical process is approximately scale invariant, i.e., the only relevant variable is the ratio of the current volume in the queue to its average value. While the latter shows intraday seasonalities and strong variability across stocks and time periods, the dynamics of the rescaled volumes is universal. In terms of rescaled volumes, we found that the drift has a complex two-dimensional structure, which is a sum of a gradient contribution and a rotational contribution, both stable across stocks and time. This drift term is entirely responsible for the dynamical correlations between the ask queue and the bid queue.


arXiv [q-fin.TR]

A. Garèche (a,b), G. Disdier (a,c), J. Kockelkoren (a), J.-P. Bouchaud (a)

August 16, 2018

(a) Capital Fund Management, 23 rue de l'Université, Paris, France
(b) now at: Marshall Wace LLP, The Adelphi, 1/11 John Adam Street, London WC2N 6HT, UK
(c) now MFE Candidate, University of California, Berkeley, USA


Executing orders on modern electronic, double auction markets can be achieved by posting either market orders or limit orders. Both market and limit orders have flaws and merits. Market orders are executed immediately against the best prevailing quote, but pay the half spread. Limit orders are stored in the order book and are only executed when a market order crosses the spread. They appear to save the half spread but face selection bias.

It is customary to distinguish small tick and large tick situations. The tick size is the minimum amount by which the price can change. Large ticks correspond to stocks for which the bid-ask spread is most of the time equal to its minimum value (one tick). These relatively large spreads attract limit orders, naturally leading to relatively large volumes at the best quotes.

For large tick stocks, the volume of impinging market orders is typically much smaller than the available volume at the best price. In this case, the rule for executing outstanding limit orders depends on the market. For most markets (including those studied in the present paper), time priority applies: the queue is "first in-first out". For some markets, however, a proportionality rule applies: all participants at the best quote get filled in proportion to their volume. If time priority applies, the best situation for a limit order is to be first in a large queue: in this case, the probability to be executed is large, while the adverse selection bias is small, because the large […]

Footnote: There are actually many variations on that theme, with for example "Immediate or Cancel" orders, "Fill or Kill" orders, "Iceberg" orders, etc. We are not concerned with these subtleties in the present context.

[…] two-dimensional Fokker-Planck (or diffusion) equation for the joint probability $P(V_A, V_B; t)$ to find a volume $V_A$ at the ask and $V_B$ at the bid at time $t$.
Their equation reads (albeit in very different notations!):

$$\frac{\partial P}{\partial t} = -\frac{\partial (F_A P)}{\partial V_A} - \frac{\partial (F_B P)}{\partial V_B} + \frac{\partial^2 (D_A P)}{\partial V_A^2} + \frac{\partial^2 (D_B P)}{\partial V_B^2} + 2\,\frac{\partial^2 (\rho_{AB} P)}{\partial V_A\, \partial V_B}, \tag{1}$$

where $P$ is a shorthand notation for $P(V_A, V_B; t)$, $F_A = F_B$ is a constant (independent of $V_A, V_B$) that represents the systematic drift in the evolution of the volume of the queues, while $D_A = D_B$ represent the diffusion coefficients in volume space, related to the variance of volume changes per unit time, chosen again to be independent of both $V_A$ and $V_B$. Finally, $\rho_{AB}$ is the covariance of the volume changes on both sides of the quotes. It is the only term that couples the evolution of the volume at the bid and at the ask in this model. For $\rho_{AB} = 0$, the dynamics decouples in the sense that $P(V_A, V_B; t)$ factorizes into $P_A(V_A; t) \times P_B(V_B; t)$, with $P_A, P_B$ each obeying a one-dimensional Fokker-Planck equation:

$$\frac{\partial P_z}{\partial t} = -\frac{\partial (F_z P_z)}{\partial V_z} + \frac{\partial^2 (D_z P_z)}{\partial V_z^2}, \qquad z = A, B. \tag{2}$$

The aim of the present paper is to include the possible dependence of the drift ($F_A, F_B$) and diffusion coefficients ($D_A, D_B$) on the volume of the two queues, $V_A, V_B$. This dependence, neglected in [10], is expected on intuitive grounds and induces correlations in the dynamics of the two queues, even when $\rho_{AB} = 0$. It is indeed reasonable to think that queues tend to grow when they are small and shrink when they are large, meaning that $F_z(V_z)$, $z = A, B$, is positive for small $V_z$ and negative for large $V_z$. One also expects that high volumes at the ask have a detrimental impact on the liquidity of the bid, and vice-versa, leading to a rich structure for the drift field $F_{A,B}(V_A, V_B)$. These effects are indeed very clearly revealed by empirical data and affect quite considerably the analysis of Cont & Larrard [10], in particular concerning the calculation of the time needed to empty a queue and induce a price change.
Footnote: This equation further assumes that the bid price and the ask price have not changed between 0 and $t$. See below for the inclusion of price changes in this formalism.

Aim of the paper and main results

The present study is mostly empirical, and aims at establishing the correct model for describing the coarse-grained dynamics of the bid and ask queues, in the restricted but simpler case where the tick is so large that the relative change of volume induced by each individual order is small. Our data set covers all events at the best quotes of several large tick NASDAQ stocks during the year 2010 (see the order book animation available at http://cfm.fr/msft.exe).

We will first present (in section 4) our results for the dynamics of a single queue (the bid or the ask), independently of the state of the opposite queue. This will allow us to present in a simplified setting some of our most salient results. We will investigate in particular the dependence of the drift $F_z$ and of the diffusion constant $D_z$ on the size of the corresponding queue $V_z$. Some of our results are remarkably universal: although the trading rate and the average volume in the queues show strong intraday seasonalities and vary significantly between different stocks, we find that upon appropriate rescalings of time and volumes, the statistical description of the queue dynamics is independent of the side of the queue (bid or ask), time of day, period, and considered stocks, provided the size of the queues is large enough (which usually entails large ticks). We find in particular that, as expected on intuitive grounds, $F_z(V_z)$ is positive for small $V_z$ and becomes negative for larger $V_z$.

It turns out that the full Fokker-Planck description of single queues must involve several additional quantities. One describes the probability that a gap appears, momentarily increasing the bid-ask spread by one tick. Suppose one focuses on the bid, $z = B$. There is a chance that the next limit order that fills the newly created gap is a buy limit order. In that case, the new bid jumps one tick up.
This happens with a certain probability $Q_+$ that may again depend on $V_B$, and when this happens, the new bid starts with a volume distributed according to a certain $P_+(V_B)$. Conversely, the volume at the bid can be eaten by a large sell market order, in which case the bid goes down one tick, and is replaced by the queue just behind it. This happens with probability $Q_-(V_B)$ and when this happens, the new bid starts with a volume distributed according to a certain $P_-(V_B)$. Similar quantities describe the same events at the ask, $z = A$. Our convention will be to use a $+$ subscript for events that improve the quotes (bid up or ask down one tick), and a $-$ subscript for events that degrade the quotes (bid down or ask up one tick). Again rescaling times and volumes, we find that these new quantities are again to a large degree universal. Finally, we test directly the validity of our Fokker-Planck description by comparing the empirically determined stationary distribution of volumes $P_{st}(V_A)$ (or $P_{st}(V_B)$) with the one predicted by the equilibrium solution of the Fokker-Planck equation.

We then turn in section 5 to the full two-dimensional description of the dynamics. We find (empirically) a complete statistical symmetry between the bid and the ask, in particular that the stationary distribution obeys $P_{st}(V_A, V_B) = P_{st}(V_B, V_A)$, while the drifts and diffusion coefficients are such that $F_A(V_A, V_B) = F_B(V_B, V_A)$ and $D_A(V_A, V_B) = D_B(V_B, V_A)$, and similarly for $Q_\pm$ and $P_\pm$, which describe changes of prices. We again find that upon adequate rescaling, all these quantities are universal. In view of the rather subtle pattern revealed by the drift field $(F_A, F_B)$ (see Figs. 8 & 9 below), this universality is far from trivial.

As pointed out by Cont & Larrard [10], a model for the dynamics of queues is valuable for many purposes.
One of the interesting quantities that our model allows one to compute (in principle) is the probability that, starting from a volume configuration $(V_B, V_A)$, the bid moves down before the ask moves up, or the distribution of times before the bid price moves, etc. Due to the complexity of the calibrated model, this can however only be achieved by running numerical simulations, and we leave this question for future investigations (see also the discussion in Sect. 6 below).

Our data consists of all events on the NASDAQ platform (market orders, limit orders, cancellations) occurring at the best quotes (bid/ask) for NASDAQ stocks during the year 2010. These orders represent only ≈ 40% of the total activity, but in view of the universality of our results once rescaled by the appropriate average volume, we believe that our conclusions are not significantly affected by the missing data. We do not consider iceberg limit orders either, which is consistent with […]

Stock Name              Ticker   Av. Price   $\bar L$   $\bar V$   $\bar N$   $\langle|\Delta V|\rangle$   $\bar\Pi$   $\pi_+$
Cisco Systems, Inc.     CSCO     23.2        44.2       25,000     2,240      900                          0.86        0.19
eBay                    EBAY     24.6        17.2       4,900      1,240      475                          0.87        0.15
Gilead Sciences, Inc.   GILD     39.3        11.1       2,350      1,140      335                          0.85        0.21
Microsoft Corporation   MSFT     27.1        44.8       22,100     2,630      890                          0.90        0.21
Oracle Corporation      ORACL    24.7        29.6       12,800     1,800      730                          0.87        0.16

Table 1: Summary statistics for the 5 stocks chosen for this study. The average price (in USD) is over the year 2010. $\bar L$ (resp. $\bar V$) corresponds to the average number of individual orders (resp. average volume in shares) in the queue at any instant of time (see Fig. 1 for the intraday pattern). $\bar N$ is the average number of events that modify $L$ and $V$ during a 5-minute interval. $\langle|\Delta V|\rangle$ is the average (absolute) change of volume for each event that does not change the price. $\bar\Pi$ is the average probability that an event does not change the best level (see Fig. 3 for the intraday pattern). $\pi_+$ is the average probability that a freshly emptied queue is immediately refilled, meaning that the best quote does not change after such an event.

Note that $\pi_+$ is around 0.20 for most of the day, but with sharp peaks around the open and the close, when $\pi_+$ reaches […].

[…] < 40 USD, i.e. ticks > […].

[…] $L$ the size of the queue measured in number of different individual orders (which can all be of different volumes), $V$ the size of the queue in total volume (i.e. number of shares), and $N$ the number of events that modify $L$ and $V$ during a specific period, in our case a 5-minute bin. The trading day is therefore divided into 78 bins of 5 minutes. $L$ and $V$ give slightly different information about the size of the queue, and the Fokker-Planck formalism could be applied to each of these two variables. We have in fact studied both cases, with very similar conclusions [11]. However, taking the total volume $V$ leads to less noisy, more regular observables and probably makes more financial sense, so we restrict ourselves in this paper to volumes only and will write an evolution equation for $P(V_A, V_B; t)$, the joint probability to observe volumes $V_A$ at the ask and $V_B$ at the bid at "time" $t$, where time will be counted here in event time.

The first step is to characterize the average daily pattern of the activity $N$ and of the size of the queues $L_{A,B}$ or $V_{A,B}$. Averaging over all days and all stocks, we obtain the characteristic patterns shown in Fig. 1 for $\bar L(b)$ and $\bar V(b)$, where $b = 1, 2, \ldots, 78$ is the bin number. For the total activity $\bar N(b)$, we find the familiar U-shape (not shown): activity is high in the morning, lower at noon, and high again at the end of the day. The total volume in the book, on the other hand, is quite low at the open and steadily rises as one moves into the day, with an interesting acceleration towards the end of the day. The plots shown in Fig. 1 are averages over the bid and the ask, and averages over the 5 chosen stocks, but we have checked that the individual patterns for $\bar L(b)$, $\bar V(b)$ and $\bar N(b)$ are the same up to an overall multiplicative factor. While the rescaling is not perfect and some differences of order ∼ 20% between stocks might be relevant for a finer analysis, we are content with the idea that in a first pass, universality holds.

Footnote: The difference between the bid and ask observables is of the same order of magnitude, but it is highly reasonable that the statistics of high frequency activity should be very close to being buy/sell symmetric.

Interestingly, the intraday pattern of the volume is accurately fitted by the following simple functional form (see Fig. 1):

$$\bar V(b) \approx a_0 + a_1 \ln b + \frac{a_2}{[\ldots] - b}. \tag{3}$$

The meaning of this functional form is that the volume increases quickly at the beginning of the day before a quasi-plateau, as captured by the initial logarithmic dependence, before the final increase that shows an apparent divergence at the end of the day. We note that such an inverse […]

[Figure 1 plot: $\bar V(b)$ (data and fit) and $\bar L(b)$ (data, ×1000) as a function of the bin number $b$.]

Figure 1: Intraday pattern of the average volume $\bar V(b)$ and of the average number of orders $\bar L(b)$ in the queue, and fit with Eq. (3). Averages are over all days and all five stocks. Note that both quantities are in fact close to being proportional to each other, with an average (over all stocks) volume per order ≈ […]

Footnote: Note that leaving the exponent $\psi$ of the divergence as a free parameter, a best fit of the data leads to $\psi \approx$ […].

Let us now turn to what will be the central object of the present study, namely the probability $P_{st}(V_A)$ that the ask queue has volume $V_A$ (or similarly for the bid volume $V_B$). Clearly, since the average volume $\bar V$ depends on the time of the day and on the stock, $P_{st}$ cannot be universal. Our central assumption, which is approximately borne out by the data, is that for large queues, all aspects of the queue dynamics depend only on relative volumes, i.e. on the ratio of the existing volume $V_A$ to the average volume $\bar V$. In other words, introducing $x_A = V_A/\bar V$ and $x_B = V_B/\bar V$, one expects that $P_{st}(x)$ is approximately universal, both in time and across stocks. As we will see later, this assumption naturally generalizes to other quantities as well.

We show $P_{st}(x)$ in Fig. 2-a for the five stocks under scrutiny (averaged over the bid and the ask). For all stocks, we observe a hump-shaped distribution that peaks around the average value $\bar x = 1$. The probability of much larger queues goes to zero slightly slower than exponentially. Of course, the joint distribution of $x_A = x$ and $x_B = y$ contains more information, and is shown as a contour plot in Fig. 2-b, here averaged over all 5 stocks. We find, quite interestingly, that $P_{st}(x, y)$ exhibits a broad peak around $x \approx y \approx 1$. This means that the most probable situation is that both queues are of similar height, with an average volume $\bar V(b)$ that is bin- and stock-dependent (see also the animation available at http://cfm.fr/msft.exe).

What are the dynamical mechanisms at the origin of these specific, hump-shaped distributions? The aim of the following sections is to develop a precise picture of the stochastic process governing the joint evolution of the two queues. We first focus on a one-dimensional model, which discards all information about the opposite queue, before expanding on the full two-dimensional model in the following section.

We start by writing a general Master equation for the evolution of the probability that the volume at the bid or at the ask is equal to $V$, at (event) time $t$. Assuming for the moment that there is no change of the corresponding price between $t$ and $t + 1$, this reads:

$$P(V, t+1) = \sum_{\Delta V} P(V - \Delta V, t)\, \rho(\Delta V \,|\, V - \Delta V), \tag{4}$$

where $\Delta V$ are the possible changes of volume related to limit orders ($\Delta V > 0$) and market orders or cancellations ($\Delta V < 0$) […] $V$-dependent probability $\rho(\Delta V | V)$. The above Master equation assumes that the process is Markovian, i.e. has no memory in time, apart from the one encoded in the instantaneous size of the queue. When $V$ is large, one may expect that changes of the queue size at each time step are relatively small: $\Delta V \ll V$. This allows one to treat $V$ as a continuous variable and to expand the above equation in powers of $\Delta V$. The general expansion of the Master equation in powers of $\Delta V$ is called the Kramers-Moyal expansion; when truncated to second order, this leads to the Fokker-Planck equation [13]. In the present case, one finally gets:

$$P(V, t+1) - P(V, t) \approx -\frac{\partial [F(V) P(V, t)]}{\partial V} + \frac{\partial^2 [D(V) P(V, t)]}{\partial V^2}; \tag{5}$$

with:

$$F(V) = \sum_{\Delta V} \Delta V\, \rho(\Delta V | V); \qquad D(V) = \frac{1}{2} \sum_{\Delta V} (\Delta V)^2\, \rho(\Delta V | V); \tag{6}$$

in other words, $F(V)$ is the average volume change conditional on a certain volume $V$, whereas $D(V)$ is one-half of the average squared volume change, again conditioned on $V$.

Figure 2: Left: Individual $P_{st}(x)$ for the 5 stocks studied (CSCO, EBAY, GILD, MSFT, ORACL), obtained by averaging over all days the distribution of the rescaled variable $V/\bar V(b)$. The $y$-axis is in log scale. Right: Two-dimensional joint distribution of rescaled volumes at the bid and the ask, shown as contour levels of $P_{st}(x, y)$, which exhibits a broad peak around $x \approx y \approx 1$. Note the symmetry around the line $x = y$.

There are two additional processes that need to be taken into account in order to faithfully describe the dynamics of queues, say the bid.

• One is that the opposite ask moves up one tick, which leads to a situation where the spread between the bid and the ask is temporarily equal to two ticks. If a new buy limit order fills the incipient gap, the 'old' bid, with volume $V$, then gets suddenly replaced by a new bid, with (usually) smaller volume. In the Master equation language, this corresponds to a large jump for which the assumption that $\Delta V$ is small is not warranted. We instead want to model this effect by adding to the right hand side of Eq. (5) a "birth-death" term of the form:

$$-Q_+(V) P(V, t) + \left[\sum_{V'} Q_+(V') P(V', t)\right] P_+(V), \tag{7}$$

where $Q_+(V)$ is the probability that a queue of size $V$ gets overtaken by a new queue at an improved price, and $P_+(V)$ is the probability that a newly created queue starts with volume $V$.
• The second effect is that when the bid has a small volume, there is a finite probability that the bid is eaten entirely by a market order or by a cancellation. Two things can happen in this case: either the queue one tick below the old bid becomes the new bid, or some volume immediately comes back with no price change. Both cases again correspond to "jumps" in the Fokker-Planck framework. We write that with probability $Q_-(V)$ the old queue is completely eaten. With probability $\pi_- \times P_-(V')$ it is replaced by the queue just below, of size $V'$, and with probability $\pi_+ = 1 - \pi_-$ some new volume $V'$ reappears at the same price, with probability $P_+(V')$. The "birth-death" term now reads:

$$-Q_-(V) P(V, t) + \left[\sum_{V'} Q_-(V') P(V', t)\right] \left[\pi_+ P_+(V) + \pi_- P_-(V)\right]. \tag{8}$$

This is described by exactly the same term as in Eq. (7) above, with $Q_+ \to Q_-$ and $P_+ \to \pi_+ P_+(V) + \pi_- P_-(V)$, with $\pi_+ \approx$ […] − 0.2, see Table I.

Footnote: Note that $Q_+(V)$ does not count events where the opposite quote disappears and immediately reappears at the same price, leaving the considered quote unchanged. In other words, $Q_+(V)$ already includes the probability $\pi_+$.

Figure 3: Dependence of the probability of non price-changing events $\Pi(x)$ on the rescaled variable $x = V/\bar V$, averaged over all five stocks. Note the dip for small volumes in the queue, which have a large probability to be eaten by a single market order.

Note however that these price-changing events impose that the distribution of volume changes, $\rho(\Delta V | V)$, is not normalized to unity but rather to $\Pi(V) = 1 - Q_+(V) - Q_-(V)$, the probability that an event does not change the price. In the following, we will use the notation $F(V), D(V)$ for the average drift and diffusion conditional on no price change, and $\tilde F(V), \tilde D(V)$ for the unconditional quantities, with:

$$\tilde F(V) = \Pi(V) F(V); \qquad \tilde D(V) = \Pi(V) D(V). \tag{9}$$

Empirically, $\Pi(V)$ is in fact found to be ≈ 0.9, see Fig. 3 and Table 1.
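The conditional moments of Eq. (6) and the unconditional quantities of Eq. (9) can be estimated by binning events according to the volume present in the queue just before each event. The following Python sketch illustrates one way to do this; it is not the authors' calibration code, and the function name, the argument names and the simple histogram binning are assumptions made here for illustration.

```python
import numpy as np

def kramers_moyal_coefficients(volume, delta_v, price_changed, bin_edges):
    """Estimate Pi(V), F(V), D(V) and the unconditional F~(V), D~(V) per volume bin.

    volume        : queue volume V just before each event
    delta_v       : signed volume change of each event (unused for price-changing events)
    price_changed : True when the event changes the best price
    bin_edges     : increasing edges of the volume bins (hypothetical binning choice)
    """
    volume = np.asarray(volume, dtype=float)
    delta_v = np.asarray(delta_v, dtype=float)
    price_changed = np.asarray(price_changed, dtype=bool)
    n_bins = len(bin_edges) - 1
    idx = np.digitize(volume, bin_edges) - 1  # bin index of each event

    pi = np.full(n_bins, np.nan)         # probability of no price change
    drift = np.full(n_bins, np.nan)      # F(V): mean of dV given no price change
    diffusion = np.full(n_bins, np.nan)  # D(V): half the mean of dV^2 given no price change
    for k in range(n_bins):
        in_bin = idx == k
        if not in_bin.any():
            continue
        keep = in_bin & ~price_changed
        pi[k] = keep.sum() / in_bin.sum()
        if keep.any():
            drift[k] = delta_v[keep].mean()
            diffusion[k] = 0.5 * (delta_v[keep] ** 2).mean()
    # Unconditional quantities, as in Eq. (9): F~ = Pi * F and D~ = Pi * D
    return pi, drift, diffusion, pi * drift, pi * diffusion
```

Applying the same function to rescaled volumes $V/\bar V$ and rescaled changes $\Delta V/\bar V$ would give the corresponding dimensionless estimates, under the scale-invariance assumption discussed in the text.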

As discussed above, it is reasonable to expect that the volume dynamics is, for large queues, scale-invariant, in the sense that only the ratio $x = V/\bar V$ matters, where $\bar V$ is the average volume in the queue, which is both stock- and time-of-day-dependent. It is easy to see that the Fokker-Planck equation, in terms of the rescaled volume $x$, takes the following form:

$$P(x, t+1) - P(x, t) \approx -\frac{\partial [\tilde f(x) P(x, t)]}{\partial x} + \frac{\partial^2 [\tilde d(x) P(x, t)]}{\partial x^2} - q_+(x) P(x, t) + \left[\sum_{x'} q_+(x') P(x', t)\right] P_+(x) - q_-(x) P(x, t) + \left[\sum_{x'} q_-(x') P(x', t)\right] \left[\pi_+ P_+(x) + \pi_- P_-(x)\right] \tag{10}$$

with:

$$\tilde f(x) := \frac{\tilde F(x \bar V)}{\bar V} - \frac{x}{\bar V} \frac{d\bar V}{dt}, \tag{11}$$

$$\tilde d(x) = \frac{\tilde D(x \bar V)}{\bar V^2}; \qquad q_\pm(x) := Q_\pm(x \bar V), \tag{12}$$

and all probability densities such that, in a shorthand notation, $P(x)\,dx = P(V)\,dV$. Note that Eq. (10) is such that $\sum_x P(x, t+1) = \sum_x P(x, t)$, as it should be.

Footnote: Formally, this corresponds to two events. However, we find it more consistent to restrict the state space of the model to situations where the spread is equal to one tick, and remove from the description the highly transient situations where the spread is equal to two ticks.

Figure 4: Drift $f(x)$ and diffusion $d(x)$, conditioned on no price change, as a function of the rescaled volume $x$. Note that, as indicated by the dotted horizontal and vertical lines, $f(x = 1) \approx 0$. For large $x$, $d(x)$ increases by a factor 10 or more compared to the value $d(x = 1)$.

Eq. (10) is the central equation of this work, and defines in a precise manner our model for single queue dynamics. All the information needed to determine the input of this equation (namely the functions $f(x)$, $d(x)$, $q_\pm(x)$ and $P_\pm(x)$) can be precisely calibrated on data. Indeed, thanks to the simplifying scale-invariance assumption, all these quantities can be determined by aggregating data at different times of the day for a single stock, and by further averaging over different stocks. Again, there might be slight inter-stock variations, or some dependence on the specific period of time, but our detailed analysis of the data has convinced us that, as a first approximation, the scale-invariance property holds with reasonable accuracy. More work is needed to ascertain whether these variations are statistically significant, but this is well beyond the scope of the present study.

Based on the above scaling assumption, we have determined the functions $f(x)$, $d(x)$, $q_\pm(x)$ and $P_\pm(x)$ on our data set, averaging over all 5 stocks mentioned above, all days of 2010 and all 78 bins of each day. The results are shown in Fig. 4 and Fig. 5. Fig. 4 reveals intuitive, but interesting results: we find that, as expected, the drift $f(x)$ is negative for large $x$, meaning that long queues (as compared to the average value) are shrinking, whereas short queues are expanding. In fact, we find that the drift vanishes when $x \approx 1$, i.e. for average-sized queues. The coefficient $d(x)$ essentially measures the intensity of activity in the queue; it is found to decrease slightly between $x = 0$ and $x = 1$, before gradually increasing and becoming 10 times larger for $x \approx 4$. The quantities $q_\pm(x)$ and $P_\pm(x)$ are plotted in Figs. 5-a and 5-b, respectively. One observes that $q_-(x)$ reaches a minimum for typical queue sizes ($x \sim$ […] $q_+(x)$ is seen to increase monotonically as a function of the size of the queue. This is quite expected: if a gap opens in front of the current best, the incentive to place an order there rather than to join the queue increases as its volume $x$ increases. The plots of $P_\pm(x)$ shown in Fig. 5-b are also not surprising: the distribution of the volume at the second best level ($P_-(x)$) is similar to the unconditional distribution of the best ($P(x)$), whereas the distribution of volume at incipient levels ($P_+(x)$) is strongly peaked at $x = 0$.

Figure 5: Left: Probability that the current queue disappears by being overtaken ($q_+(x)$) or by being completely eaten ($q_-(x)$), both as a function of the rescaled volume $x$. Right: Probability that the newly appeared volume is $x$, at a better price ($P_+(x)$) or at a worse price ($P_-(x)$).

Figure 6: Dependence of $\langle|\Delta V|\rangle$ on the time of day, in 5-minute bin units, for the ask and the bid, averaged over all stocks (see also Table I).

Our final model, Eq. (10), is based on two major assumptions. One is that the dynamics is Markovian, i.e. it depends on the past flow of orders only through the current size of the queue (relative to its average for a given stock and a given hour of the day). In other words, one assumes that there is no temporal correlation in the type and volume of events. The second assumption is that the change of volume is small for all events that do not change the price of the bid/ask.

We have checked that the event correlations are small, so the first assumption appears to be warranted. The second assumption is however not completely justified: as shown in Fig. 6 the typical volume change $\Delta V$ is ∼ […] $V$ ∼ […], […] $V$ has heavy (power-law) tails, which means that higher order derivatives in the Kramers-Moyal expansion could play a role and invalidate the Fokker-Planck truncation.

We propose to check directly the validity of the Fokker-Planck approximation by using Eq. (10) with the empirically determined inputs ($f(x)$, $d(x)$, $q_\pm(x)$ and $P_\pm(x)$) to reconstruct the stationary distribution $P_{st}(x)$. The steady-state equation reads:

$$-\frac{\partial [\tilde f(x) P_{st}(x)]}{\partial x} + \frac{\partial^2 [\tilde d(x) P_{st}(x)]}{\partial x^2} = q_+(x) P_{st}(x) - \left[\sum_{x'} q_+(x') P_{st}(x')\right] P_+(x) + q_-(x) P_{st}(x) - \left[\sum_{x'} q_-(x') P_{st}(x')\right] \left[\pi_+ P_+(x) + \pi_- P_-(x)\right]. \tag{13}$$

There is no general analytic solution for $P_{st}(x)$. However, when $q_\pm(x) = 0$, the zero-current solution is simply given by the Gibbs-Boltzmann measure:

$$P_{GB}(x) \propto \frac{1}{d(x)} \exp[-u(x)]; \qquad u(x) = -\int^x dx'\, \frac{f(x')}{d(x')}, \tag{14}$$

where $u(x)$ is the "potential" and $d(x)$ can be interpreted as a local, $x$-dependent temperature. Since $1 - \bar\Pi \sim$ […] $q_\pm(x) = 0$ should be a reasonable approximation. The result is shown in Fig. 7. This approximation captures well the overall humped shape of $P_{st}(x)$. This is a direct consequence of the fact that the drift $f(x)$ vanishes for $x = 1$, corresponding to a minimum of $u(x)$.

Figure 7: Empirical $P_{st}(x)$, averaged over the 5 stocks, and reconstructed $P_{GB}(x) \propto d^{-1}(x)\, e^{-u(x)}$. The $y$-axis is in log scale. We have represented for clarity both the pdfs $P(x)$ and the cumulative distribution functions $\int^x dx'\, P(x')$.

Although not perfect, the agreement between the empirical distribution $P_{st}(x)$ and the reconstructed Gibbs-Boltzmann measure $P_{GB}(x)$ is far from trivial, since the quantities $f(x)$ and $d(x)$ needed to reconstruct $P_{GB}(x)$ are measured from the dynamics of the queue. We believe that this approximate agreement, with no extra fitting factor, is a convincing empirical validation of our Fokker-Planck formalism.

Finally, we note that the Fokker-Planck equation in the continuous time limit is equivalent, between two price jumps, to a Brownian motion model for the rescaled queue size $x$, given by:

$$dx = f(x)\,dt + \sqrt{2 d(x)}\,dW, \tag{15}$$

where $dW$ is the standard Wiener noise. This is interesting for numerical simulation purposes. Suppose for example one starts in a situation where the bid queue is at $x = x_0$ for $t = 0$ and asks for the probability that the queue empties while always remaining at the same price. One can integrate the diffusion equation above with initial condition $x = x_0$, adding the possibility of price-changing processes at each time step. With probability $q_+(x)\,dt$ the best price is improved, in which case the price goes up, and the process stops. With probability $q_-(x)\,dt$, on the other hand, a large volume eats the whole queue, the price goes down and the process also stops.
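Both the Gibbs-Boltzmann reconstruction of Eq. (14) and the simulation scheme built on Eq. (15) can be sketched numerically. The Python code below is an illustrative sketch only, not the authors' implementation: the function names, the trapezoidal integration, the Euler-Maruyama discretization with a fixed step `dt`, and the noise amplitude $\sqrt{2 d(x)\,dt}$ (which follows from the factor one-half in the definition of the diffusion coefficient) are assumptions made here. The inputs `f`, `d`, `q_plus` and `q_minus` stand for the empirically calibrated functions, which are not reproduced.

```python
import numpy as np

def gibbs_boltzmann_density(x, f, d):
    """P_GB(x) proportional to exp(-u(x)) / d(x), with u(x) = -integral of f/d (Eq. 14).

    x, f, d are arrays sampled on the same increasing grid.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    ratio = np.asarray(f, dtype=float) / d
    # Cumulative trapezoidal integral of f/d, starting from x[0]
    steps = 0.5 * (ratio[1:] + ratio[:-1]) * np.diff(x)
    u = -np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(-(u - u.min())) / d  # shift u for numerical stability
    norm = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(x))
    return p / norm

def simulate_queue_fate(x0, f, d, q_plus, q_minus, dt=0.01, t_max=1e4, rng=None):
    """One Euler-Maruyama path of Eq. (15) with price-changing exits.

    f, d, q_plus, q_minus are callables of x (hypothetical stand-ins for the
    calibrated functions). Returns 'improved', 'eaten', 'emptied' or 'timeout'.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = float(x0), 0.0
    while t < t_max:
        u = rng.random()
        if u < q_plus(x) * dt:                  # queue overtaken at a better price
            return "improved"
        if u < (q_plus(x) + q_minus(x)) * dt:   # queue eaten in one jump
            return "eaten"
        x += f(x) * dt + np.sqrt(2.0 * d(x) * dt) * rng.standard_normal()
        if x <= 0.0:                            # queue empties by diffusion
            return "emptied"
        t += dt
    return "timeout"
```

Repeating `simulate_queue_fate` over many independent paths and counting the fraction that return `"emptied"` gives a Monte Carlo estimate of the probability that the queue empties while remaining at the same price.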
Finally, the walk may survive and reach $x = 0$ for the first time at $t$; this event contributes to the probability one wants to compute.

Generalizing the above arguments to the joint dynamics of the bid volume $V_B$ and ask volume $V_A$, we introduce relative volumes $x = V_B/\bar V$ and $y = V_A/\bar V$. (Note again that the stock- and time-dependent average volume $\bar V$ is the same for the bid and the ask.) The scale-invariant, two-dimensional Fokker-Planck equation now reads:

$$P(x, y, t+1) - P(x, y, t) \approx -\frac{\partial [\tilde f_x(x, y) P(x, y, t)]}{\partial x} - \frac{\partial [\tilde f_y(x, y) P(x, y, t)]}{\partial y} + \frac{\partial^2 [\tilde d_x(x, y) P(x, y, t)]}{\partial x^2} + \frac{\partial^2 [\tilde d_y(x, y) P(x, y, t)]}{\partial y^2}$$
$$- \left[q_+(x|y) + q_+(y|x)\right] P(x, y, t) + \left[\sum_{x', y'} q_+(x'|y') P(x', y', t)\right] P_+(x, y) + \left[\sum_{y', x'} q_+(y'|x') P(x', y', t)\right] P_+(y, x)$$
$$- \left[q_-(x|y) + q_-(y|x)\right] P(x, y, t) + \left[\sum_{x', y'} q_-(x'|y') P(x', y', t)\right] \left[\pi_+ P_+(x, y) + \pi_- P_-(x, y)\right] + \left[\sum_{y', x'} q_-(y'|x') P(x', y', t)\right] \left[\pi_+ P_+(y, x) + \pi_- P_-(y, x)\right],$$

where $\tilde f_{x,y}(x, y)$ is the average drift of the rescaled bid/ask volume, conditioned on a certain $(x, y)$, and $\tilde d_{x,y}(x, y)$ is the diffusion constant in the $x$/$y$ direction, again conditioned on a certain $(x, y)$. In order to be precise, we specify these definitions as follows:

$$\tilde f_x(x, y) := \frac{\tilde F_x(x \bar V, y \bar V)}{\bar V} - \frac{x}{\bar V} \frac{d\bar V}{dt}, \qquad \tilde F_x(x \bar V, y \bar V) = \Pi \sum_{\Delta V_B} \Delta V_B\, \rho(\Delta V_B | V_B, V_A); \tag{16}$$

and

$$\tilde d_x(x, y) = \frac{\tilde D_x(x \bar V, y \bar V)}{\bar V^2}; \qquad \tilde D_x(x \bar V, y \bar V) = \Pi \sum_{\Delta V_B} (\Delta V_B)^2\, \rho(\Delta V_B | V_B, V_A). \tag{17}$$

Note the extra factor 1/[…] […] $\rho(\Delta V_B | V_B, V_A)$ is normalized to the probability of no price-changing events for a given $V_B, V_A$.
By symmetry, we expect (and have indeed confirmed empirically) that $f_x(x,y) = f_y(y,x)$ and $d_x(x,y) = d_y(y,x)$.

The quantities $q_\pm(x|y)$ are the probabilities that, for the next event, a queue of rescaled volume $x$, facing a queue of rescaled volume $y$, gets superseded by a new queue ($q_+$) or disappears entirely ($q_-$). Correspondingly, the quantity $P_-(x|y)$ gives the probability that the second best queue that becomes the best queue has volume $x$, knowing that the opposite queue has volume $y$, whereas $P_+(x|y)$ gives the probability that the newly created best has volume $x$, knowing that the opposite queue has volume $y$. (In order to simplify the presentation, Eq. (16) in fact assumes, in line with our empirical results, a total bid/ask symmetry for the statistics of these price changing events, e.g. $q^B_+(x|y) = q^A_+(x|y)$).

Finally, note that the mixed diffusion term $\rho\,\partial^2/\partial x\partial y$ originally introduced by Cont & Larrard [10] is not present in the above equation. This is because we have not found any significant correlations between the fluctuations of volume changes at the bid and at the ask that would justify the presence of such a term. However, this does not mean that we neglect the coupling between the two queues, which is entirely encoded in the two dimensional drift field $\vec f = (f_x, f_y)$, which is a function of the volumes on both sides (see Fig. 8 below).

We now present an empirical determination of these two-dimensional quantities on the same data set as above.

Based on our scaling assumption, we again determine the two dimensional functions $f_x$, $f_y$, $d_x$, $d_y$, $q_\pm$ and $P_\pm$ on our data set, averaging over all 5 stocks mentioned above, all days of 2010 and all 78 bins of each day. We find (see Fig 8-a) that the diffusion coefficients are, in fact, to a very good approximation independent of the size of the opposite queue, i.e.:

$$d_x(x,y) = d(x), \ \forall y, \qquad d_y(x,y) = d(y), \ \forall x, \qquad (18)$$

where $d(\cdot)$ is the one-dimensional diffusion coefficient. This independence of $d_{x,y}$ on the size of the opposite queue is compatible with the absence of correlation of the fluctuations of activity on the bid and on the ask.

Figure 8: Left: Diffusion coefficient of the bid queue, $d_x(x,y)$, as a function of the (rescaled) bid volume $x$ and the ask volume $y$. This level representation makes it clear that the diffusion coefficient is independent of the volume of the opposite queue. Right: Arrow representation of the drift field $\vec f(x,y)$ in two dimensions. Note the bid-ask symmetry that implies $f_x(x,y) = f_y(y,x)$. While $\vec f \approx$ [...] $x \approx y \approx 1$, the drift is to a good approximation parallel to the diagonal $\vec e = (1,$ [...]

[...] $(x, y)$ of the rescaled queue volumes, one can determine the two dimensional drift vector $\vec f = (f_x, f_y)$, which is represented as arrows in Fig. 8-b. This is obtained as a grand average across time and across stocks, but we have found that the pattern reported in Fig. 8-b is actually the same for different stocks, or when we divide the 2010 time period in monthly sub-intervals, or else when we focus on morning hours or afternoon hours [11]. What makes this universality possible at all is of course that we work with rescaled volumes. (Very similar patterns appear if one works with $L$, the number of different orders in the queue, rather than $V$, the total volume.) One sees a pattern recalling, at first glance, the one-dimensional situation: large queues tend to shrink while small queues tend to grow, with a central region around $x \approx$ [...], $y \approx$ [...]

$$\vec f = -\vec\nabla u + \vec\nabla \times \vec w, \qquad (19)$$

where $u(x,y)$ is the potential (similar to the one-dimensional object above) and $\vec w = (0, 0, w)$ is a vector orthogonal to the $x, y$ plane. This second, rotational part contributes to closed current loops in equilibrium, whereas the potential part does not.

These potentials are represented in Fig 9. Again, the patterns are very robust and appear to be significant even in regions where the probability to find a queue is small (i.e. $x >$ [...] $y >$ [...]). [...] $u(x,y)$ only depends, in a first approximation, on $r = x + y$; it has a broad, shallow minimum around $r \approx$ [...] $x \approx y \approx$ [...] $r \approx$ [...] $r \approx 7$. Note however that this secondary minimum does not lead to a peak in the stationary distribution $P_{st}(x,y)$ because the diffusion coefficients $d_x$ and $d_y$ are very large in this region (see Fig. 4). The rotational component of the drift, represented in Fig. 9-b, is quite complex, but its structure, in the most relevant region $x \sim y \sim$ [...] $x = y$, this component of the drift is towards zero, i.e. queues of similar size tend to shrink together. When one queue is much smaller than the other, the drift is directed towards the diagonal (queues tend to equilibrate), before bending towards zero again closer to the diagonal. It would be very interesting to build a theory that could explain the intricate pattern displayed by the drift field $\vec f$, especially because, as emphasized above, this pattern appears to be stable across stocks and across time. The Mean-Field Game approach of [8] appears to be a way to approach the problem.

Figure 9: Level plots of the potentials $u(x,y)$ (left) and $w(x,y)$ (right), obtained by averaging over all stocks. Note the high ridge appearing for $u(x,y)$ for $x + y \approx 5$, and the complex flow pattern induced by $w$. These patterns are found to be very similar for all stocks and time periods.

Note that the presence of the rotational component prevents us from writing down the Boltzmann-Gibbs measure that generalizes Eq. (14) to the two-dimensional case, even in the region $x \sim y \sim$ [...] $d_x = d_y =$ constant. However, from the general pattern of the flow field shown in Fig. 8, it is intuitively clear that the resulting stationary distribution $P_{st}(x,y)$ should have the humped shape shown in Fig. 2-b.

Finally, the quantities $q_\pm(x|y)$ and $P_\pm(x,y)$ can be studied (not shown here). The noticeable patterns are:

• when $x \sim y \sim 1$, the probabilities of price changing events $q_\pm(x|y)$ reach a minimum. $q_+(x|y)$ remains small as $x \to$ [...] $y \sim 1$, which means that if the bid becomes much smaller than the ask, the probability that the bid goes up is small, which makes sense since the sell pressure on the ask is larger than the buy pressure on the bid. Conversely, as expected, $q_-(x|y)$ remains small when $x \sim$ [...] $y \to$ [...]

• $P_-(x,y)$ has a double peak structure: conditionally to an event where the best price disappears and the second best price takes over, the most probable size of the queue is $x \approx$ [...] $y \sim$ [...] $y \ll$ [...]

• $P_+(x,y)$ has a sharp peak for $x \sim$ [...] $y \ll 1$, and a broader peak for $y \sim$ [...] $x \ll$ [...]

Motivated by empirical data, we have proposed a statistical description of the queue dynamics for large tick assets based on a two-dimensional Fokker-Planck (diffusion) equation that explicitly includes state dependence, i.e. the fact that the drift and diffusion depend on the volume present on both sides of the spread. “Jump” events, corresponding to sudden changes of the best limit price, must also be included as birth-death terms in the Fokker-Planck equation. All quantities involved in the equation can be calibrated using high-frequency data on best quotes. One of our central findings, repeatedly emphasized throughout the paper, is that the dynamical process is approximately scale invariant, i.e., the only relevant variable is the ratio of the current volume in the queue to its average value. While the latter shows intraday seasonalities and strong variability across stocks and time periods, the dynamics of the rescaled volumes is universal. In terms of rescaled volumes, we found that the drift has a complex two-dimensional structure, which is a sum of a gradient contribution and a rotational contribution, both stable across stocks and time. This drift term is entirely responsible for the dynamical correlations between the ask queue and the bid queue. The structure of the diffusion term, on the other hand, is found to be quite trivial, with no dependence on the opposite volume.

Although our scale invariance assumption is, we believe, a suitable first approximation to describe queue dynamics, a detailed study of the violations of this assumption would be interesting and could reveal some systematic dependence on stock characteristics (price, liquidity, market cap, etc.) or hour of the day, for example. Clearly, scale invariance should only hold for sufficiently large volumes in the queues; we therefore expect that violations will be more pronounced for smaller average volumes and will be very strong for small tick stocks.
Another issue that would certainly deserve further work is whether the universality uncovered here for NASDAQ stocks extends to other types of large tick securities with time priority (for example, non US stocks, large tick futures contracts, etc.).

Another open question is the validity of the Fokker-Planck (diffusion) framework, which amounts to truncating the Kramers-Moyal equation at second order. Such a truncation is not immediately justified since the distribution of elementary volume changes $\Delta V$ for each event (execution of a market order, addition or cancellation of a limit order) is found to have heavy tails. Still, we have shown that when solved to give the stationary distribution of rescaled volumes in a queue, the Fokker-Planck equation calibrated on dynamical data fares quite well at reproducing the empirical (static) distribution.

It would also be very interesting to develop a theory, based on equilibrium, optimizing agents, or on agents using heuristic/behavioral rules, able to reproduce the fine details of the flow field $\vec f$ shown in Fig. 8. As mentioned above, the statistical determination of this flow field is quite accurate, and the pattern appears to be robust across stocks and time periods, once expressed in reduced volumes. We believe that this comparison will prove to be a stringent test for theoretical assumptions on the behaviour of agents in financial markets.

Finally, we want to emphasize that the theory developed above is not complete. For example, it does not allow us to answer a crucial question as far as optimal execution is concerned, i.e.: if I place a sell order on a queue of volume $V_A$, knowing that the opposite volume is $V_B$, what is the probability that my order will be executed, and how long should I wait? The answer to these questions requires an additional piece of information, absent from the above framework, which is the position in the queue of the cancelled orders.
While added orders are always at the back of the queue, cancelled orders can be anywhere in the queue. Clearly, the position of these cancelled orders matters, and determines the speed at which my own order makes it to the top. Our preliminary statistical analysis suggests that the probability $q(H|L)$ that the $H$-th order is cancelled, in a queue that contains a total of $L$ orders, again takes a scaling form: $q(H|L) \propto Q(u)$, where $u = (L - H)/L^{[...]}$ and $Q(u)$ is a decreasing function of $u$. This means that, as expected, the orders most likely to be cancelled are those at the back of the queue, this statement becoming sharp as the height of the queue $L$ goes to infinity. The unexpected finding, for which we currently have no interpretation, is that the width of the region where these orders are cancelled grows with the height of the queue as a fractional power, $L^{[...]}$. We leave this as an intriguing open question.

Acknowledgements

We want to thank Xavier Brokmann, Charles Lehalle, Marc Potters and Spyros Skouras for very helpful discussions and suggestions, Olivier Guedj who participated in the first stages of this study and Aurelien Vallée for help setting up the order book animation.

References

[1] S. Skouras, J. D. Farmer, The value of queue priority, mimeo (2013).

[2] Z. Eisler, J.-P. Bouchaud, J. Kockelkoren, The price impact of order book events: market orders, limit orders and cancellations, Quantitative Finance, DOI:10.1080/14697688.2010.528444 (2011); Z. Eisler, J.-P. Bouchaud, J. Kockelkoren, Models for the impact of all order book events, Market Microstructure - Confronting Many Viewpoints, Edts.: F. Abergel et al., Wiley (2012).

[3] J.-P. Bouchaud, J. Kockelkoren, M. Potters, Random walks, liquidity molasses and critical response in financial markets, Quantitative Finance, , 115 (2006).

[4] P. Weber, B. Rosenow, Order Book Approach to Price Impact, Quantitative Finance 5, 357 (2005).

[5] B. Tóth, Z. Eisler, F. Lillo, J. Kockelkoren, J.P. Bouchaud, J. D. Farmer, How does the market react to your order flow?, Quantitative Finance 12, 1015 (2012).

[6] E. Bacry, J.F Muzy, Hawkes model for price and trades high-frequency dynamics, arXiv:1301.1135

[7] I. Rosu, A dynamic model of the limit order book, Review of Financial Studies, 22, 4601 (2009).

[8] P. L. Lions, J. M. Lasry, C. A. Lehalle, A. Lachapelle, Structural modelling of orderbook dynamics: a Mean Field Game approach, in preparation (2013).

[9] M. Avellaneda, S. Stoikov, J. Reed, Forecasting prices from Level-I quotes in the presence of hidden liquidity, Algorithmic Finance 1, 35-43 (2011).

[10] R. Cont, A. de Larrard, Order book dynamics in liquid markets: limit theorems and diffusion approximations, arXiv:1202.6412

[11] A. Garèche, A model for the queue dynamics of large tick stocks, Master Report, “Probabilité et Finance”, Paris 6 University (2013).

[12] V. Alfi, G. Parisi, L. Pietronero, Conference Registration: How people react to a deadline, Nature Physics, 743, pp. 3 (2007); V. Alfi, A. Gabrielli, L. Pietronero, How people react to a deadline: time distribution of conference registrations and fee payments, Central European Journal of Physics, 7, 483-489 (2009).

[13] C. W. Gardiner,