A Stochastic LQR Model for Child Order Placement in Algorithmic Trading

Abstract

Modern Algorithmic Trading ("Algo") allows institutional investors and traders to liquidate or establish big security positions in a fully automated or low-touch manner. Most existing academic or industrial Algos focus on how to "slice" a big parent order into smaller child orders over a given time horizon. Few models rigorously tackle the actual placement of these child orders. Instead, placement is mostly done with a combination of empirical signals and heuristic decision processes. A self-contained, realistic, and fully functional Child Order Placement (COP) model may never exist due to all the inherent complexities, e.g., fragmentation due to multiple venues, dynamics of limit order books, lit vs. dark liquidity, different trading sessions and rules. In this paper, we propose a reductionism COP model that focuses exclusively on the interplay between placing passive limit orders and sniping using aggressive takeout orders. The dynamic programming model assumes the form of a stochastic linear-quadratic regulator (LQR) and allows closed-form solutions under the backward Bellman equations. Explored in detail are model assumptions and general settings, the choice of state and control variables and the cost functions, and the derivation of the closed-form solutions.


Jackie Jianhong Shen
Financial Services, New York City, USA
March 25, 2020
arXiv [q-fin.TR]


Keywords: child order placement, dynamic programming, LQR, delay cost, spread cost, impact cost, Poisson hits, passive, aggressive, Bellman equation, optimal policy, positive matrix.

Attention:

The current work is designed to be exclusively published in the Social Science Research Network (SSRN) or arXiv preprint server. Its commercial or open-journal publication is prohibited without the prior consent of the author. The current freestyle accommodates informative cover pages and colored text boxes that help enhance the reading experience.

Term: Meaning

Algo: an automated trading process based on algorithms
Profile: an intraday time series derived from history, e.g., for volumes
IS: implementation shortfall - a popular optimization-based Algo
TWAP: time-weighted average price - a standard profile-based Algo
VWAP: volume-weighted average price - a standard profile-based Algo
COP: child order placement
DP: dynamic programming
LQR: linear quadratic regulator with linear transitions & quadratic costs
(N)BBO: (national) best bid and offer of a venue or market system
LOB: limit order book, with buy/sell orders displayed
SOR: smart order router/routing
FX: foreign exchanges
ADV: average daily volume
Passive Touch: best bid for buy and best offer for sell
Aggressive Touch: best offer for buy and best bid for sell
Near Touch: same as Passive Touch
Far Touch: same as Aggressive Touch
Takeout or Sniping: aggressive market order at the far touch
TMV: true market value
Positive Matrix: a symmetric real matrix with positive eigenvalues

The following symbols have been consistently used in the paper.

Symbol: Meaning

$t_n$: a discrete action time in dynamic programming
$X_n^{\pm}$: the right/left limit of a quantity $X$ at $t_n$, via $t_n \pm \varepsilon$
$HS$: half spread between the BBO
$Mid$: mid price between the BBO
$N(\lambda)$: Poisson distribution with rate $\lambda$
$q_n$: outstanding positions right before $t_n$; a state variable
$\lambda_n$: Poisson hitting rate right before $t_n$; a state variable
$u_n$: aggressive market orders at $t_n$; the control variable
$\eta$: market impact of aggressive orders on passive fills
$\gamma_n$: delay cost penalty at $t_n$; also expressing risk aversion
$x_n$: state variable right before $t_n$; $x_n = (q_n, \lambda_n)$
$V_n(x_n)$: the value function at $t_n$ in dynamic programming

Introduction to Child Order Placement (COP)

Small retail orders can be filled by simple vanilla market or limit orders. There is no need for automated algorithmic trading (or “Algos”) involving sophisticated strategies. Algos are primarily designed to execute sizable institutional orders from portfolio managers or traders in various funds or broker-dealers. In the current work, the term “Algos” is restricted to the automated execution service provided to either external or internal clients by the buy side, sell side, or specialized execution agents.

A typical single-name Algo is stratified into at least two distinguishable layers, which we shall refer to as “macro” and “micro” layers or (sub-)Algos.

(A) At the macro layer, a big parent order is sliced into smaller child orders over a series of time buckets, e.g., 5-minute intervals, which usually depend on the liquidity profiles of a security.
(B) The micro layer handles the execution of the resulting child orders. The actual implementation and architecture could vary significantly among broker-dealers and execution agents, and constitute the proprietary core of all Algos.

Most well-known Algos in the industry are named after the macro layers, e.g., TWAP, VWAP, or Implementation Shortfall (IS). These macro Algos are either configured based on historical benchmarks or optimized under proper utility objectives. The optimization techniques involve either static or dynamic frameworks, as in these sample works [1, 3, 4, 6, 7, 9, 10, 11]. Overall, the macro Algos are built upon the macro behaviors of the targeted securities, including for instance the historical profiles of volumes, volatilities, spreads, etc., as in our earlier works in [9, 10, 11]. They do not act upon real-time microstructure signals such as the dynamics of limit order books (LOB).

The complexities of these Algos mainly reside within the micro layers. The actual real-time placement of orders on various venues is implemented at this layer, and different Algo providers may take very different approaches. For example, two offerings of the same VWAP Algo could differ significantly in terms of architecture and logic.

In general, a micro Algo must handle actions like the following:

(a) dynamic decision flows for the placement actions and the monitoring of their status,
(b) allocation among different order types offered by all accessible venues, and
(c) real-time routing to all accessible liquidity venues, including both lit and dark venues.

In particular, the last component often assumes its own identity in most broker-dealer Algo offerings, and is called the SOR - Smart Order Router. SORs are vital for some liquid asset classes with highly fragmented markets, e.g., the common stocks in the USA.

In terms of modeling techniques, macro Algos are either configured or scheduled using benchmark profiles such as TWAP or VWAP, or optimized using proper utility functions (e.g., the mean-variance framework of Almgren-Chriss [2]; also see Shen et al. [9, 10, 11]). Modeling of micro Algos is much more challenging due to the aforementioned multiple tasks. The main complexities inherent to market microstructures include venues, sessions, order types, and their optimal real-time management. SOR is part of this grand effort and probably the component most well known or most actively marketed by Algo providers. But the SOR alone is only the last segment of the placement stream, and is actually irrelevant for single-venue securities such as commodities futures or some FX products.

In this paper, we attempt to develop a COP model for a single-venue security. Hence SOR is out of scope. Instead, the primary focus is on how to dynamically place and manage aggressive market orders and passive limit orders. The two order types compete in the following manner.

(i) Aggressive market orders get filled fast and hence help accomplish order completion, but at the cost of paying a half spread (with respect to the market mid price) and of information leakage that could harm trailing orders.
(ii) Passive limit orders gain a half spread if filled, but are subject to fill uncertainty and an extra “chasing” cost if the market drifts away. In general, they jeopardize order completion.

A COP Algo in this context must optimally manage the interplay between the two order types, in order to achieve the two primary objectives:

(a) lower total cost for filled orders, and
(b) a higher completion rate.

The current work rolls out as follows. After making some reductionism assumptions, we first build the COP model based on stochastic dynamic programming (SDP). The solution to the resultant stochastic linear-quadratic regulator (LQR) problem is then worked out via the associated Bellman equations. Theoretical assumptions and practical implications are discussed in detail along the way. The main result is summarized in Theorem 1 in Section 4.

We believe this is the first self-contained and mathematically rigorous COP model that has a closed-form solution.

Throughout the rest, in order to gain a more tangible sense of all the model settings, we assume a concrete working parent order with the following attributes.

(i) It is a “buy” order for a common stock, say.
(ii) The total quantity is Q = 100,000 shares.
(iii) The average daily volume (ADV) is 2,…,000 shares, based on a monthly rolling window.
(iv) The client prefers the order to be completed during the horizon from 12:00 pm to 3:00 pm EST in the US equity market (which is however consolidated into a single-venue market to avoid SOR).

Of course none of these example order details actually puts restrictions on the proposed COP model. For example, the model is applicable to other liquid asset classes such as futures and rates.

At the macro level, Algos such as TWAP, VWAP and IS “slice” the parent order into smaller child orders over a series of time buckets. For illustration, assume that such an Algo works with 5-minute time buckets. Then the target 3-hour execution horizon requested by the client is split into 36 time buckets. Further assume that this macro Algo decides to allocate 2,500 shares or 25 lots to the specific time bucket [1:00pm, 1:05pm]. In practice these time knots can all be randomized for anti-gaming.

If the time buckets of the macro Algo are relatively big, e.g., 5 minutes, a micro-layer scheduler can be further designed to schedule the allocated 2,500 shares, say, over finer micro bins. At this layer, sophisticated optimization may be spared in order to save time. In general, an equal partition, i.e., allocation based on TWAP, can be applied. Take the above working example for instance. The 25 lots allocated for the macro time bucket [1:00pm, 1:05pm] can be further split into 5 lots for each 1-minute micro bin, i.e., [1:00pm, 1:01pm], [1:01pm, 1:02pm], etc. Again, randomization of time knots should also be applied.

The introduction of finer micro time bins is necessary for most scenarios. This is because the macro time buckets must last long enough so that bucket signals are statistically meaningful. Take the VWAP or IS Algo for example. These macro Algos all depend on the temporal profiles of security volumes, volatilities, or spreads (e.g., Shen et al. [9, 10, 11]). When the macro buckets are too brief, the profile values based on historical averaging or filtering would be too noisy and result in unreliable Algo scheduling at the macro layer. In general, bucket size should be optimized or adapted to the liquidity profiles of a given security.

The current work takes a reductionism approach and attempts to develop a self-contained rigorous COP model at the micro bin level. That is, the model handles the actual execution of individual child (or grandchild) orders over micro bins, e.g., 5 lots over [1:00pm, 1:01pm] for the above running example. It is a fully automated model based on the framework of stochastic linear-quadratic regulators (LQR) and allows closed-form solutions.
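The bucketing arithmetic of the running example can be reproduced with a minimal Python sketch. The helper names `time_buckets` and `twap_split` are hypothetical, the calendar date is arbitrary, and the lot size of 100 shares is an assumption inferred from "2,500 shares or 25 lots"; randomization of the time knots is left out.

```python
from datetime import datetime, timedelta

LOT_SIZE = 100  # assumption: 100 shares per lot, so that 2,500 shares = 25 lots


def time_buckets(start, end, width_minutes):
    """Split [start, end) into consecutive buckets of equal width."""
    step = timedelta(minutes=width_minutes)
    buckets = []
    t = start
    while t < end:
        buckets.append((t, min(t + step, end)))
        t += step
    return buckets


def twap_split(total_lots, n_bins):
    """Equal (TWAP-style) split; any remainder goes to the earliest bins."""
    base, extra = divmod(total_lots, n_bins)
    return [base + (1 if i < extra else 0) for i in range(n_bins)]


day = datetime(2020, 3, 25)  # arbitrary date; only the clock times matter

# Macro layer: 12:00pm to 3:00pm in 5-minute buckets.
macro = time_buckets(day.replace(hour=12), day.replace(hour=15), 5)

# Micro layer: the 2,500 shares of bucket [1:00pm, 1:05pm] over 1-minute bins.
bucket_lots = 2500 // LOT_SIZE
micro = time_buckets(day.replace(hour=13), day.replace(hour=13, minute=5), 1)
micro_lots = twap_split(bucket_lots, len(micro))

print(len(macro), bucket_lots, micro_lots)
```

The equal split mirrors the TWAP-based allocation mentioned above; a production scheduler would presumably jitter the bin boundaries and handle odd lots differently.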

In our previous two works on macro Algos:

• Shen [9] on a generic pre-trade macro Algo based on static quadratic programming in Hilbert spaces, and
• Shen [10] on a real-time adaptive macro Algo based on dynamic programming that integrates the VWAP and IS Algos,

the proposed Algo models can actually be implemented in execution houses after proper model calibration is performed using their proprietary trading data.

Hence it is important to point out at the outset that the current micro Algo is more a theoretical model in the spirit of reductionism. As explained earlier, it is almost impossible to have a single self-contained model that comprehensively handles the entire micro-layer execution. For instance, it is nontrivial to handle unexpected intraday trading halts or participation in various auctions. The main reductionism of the proposed model involves the following aspects.

(a) It is restricted to single-venue executions and does not involve SOR modeling.
(b) It deals with neither lit vs. dark venues nor complex order types (e.g., icebergs or mid-pegging).
(c) It only handles the two most basic order types - limit and market orders.

Notice that in professional trading one almost never sends a “naked” market order. Hence by market orders we mean more precisely marketable limit orders whose limits cross the far touch.

Despite the reductionism, the value of the current work can be summarized as follows.

• To the academic community, to the best of our knowledge this is the first rigorous COP model at the micro layer that is self-contained and allows closed-form solutions. It opens the door to more sophisticated or realistic COP models in the future.
• To the Algo practitioners on Wall Street, the model does reveal the intriguing dependency and competition among different key Algo components: aggressive sniping, passive waiting, execution cost, information leakage, market impact, and the requirement of completion. The modeling techniques here can always be tweaked to facilitate existing COP processes.

The Placement Problem to be Modelled

Recall from earlier in this section that, after macro bucketing and micro binning, one ends up with a COP problem like the following:

to buy 5 lots over a micro bin [1:00pm, 1:01pm],

or more generally, to buy $q$ lots over a micro bin $[T_0, T_1]$ with a brief duration $\Delta T = T_1 - T_0$ of 30 or 60 seconds or so.

The COP problem modelled herein is formulated as follows. A given micro bin $[T_0, T_1]$ is partitioned into $N$ periods by the action times:

$$t_0 = T_0 < t_1 < \cdots < t_{N-1} < t_N = T_1.$$

In practice, one could choose periodic knots, say, $\tau = 5$ or $10$ seconds, and

$$t_{n+1} = t_n + \tau, \qquad n = 0, 1, \ldots, N-1.$$

Actual implementation could also have them randomized for anti-gaming. Since the micro bin duration $\Delta T = T_1 - T_0 = t_N - t_0$ is brief, it is also assumed that the BBO, i.e., the best bid and offer, remain unchanged over the given micro bin. (This is introduced merely for convenience. The model can accommodate a moving BBO as long as one assumes that the benchmark market price is the moving mid price and that the limit order is cancelled and replaced whenever the BBO move, so that it is effectively pegged to the BBO.)

Following the running example introduced earlier, we shall always work with a buy order - to buy $q$ lots over a micro bin $[T_0, T_1]$. For a buy order, we shall introduce the following concepts:

• the passive or near touch - the best bid of the venue, and
• the aggressive or far touch - the best offer of the venue.

For a sell order the roles are reversed. Furthermore, let $q_t$ denote the remaining lots right before $t \in [t_0, t_N]$. To simplify notation, we introduce the following convention: for any observable, variable or parameter $X$,

$$X_n := X_n^- = \lim_{\varepsilon \to 0^+} X_{t_n - \varepsilon}, \qquad \text{and} \qquad X_n^+ = \lim_{\varepsilon \to 0^+} X_{t_n + \varepsilon}.$$

For a continuous $X$ there is no difference among the three. For the proposed COP model, a sniping action (or a control in the context of dynamic programming) will take place right at a given action time $t_n$, and hence the left and right limits could differ there.

Attention - We have defaulted $X_n$ to $X_n^-$ to simplify notation and equation lines.

The proposed COP model adopts the following action plan.

(i) At any time $t$, as long as $q_t > 0$, a single-lot limit order will be placed at the passive touch.
(ii) Whenever such a limit order is filled at $t$ and the remainder $q_t^+ = q_t - 1$ is still positive, a new single-lot limit order is placed at the passive touch.
(iii) At any action time $t_n$ from $t_0, \ldots, t_n, \ldots, t_{N-1}$, if $q_n$ (which represents $q_n^-$) still has some positive lots to trade, the COP Algo has an option to send a single-lot market order at the far touch. We shall nickname this action “single-lot sniping” or simply “sniping.” The term “Sniper” or “Sniping” has been popularly used in the Algo world, e.g., the Sniper Algo of Credit Suisse, covered in a 2007 Reuters article.

The three major characteristics of this target COP problem are order completion, passive waiting, and aggressive sniping, which are elaborated as follows.

The Constraint on Completion

Order completion is usually enforced at the macro layer. It is either explicitly formulated into optimization as a constraint (e.g., for the IS Algo) or enforced through hard scheduling (e.g., for TWAP/VWAP Algos). For micro-layer execution, e.g., executing 5 lots within a micro bin of 30 or 60 seconds, completion can be soft or delayed, in order to better follow the liquidity waves in the market. The unfilled lots can be handed over to the next micro bin, and so on. The macro Algo on top usually deploys a dedicated schedule “keeper” to enforce the schedules.
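Soft completion with hand-over of unfilled lots can be illustrated by a small Python sketch. The function `roll_forward` and the fill numbers are hypothetical and only illustrate the bookkeeping; they are not part of the proposed model, and fills beyond what is due are simply not credited to later bins.

```python
def roll_forward(targets, fills):
    """Carry unfilled lots from each micro bin into the next one.

    targets[i]: lots scheduled for bin i by the micro scheduler.
    fills[i]:   lots actually filled in bin i (hypothetical input).
    Returns the effective target per bin and the final leftover.
    """
    effective = []
    carry = 0
    for target, fill in zip(targets, fills):
        due = target + carry          # scheduled lots plus inherited shortfall
        effective.append(due)
        carry = max(due - fill, 0)    # shortfall handed to the next bin
    return effective, carry


# Five 1-minute bins with 5 lots each, and made-up fills per bin.
effective, leftover = roll_forward([5, 5, 5, 5, 5], [3, 6, 5, 4, 7])
print(effective, leftover)
```

Whatever is left after the last micro bin would, under this reading, be returned to the macro-layer schedule keeper.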

Passive Limit Order at the Near Touch

Recall that for convenience, the BBO have been assumed invariant over a brief micro bin $[T_0, T_1]$. Let

$$HS = \tfrac{1}{2}\,(BestOfferPrice - BestBidPrice)$$

denote the half spread, and

$$Mid = \tfrac{1}{2}\,(BestOfferPrice + BestBidPrice)$$

the unbiased mid price that represents the true market value (TMV) at the moment. Compared with other more sophisticated averaging schemes, this is usually called the simple mid. Once filled, a limit order saves a half spread $HS$ compared with the TMV. However, if left unfilled by the target end time, limit orders can jeopardize order completion. A reasonable COP model must be able to reflect this tradeoff.

Aggressive Sniping at the Far Touch

Market or marketable orders take out liquidity from the top of the opposite LOB, i.e., the far touch. At the micro layer, market orders are always kept small, e.g., a couple of lots on average for major exchanges in the US. Hence the proposed model always assumes that such small orders are filled instantaneously. Market orders fill fast and help achieve completion, but at the cost of a half spread $HS$. Furthermore, aggressive market orders could also leak information and result in opposite market participants or market makers biasing their perceived TMV towards the aggressive touch. As a result, they become less willing to take out “our” limit orders posted at the passive touch. A reasonable COP model must be able to demonstrate such tradeoffs as well.

We now introduce the basic assumptions for the model and process.

Assumptions on Limit Order Placement

For the placement of limit orders, the following basic assumptions are made.

(A) The size of the limit order is always a single lot.
(B) A single-lot limit order is placed initially at $t_0$.
(C) Afterwards, whenever the limit order is taken out by an opposite market order at time $t$, a new single-lot limit order is immediately replenished as long as the remaining position $q_t^+ = q_t - 1$ is still positive.
(D) Over any time interval of length $\Delta t$, as long as “we” do not snipe at the aggressive touch within the interval, the number $W$ of single-lot limit orders being hit is subject to a Poisson distribution $N(\lambda \Delta t)$ with some rate $\lambda$. That is,

$$\mathrm{Prob}(W = n) = e^{-\lambda \Delta t}\, \frac{(\lambda \Delta t)^n}{n!}, \qquad n = 0, 1, \ldots. \tag{1}$$

It is also assumed that once being hit the entire lot is taken. The Poisson distribution or process is not unfamiliar in the context of Algo trading and limit order books [8].

Assumptions on the Aggressive Market Orders

For sniping using aggressive market orders, e.g., sending a marketable order $u_n$ at time $t_n$ at the far touch, the following assumptions are made.

(a) The ideal goal is to set $u_n$ to either a single lot or zero (i.e., no sniping), so that information leakage and market impact can be curbed. In practice, to facilitate a tractable dynamic programming (DP) formulation with closed-form solutions, the single-lot constraint is not explicitly imposed. The cost function designed later will naturally encourage $u_n$ to stay small.
(b) It is also assumed that once an aggressive market order $u_n$ is sent, the entire order will be filled. Since the cost function in general keeps $u_n$ small, this assumption holds naturally for most liquid securities.
(c) Since the BBO are assumed to be invariant during the short duration of a micro bin, we adopt the following model for information leakage caused by aggressive sniping at time $t_n$. Once $u_n$ lots are taken out (by “us”) at the far touch, other market participants, including in particular market makers, will update their belief on the TMV and bias it towards the far touch. As a result, either fewer opposite participants are willing to snipe at the passive touch, or market makers will cancel their more passive inside limit orders and replace them with new ones at the passive touch. The latter will congest the queue at the passive touch. Hence heuristically both will reduce the chance of “our” limit orders being hit by the market.

Quantitatively, we assume that the Poisson hitting rate introduced in Eqn. (1) will be negatively impacted in the following linear form:

$$\lambda_n^+ = \lambda_n - \eta u_n, \tag{2}$$

after $u_n$ lots are sniped and filled at time $t_n$. Here $\eta > 0$ is the impact parameter. For example, with $\lambda_n = 5.\ldots$ and $\eta = 0.\ldots$, sniping $u_n = 2$ lots at some time $t_n$ will reduce the Poisson hitting rate according to $\lambda_n^+ = \lambda_n - \eta u_n = \lambda_n - 2\eta$.

The impact parameter $\eta$ can be calibrated or estimated using experimental orders that are designed specifically for this purpose.
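The Poisson hit model of Eqn. (1) and the linear impact of Eqn. (2) can be evaluated with a few lines of Python. This is only a sketch: the function names are hypothetical, and the values of the hitting rate, the impact parameter and the interval length are illustrative assumptions rather than calibrated or published figures.

```python
import math


def poisson_pmf(n, rate, dt):
    """Prob(W = n) for W ~ Poisson(rate * dt), as in Eqn. (1)."""
    mean = rate * dt
    return math.exp(-mean) * mean ** n / math.factorial(n)


def rate_after_snipe(rate, eta, u):
    """Linear impact of Eqn. (2): lambda_plus = lambda - eta * u."""
    return rate - eta * u


# Illustrative values only (assumptions, not taken from the paper).
lam, eta, dt = 5.0, 0.25, 1.0   # hits per unit time, impact per lot, interval

lam_plus = rate_after_snipe(lam, eta, u=2)       # rate after sniping 2 lots
p_no_fill_before = poisson_pmf(0, lam, dt)       # chance of zero passive fills
p_no_fill_after = poisson_pmf(0, lam_plus, dt)   # same, after the snipe

print(lam_plus, p_no_fill_before, p_no_fill_after)
```

Note that the Poisson model only makes sense while the impacted rate stays nonnegative, which the unconstrained linear form does not guarantee for large snipes.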
State and Control Variables, and State Transition

To develop the dynamic programming (DP) COP model, we first define the state variables and their transition.

There are two state variables, $q_n$ and $\lambda_n$, organized into a state vector $x_n = (q_n, \lambda_n)^T$, where the superscript $T$ denotes transposition of vectors or matrices.

• State variable $q_n$ denotes the outstanding lots that still need to be traded right before $t_n$. The symbol is equivalent to $q_n^-$ but with the minus superscript omitted, as set up earlier.
• State variable $\lambda_n$ denotes the Poisson hitting rate right before $t_n$. It represents the rate at which opposite aggressive orders hit “our” limit orders. It changes whenever “we” snipe a market order $u_n$ at the far touch, due to information leakage and its digestion by other market participants.

The control variable is the aggressive takeout order $u_n$ that “we” snipe at $t_n$ at the far touch. Furthermore, let $W_n$ denote the Poisson random number of “our” limit orders being hit by the opposite aggressive orders. Then from $t_n$ to $t_{n+1}$ (or more precisely $t_{n+1}^-$), the following state transition equations hold.

$$q_{n+1} = q_n - u_n - W_n, \tag{3}$$
$$\lambda_{n+1} = \lambda_n - \eta u_n. \tag{4}$$

In the vector form, define $a = (1, \eta)^T$ and $z_n = (W_n, 0)^T$. Then the state vector transits as follows:

$$x_{n+1} = x_n - a\, u_n - z_n. \tag{5}$$

All variables are assumed to be continuous, as normally done in the Algo literature. (In reality trades are mostly in whole shares or lots.)

Following all the previous preparation, the COP problem is formulated as a DP problem with controls taken at one of the following action times:

$$t_n \in \{ t_0, t_1, \ldots, t_{N-1} \}.$$

No action is taken at the terminal time knot $t_N$. At each $t_n$, we first define the initial candidate $j_n^{(1)}$ for the stage cost, which is still subject to revision later on:

$$j_n^{(1)}(u_n, W_n \mid x_n) = \gamma_n q_n^2 + u_n - W_n. \tag{6}$$

It is explained as follows.

(a) The first term $\gamma_n q_n^2$ favors fast execution so that the $q_n$'s quickly touch down to zero. It appears in earlier works for both static and dynamic macro Algos, e.g., Almgren-Chriss [2], Hora [6], and Shen [9, 10], just to name a few. The penalty coefficients $\gamma_n$ are the control parameters that penalize trading delays. In general, $\gamma_n$ can be set in proportion to the real-time variance $\sigma_n^2$ of the security, i.e., in the form of $\gamma_n = \tilde{\gamma}_n \sigma_n^2$. In addition, $\gamma_n$ should increase monotonically to facilitate order completion. For instance, $\gamma_N = +\infty$ would enforce hard completion: $q_N = 0$.
(b) $W_n$ is the number of single-lot limit orders being taken out by opposite market orders at “our” passive touch. Limit orders save a half spread $HS$. We incorporate the scaling of $HS$ into $\gamma_n$ so that $-HS \cdot W_n$ can be simplified to $-W_n$. Following the general Poisson setting in Eqn. (1), we assume more specifically that $W_n$ is subject to the Poisson distribution $N(\lambda_n^+ \Delta t_n)$ with $\lambda_n^+ = \lambda_n - \eta u_n$ and $\Delta t_n = t_{n+1} - t_n$.
(c) $u_n$ is the number of lots that “we” snipe at the aggressive touch at the action time $t_n$. It pays the cost of a half spread, i.e., $HS \cdot u_n$. Since $HS$ is scaled into $\gamma_n$, it is simply expressed as $u_n$ in the stage cost.

For a buy order, $u_n$ is preferably nonnegative. In order to facilitate a closed-form solution, however, we do not explicitly impose the constraint $u_n \ge 0$. When $u_n < 0$, we shall interpret it as an aggressive sell order of size $-u_n$ at the current passive touch. If this is the case, the updated position will increase:

$$q_n^+ = q_n - u_n > q_n.$$

The first term $\gamma_n q_n^2$ will discourage such opposite trades as long as the risk aversion weights $\gamma_n$ are not negligible.

On the other hand, compared with the market mid price, selling at the passive touch also incurs a cost of a half spread, i.e., $HS \cdot (-u_n)$. Hence when $u_n$ is allowed to be signed, the stage cost in Eqn. (6) should at least be revised to:

$$j_n^{(2)}(u_n, W_n \mid x_n) = \gamma_n q_n^2 + |u_n| - W_n.$$

Since we intend to design a DP model with a closed-form solution, the absolute value is further revised to a squared form:

$$j_n(u_n, W_n \mid x_n) = \gamma_n q_n^2 + u_n^2 - W_n, \tag{7}$$

which is the final stage cost adopted for the current model.

When $u_n$ stays close to a single lot, $u_n^2 \simeq |u_n|$. For $|u_n| > 1$, this quadratic form penalizes big sizes more heavily than the linear form. It favors smaller aggressive order sizes as a result.

Inspired by the Γ-convergence theory and its application in multi-phase variational problems [5], one could also introduce the double-well cost function:

$$\frac{u_n^2 (1 - u_n)^2}{\varepsilon}, \qquad \text{with } 0 < \varepsilon \ll 1.$$

This will softly enforce the binary sniping behavior - either no action with $u_n = 0$ or sniping with a single lot $u_n = 1$. However, such high-order non-convex costs can completely thwart the effort of designing a DP model with a unique and closed-form solution.

Summary.

The stage cost model in Eqn. (7) and the state transition model in Eqn. (5) define the proposed stochastic dynamic programming model for child order placement (COP).

… V(x)

At any “current” action time $t_n$ with state variable $x_n = (q_n, \lambda_n)^T$, for any choice of policy or action sequence $\mathbf{u}_n = (u_n, \ldots, u_{N-1})$ at $(t_n, \ldots, t_{N-1})$, let $\mathbf{W}_n = (W_n, \ldots, W_{N-1})$ denote the resulting Poisson hits on “our” passive limit orders during the individual intervals $[t_k, t_{k+1})$. Then the future cost is given by

$$
\begin{aligned}
J_n(\mathbf{u}_n, \mathbf{W}_n \mid x_n) &= J_n(u_n, \ldots, u_{N-1}; W_n, \ldots, W_{N-1} \mid x_n) \\
&= j_n(u_n, W_n \mid x_n) + \cdots + j_{N-1}(u_{N-1}, W_{N-1} \mid x_{N-1}) + j_N(x_N) \\
&= j_n(u_n, W_n \mid x_n) + J_{n+1}(\mathbf{u}_{n+1}, \mathbf{W}_{n+1} \mid x_{n+1}).
\end{aligned} \tag{8}
$$

Here $j_N(x_N)$ denotes the terminal cost at $t_N$. Define the value function by:

$$V_n(x_n) = \inf_{\mathbf{u}_n} \mathrm{E}_{\mathbf{W}_n} \left[ J_n(\mathbf{u}_n, \mathbf{W}_n \mid x_n) \right], \tag{9}$$

ranging over all state-driven policies in the form of $u_k = \phi_k(x_k)$, $k = n, \ldots, N-1$. Then we have the Bellman equation at each action time $t_n$:

$$V_n(x_n) = \inf_{u_n} \left\{ \mathrm{E}_{W_n} \left[ j_n(u_n, W_n \mid x_n) \right] + \mathrm{E}_{W_n} \left[ V_{n+1}(x_{n+1}) \right] \right\}. \tag{10}$$

The terminal cost at $t_N$ is defined to be

$$V_N(x_N) = j_N(x_N) = \gamma_N q_N^2. \tag{11}$$

The other two terms are dropped since the end of the given micro bin is reached.

Next, we shall work out first the solution for the last period $[t_{N-1}, t_N)$ to gain some tangible knowledge, and then the general solution in the framework of stochastic LQR.

Solution at the Last Period

At action time $t_{N-1}$ for the last period $[t_{N-1}, t_N)$, the Bellman equation reads:

$$V_{N-1}(x_{N-1}) = \inf_{u_{N-1}} \left\{ \gamma_{N-1} q_{N-1}^2 + u_{N-1}^2 - \lambda_{N-1}^+ \Delta t_{N-1} + \gamma_N \mathrm{E}_{W_{N-1}} \left[ (q_{N-1} - u_{N-1} - W_{N-1})^2 \right] \right\}.$$

For clarity in calculation, we drop the subscript $N-1$ so that

$$V(x) = \inf_u f(u \mid x) = \inf_u \left\{ \gamma q^2 + u^2 - \lambda^+ \Delta t + \gamma_N \mathrm{E}_W (q - u - W)^2 \right\},$$

with the current state vector $x = (q, \lambda)^T$, which is known, and $\lambda^+ = \lambda - \eta u$. The problem becomes the minimization of a single-variate function $f(u \mid x)$ given $x$.

Since $\mathrm{E}[W] = \lambda^+ \Delta t$, and $\mathrm{E}[X^2] = \mathrm{Var}(X) + \mathrm{E}[X]^2$ for a generic random variable $X$, one has

$$
\begin{aligned}
\mathrm{E}(q - u - W)^2 &= \mathrm{Var}(q - u - W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \mathrm{Var}(W) + (q - u - \lambda^+ \Delta t)^2 \\
&= \lambda^+ \Delta t + (q - u - \lambda^+ \Delta t)^2.
\end{aligned}
$$

As a result, the derivative of $f$ is:

$$
\begin{aligned}
\frac{df}{du} &= 2u + (\gamma_N - 1) \Delta t \frac{d\lambda^+}{du} - 2\gamma_N (q - u - \lambda^+ \Delta t) \left( 1 + \Delta t \frac{d\lambda^+}{du} \right) \\
&= 2 \left( 1 + \gamma_N (1 - \eta \Delta t)^2 \right) u - \eta \Delta t (\gamma_N - 1) - 2\gamma_N (1 - \eta \Delta t)(q - \lambda \Delta t).
\end{aligned}
$$

At the optimal $u_*$, the derivative vanishes. Hence the optimal aggressive trading is given by:

$$u_* = \alpha_* + \beta_* (q - \lambda \Delta t), \quad \text{with} \quad \alpha_* = \frac{(\gamma_N - 1)\, \eta \Delta t}{2 \left( 1 + \gamma_N (1 - \eta \Delta t)^2 \right)}, \quad \text{and} \quad \beta_* = \frac{\gamma_N (1 - \eta \Delta t)}{1 + \gamma_N (1 - \eta \Delta t)^2}. \tag{12}$$

In particular, the optimal policy $u_* = \phi_*(x) = \phi_*(q, \lambda)$ is a linear function of the state variable.

Consider the asymptotic case when $\gamma_N = +\infty$. Then one has

$$\alpha_* = \frac{\eta \Delta t}{2 (1 - \eta \Delta t)^2}, \quad \text{and} \quad \beta_* = \frac{1}{1 - \eta \Delta t}.$$

In particular, for a highly liquid security so that $\eta \simeq 0$, the optimal policy is simply

$$u_* = \alpha_* + \beta_* (q - \lambda \Delta t) \simeq q - \lambda \Delta t.$$

That is, at the last action time $t_{N-1}$, the proposed COP Algo will trade the expected remaining lots that cannot be filled by the passive limit orders $W$ (since $\mathrm{E}[W] = \lambda^+ \Delta t \simeq \lambda \Delta t$ when $\eta \simeq 0$).

As for the value function at $t_{N-1}$, one then has

$$
\begin{aligned}
V(x) &= f(u_* \mid x) \\
&= \gamma q^2 + u_*^2 + (\gamma_N - 1)(\lambda \Delta t - \eta \Delta t\, u_*) + \gamma_N \left( q - \lambda \Delta t - (1 - \eta \Delta t) u_* \right)^2 \\
&= \gamma q^2 + c_2 (q - \lambda \Delta t)^2 + c_1 (q - \lambda \Delta t) + c_0,
\end{aligned} \tag{13}
$$

where

$$c_2 = \beta_*^2 + \gamma_N \left( 1 - (1 - \eta \Delta t) \beta_* \right)^2 = \frac{\gamma_N}{1 + \gamma_N (1 - \eta \Delta t)^2} > 0.$$

Hence the value function for the last period can be written in the canonical quadratic form as:

$$V(x) = x^T P x + b^T x + c, \tag{14}$$

where $P$ must be a positive definite matrix. This is because, by the quadratic portion of $V(x)$,

$$\gamma q^2 + c_2 (q - \lambda \Delta t)^2 = 0 \;\Rightarrow\; q = 0,\ \lambda = 0.$$

Next we show that this is not accidental.
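The last-period policy in Eqn. (12) can be sanity-checked numerically. The Python sketch below evaluates the closed-form $u_*$ and compares it with a brute-force minimization of $f(u \mid x)$ over a grid of candidate controls. The function names and all parameter values are illustrative assumptions, not calibrated inputs.

```python
def last_period_policy(q, lam, dt, eta, gamma_N):
    """Closed-form u* = alpha* + beta* (q - lam*dt) from Eqn. (12)."""
    k = 1.0 - eta * dt
    denom = 1.0 + gamma_N * k * k
    alpha = (gamma_N - 1.0) * eta * dt / (2.0 * denom)
    beta = gamma_N * k / denom
    return alpha + beta * (q - lam * dt)


def last_period_cost(u, q, lam, dt, eta, gamma, gamma_N):
    """f(u|x), using E(q-u-W)^2 = Var(W) + (q-u-E[W])^2 for Poisson W."""
    mean_w = (lam - eta * u) * dt   # E[W] = Var(W) = lambda_plus * dt
    return (gamma * q ** 2 + u ** 2 - mean_w
            + gamma_N * (mean_w + (q - u - mean_w) ** 2))


# Illustrative parameters (assumptions only).
params = dict(q=5.0, lam=2.0, dt=1.0, eta=0.1)
u_star = last_period_policy(gamma_N=50.0, **params)

# Brute-force check on a grid of candidate controls with spacing 0.001.
grid = [i / 1000.0 for i in range(-2000, 8001)]
u_grid = min(grid, key=lambda u: last_period_cost(u, gamma=1.0, gamma_N=50.0, **params))

# Liquid-security limit: eta = 0 and a very large terminal penalty.
u_liquid = last_period_policy(q=5.0, lam=2.0, dt=1.0, eta=0.0, gamma_N=1e12)

print(u_star, u_grid, u_liquid)
```

With these inputs the grid minimizer should agree with the closed form up to the grid spacing, and the liquid limit should be close to $q - \lambda \Delta t = 3$.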

In general, for n = 0 , , . . . , N −

2, assume that the value function at n + 1 is in the quadratic form: V n +1 ( x n +1 ) = x Tn +1 P n +1 x n +1 + b Tn +1 x n +1 + c n +1 , (15)where P n +1 is positive deﬁnite. We now show that this implies that V n ( x n ) = x Tn P n x n + b Tn x n + c n , where P n is also positive deﬁnite.Furthermore, we show that the optimal action is given in the linear form: u n = φ n ( x n ) = α n + β Tn x n . The objective is to derive α n , β n , P n , b n and c n from P n +1 , b n +1 and c n +1 recursively.For clarity, we drop the subscript n so that for any variable or parameter X , we use instead X n −→ X, and X n +1 −→ X .

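In particular, the value function at $n+1$, written with the barred quantities, now assumes the cleaner form:

$$\bar{V}(\bar{x}) = \bar{x}^T \bar{P} \bar{x} + \bar{b}^T \bar{x} + \bar{c}. \tag{16}$$

Also the state transition equation becomes:

$$\bar{x} = x - a u - z, \quad \text{with } a = (1, \eta)^T, \ z = (W, 0)^T,$$

where $\eta$ is a model parameter or constant that represents the market impact or information leakage, and $W$ is the Poisson random hits on "our" passive limit orders over $[t_n, t_{n+1})$.

Then the value function at $t_n$ is given by:

$$V(x) = \min_u f(u \mid x) = \min_u \; \gamma q^2 + u^2 - \lambda^+ \Delta t + \mathrm{E}_W \bar{V}(\bar{x}).$$

Let $\bar{p}$ denote the (1,1)-element of $\bar{P}$, and $z_E = (\lambda^+ \Delta t, 0)^T = \mathrm{E}[z]$. Then

$$\mathrm{E}_W \bar{V}(\bar{x}) = \bar{V}(\mathrm{E}_W \bar{x}) + \mathrm{E}_W (z - z_E)^T \bar{P} (z - z_E) = \bar{V}(x - a u - z_E) + \bar{p}\, \mathrm{Var}(W) = \bar{V}(x - a u - z_E) + \bar{p}\, \lambda^+ \Delta t.$$

We now define

$$L = \begin{pmatrix} 0 & \Delta t \\ 0 & 0 \end{pmatrix}, \quad J = I - L, \quad \text{and} \quad h = \begin{pmatrix} 1 - \eta \Delta t \\ \eta \end{pmatrix}, \tag{17}$$

where $I$ denotes the 2 by 2 identity matrix. Then

$$z_E = \begin{pmatrix} \lambda \Delta t - \eta \Delta t \, u \\ 0 \end{pmatrix} = L x - \begin{pmatrix} \eta \Delta t \\ 0 \end{pmatrix} u, \qquad x - a u - z_E = J x - h u.$$

Hence we have, with $\bar{l} = 1 - \bar{p}$,

$$f(u) = f(u \mid x) = \gamma q^2 + u^2 - \bar{l}\, \lambda^+ \Delta t + \bar{V}(J x - h u). \tag{18}$$

Assume that $f(u) = f(0) - B u + A u^2$. Then

$$f'(u) = 2 A u - B. \tag{19}$$

On the other hand, direct differentiation gives

$$f'(u) = 2u + \bar{l}\, \eta \Delta t - h^T \nabla \bar{V}(J x - h u) = 2 (1 + h^T \bar{P} h)\, u - (h^T \bar{b} - \bar{l}\, \eta \Delta t) - 2 h^T \bar{P} J x.$$

Hence we have

$$A = 1 + h^T \bar{P} h, \quad \text{and} \quad B = (h^T \bar{b} - \bar{l}\, \eta \Delta t) + 2 h^T \bar{P} J x. \tag{20}$$

Therefore, the optimal policy at $t_n$ is given by

$$u^* = \frac{B}{2A} = \alpha^* + (\beta^*)^T x, \quad \text{with } \alpha^* = \frac{h^T \bar{b} - \bar{l}\, \eta \Delta t}{2 (1 + h^T \bar{P} h)}, \quad \text{and } (\beta^*)^T = \frac{h^T \bar{P} J}{1 + h^T \bar{P} h}. \tag{21}$$

Next we derive the associated value function $V(x) = f(u^* \mid x)$. For convenience, for any positive definite matrix $Q$ and any real vector $x$ of the same dimension, we define the $Q$-stretched Euclidean norm by:

$$\|x\|_Q^2 = x^T Q x.$$

Then the coefficient $A$ is simply $1 + \|h\|_{\bar{P}}^2$.

Since $2 A u^* = B$, one has $2 A (u^*)^2 = B u^*$. Hence by Eqn. (18),

$$\begin{aligned}
V(x) &= f(u^* \mid x) = f(0) - B u^* + A (u^*)^2 = f(0) - A (u^*)^2 \\
&= \gamma q^2 - \bar{l}\, \lambda \Delta t + \bar{V}(J x) - A \big(\alpha^* + (\beta^*)^T x\big)^2 \\
&= \gamma q^2 - \bar{l}\, \lambda \Delta t + \big(x^T J^T \bar{P} J x + \bar{b}^T J x + \bar{c}\big) - \big(1 + \|h\|_{\bar{P}}^2\big)\big(\alpha^* + (\beta^*)^T x\big)^2 \\
&= \gamma q^2 + x^T (J^T \bar{P} J) x - \big(1 + \|h\|_{\bar{P}}^2\big)\, x^T \beta^* (\beta^*)^T x \\
&\quad - \bar{l}\, \lambda \Delta t + \bar{b}^T J x - 2 \big(1 + \|h\|_{\bar{P}}^2\big)\, \alpha^* (\beta^*)^T x + \bar{c} - \big(1 + \|h\|_{\bar{P}}^2\big) (\alpha^*)^2 \\
&= x^T P x + b^T x + c,
\end{aligned}$$

where the quadratic parameters are given by

$$\begin{aligned}
P &= \gamma \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + J^T Q J, \quad \text{with } Q = \bar{P} - \frac{\bar{P} h h^T \bar{P}}{1 + \|h\|_{\bar{P}}^2}, \\
b^T &= \bar{b}^T J - \bar{l}\, \Delta t \,(0, 1) - 2 \big(1 + \|h\|_{\bar{P}}^2\big)\, \alpha^* (\beta^*)^T, \\
c &= \bar{c} - \big(1 + \|h\|_{\bar{P}}^2\big) (\alpha^*)^2,
\end{aligned} \tag{22}$$

where $\bar{l} = 1 - \bar{P}(1,1)$ and $\alpha^*$, $\beta^*$ are given as in Eqn. (21).

We now show that $P$ is positive definite.

Lemma 1. If $\bar{P}$ is positive definite, so must be $P$.

Proof. Since $J = I - L$ is non-singular as in Eqn. (17), it suffices to show that $Q$ is positive definite.

For any non-zero vector $v \in \mathbb{R}^2$, previously we have used the notation $\|v\|_{\bar{P}}$ to denote the $\bar{P}$-stretched Euclidean distance. More generally, we use

$$\langle v, h \rangle_{\bar{P}} := v^T \bar{P} h$$

to denote the $\bar{P}$-stretched inner product. By the Cauchy-Schwarz inequality, one has

$$\langle v, h \rangle_{\bar{P}}^2 \le \|v\|_{\bar{P}}^2 \cdot \|h\|_{\bar{P}}^2.$$

By the definition of $Q$, one has

$$v^T Q v = v^T \left( \bar{P} - \frac{\bar{P} h h^T \bar{P}}{1 + \|h\|_{\bar{P}}^2} \right) v = \|v\|_{\bar{P}}^2 - \frac{\langle v, h \rangle_{\bar{P}}^2}{1 + \|h\|_{\bar{P}}^2} = \frac{\|v\|_{\bar{P}}^2 + \|v\|_{\bar{P}}^2 \|h\|_{\bar{P}}^2 - \langle v, h \rangle_{\bar{P}}^2}{1 + \|h\|_{\bar{P}}^2} \ge \frac{\|v\|_{\bar{P}}^2}{1 + \|h\|_{\bar{P}}^2} > 0.$$

Since this holds for any non-zero vector $v$, $Q$ and hence $P$ must be positive definite. ∎

We have thus established the following theorem, with $J_n$ and $h_n$ defined as in Eqn. (17):

$$J_n = \begin{pmatrix} 1 & -\Delta t_n \\ 0 & 1 \end{pmatrix}, \quad \text{and} \quad h_n = \begin{pmatrix} 1 - \eta \Delta t_n \\ \eta \end{pmatrix}.$$

They are constant for equal partitioning, when the steps $\Delta t_n = t_{n+1} - t_n$ are all the same.

Theorem 1.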

Let $V_N(x_N) = \gamma_N q_N^2$ be the terminal cost at the ending time $t_N$. Then at each action time $t_n$ with $n < N$, there exist a positive definite 2 by 2 matrix $P_n$, a 2 by 1 vector $b_n$, and a scalar $c_n$, such that the value function $V_n$ is given by:

$$V_n(x_n) = x_n^T P_n x_n + b_n^T x_n + c_n = (q_n, \lambda_n)\, P_n \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + b_n^T \begin{pmatrix} q_n \\ \lambda_n \end{pmatrix} + c_n. \tag{23}$$

The optimal policy $u_n$ is given by the linear form using parameters at $t_{n+1}$:

$$u_n^* = \phi_n(x_n) = \alpha_n^* + x_n^T \beta_n^*, \quad \text{with } \alpha_n^* = \frac{b_{n+1}^T h_n - \big(1 - P_{n+1}(1,1)\big)\, \eta \Delta t_n}{2 \big(1 + h_n^T P_{n+1} h_n\big)}, \quad \text{and } \beta_n^* = \frac{J_n^T P_{n+1} h_n}{1 + h_n^T P_{n+1} h_n}, \tag{24}$$

where $P_{n+1}(1,1)$ denotes the (1,1)-element of $P_{n+1}$. Furthermore, the structure of the value functions also cascades backwards as follows:

$$\begin{aligned}
P_n &= \gamma_n \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + J_n^T \left( P_{n+1} - \frac{P_{n+1} h_n h_n^T P_{n+1}}{1 + h_n^T P_{n+1} h_n} \right) J_n, \\
b_n &= J_n^T b_{n+1} - \big(1 - P_{n+1}(1,1)\big)\, \Delta t_n \begin{pmatrix} 0 \\ 1 \end{pmatrix} - 2 \big(1 + h_n^T P_{n+1} h_n\big)\, \alpha_n^* \beta_n^*, \\
c_n &= c_{n+1} - \big(1 + h_n^T P_{n+1} h_n\big) (\alpha_n^*)^2,
\end{aligned} \tag{25}$$

with $n = N-1, \ldots, 1, 0$, and terminal values $P_N = \mathrm{diag}(\gamma_N, 0)$, $b_N = (0, 0)^T$ and $c_N = 0$.

Conclusion and Disclaimers
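As a closing numerical illustration of Theorem 1, the backward recursion in Eqns. (24) and (25) can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than a production implementation: the helper names (`backward_step`, `solve_policies`, `quad_value`), the constant $\eta$ and $\Delta t$, and the sample values of $\gamma_n$ are all hypothetical choices made for illustration.

```python
def mat_vec(M, v):
    """2x2 matrix times 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def quad_value(P, b, c, x):
    """Evaluate x^T P x + b^T x + c."""
    return dot(x, mat_vec(P, x)) + dot(b, x) + c


def backward_step(P1, b1, c1, gamma, eta, dt):
    """Map (P_{n+1}, b_{n+1}, c_{n+1}) to (alpha_n, beta_n, P_n, b_n, c_n)."""
    J = [[1.0, -dt], [0.0, 1.0]]
    JT = [[1.0, 0.0], [-dt, 1.0]]
    h = [1.0 - eta * dt, eta]
    Ph = mat_vec(P1, h)                   # P_{n+1} h_n (P_{n+1} is symmetric)
    A = 1.0 + dot(h, Ph)                  # 1 + h^T P_{n+1} h
    l = 1.0 - P1[0][0]                    # 1 - P_{n+1}(1,1)
    alpha = (dot(b1, h) - l * eta * dt) / (2.0 * A)
    beta = [v / A for v in mat_vec(JT, Ph)]
    # Q = P_{n+1} - (P h)(P h)^T / A, then P_n = gamma * e1 e1^T + J^T Q J
    Q = [[P1[i][j] - Ph[i] * Ph[j] / A for j in range(2)] for i in range(2)]
    P = mat_mul(JT, mat_mul(Q, J))
    P[0][0] += gamma
    JTb = mat_vec(JT, b1)
    b = [JTb[0] - 2.0 * A * alpha * beta[0],
         JTb[1] - l * dt - 2.0 * A * alpha * beta[1]]
    c = c1 - A * alpha**2
    return alpha, beta, P, b, c


def solve_policies(gammas, eta, dt):
    """gammas = [gamma_0, ..., gamma_N]; returns policies for n = 0..N-1."""
    N = len(gammas) - 1
    # Terminal values: P_N = diag(gamma_N, 0), b_N = 0, c_N = 0.
    P, b, c = [[gammas[N], 0.0], [0.0, 0.0]], [0.0, 0.0], 0.0
    policies = [None] * N
    for n in range(N - 1, -1, -1):
        alpha, beta, P, b, c = backward_step(P, b, c, gammas[n], eta, dt)
        policies[n] = (alpha, beta)
    return policies, (P, b, c)


# Hypothetical inputs: three action times with gamma_n = 0.1, gamma_N = 50.
policies, (P0, b0, c0) = solve_policies([0.1, 0.1, 0.1, 50.0], eta=0.05, dt=1.0)
for n, (alpha, beta) in enumerate(policies):
    print(n, alpha, beta)
```

For the last action time, the pair `policies[-1]` should coincide with the closed-form $\alpha^*$ and $\beta^*$ of Eqn. (12), since $\beta_n^{*T} x = \beta^* (q - \lambda \Delta t)$ in that case. In practice the resulting $u_n^* = \alpha_n^* + x_n^T \beta_n^*$ would still be subject to the heuristic overlays mentioned in the comments below.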

We conclude the current work with the following comments and disclaimers.

(1) The proposed dynamic programming COP Algo has not been the internal or external product of any execution house where the author worked previously. Any potential industrial conflict or suspected proprietary trespass should be promptly directed to the attention of the author, together with the necessary evidence.

(2) In the spirit of reductionism and the pursuit of a dynamic programming COP model with closed-form solutions, the current model does not address other important execution or implementation details, including fragmented venues in the national market system (NMS), different trading sessions and rules, various order types, lit vs. dark, and so on.

(3) The current work focuses exclusively on the dynamic interplay between aggressive takeout orders and passive limit orders. The price improvement or cost is represented by a half spread. Information leakage or the market impact of aggressive orders is reflected in the reduction of Poisson hitting rates on passive limit orders.

(4) Like some earlier DP macro Algos, risk aversion is implemented by the delay cost in the stage cost model. It facilitates soft completion of a given child order over its designated micro time bin. Hard completion or catchup is usually implemented at the macro layer.

(5) If the results here are to be integrated into an existing COP program in an execution house, a practitioner should apply some heuristic but necessary overlays. For instance, cases such as $u_n^* \le 0$ or $u_n^* > q_n$ need to be handled.

(6) Overall, the author hopes that the current model can inspire more similar and rigorous works that can improve the heuristic decision trees prevailing in the COP processes in the contemporary Algo industry.

Acknowledgments

Jackie Shen is very grateful to all the colleagues at the electronic or algorithmic trading desks of J.P. Morgan, Barclays and Goldman Sachs, esp. to many of our hard-working IT, RISK, and COMPLIANCE colleagues whose names hardly appear in the headlines, for their daily professional assistance as well as generous personal support. Reliable and healthy electronic trading would be impossible without solid IT implementations of databases, data streaming, servers, networks, and multiple inter-dependent Algo components, or effective risk management and compliance controls.

This work was completed when the pandemic Covid-19 was sweeping through the entire globe mercilessly. Under tremendous mental pressure living in the epicenter of New York, the author is extremely grateful to his family, friends, and colleagues, as well as thousands of courageous and selfless medical professionals, policemen and policewomen, and fire fighters of this great city.

The pandemic has actually brought the people in the city and around the globe much closer and more united, as my 9-year old observes from her numerous Zoom online classes and chats, as well as all the touching stories around the world on fighting against the virus. Beyond A.I. or automated trading "robots" as the current work has covered, the pandemic has grounded all of us to the very core meaning of human beings and human societies. At the end of this darkest storm there will be a brightest rainbow, so colorful, refreshing, and full of new hopes.