An overall view of key problems in algorithmic trading and recent progress
Michaël Karpe∗

June 9, 2020
Abstract
We summarize the fundamental issues at stake in algorithmic trading and the progress made in this field over the last twenty years. We first present the key problems of algorithmic trading, describing the concepts of optimal execution, optimal placement, and price impact. We then discuss the most recent advances in algorithmic trading through the use of Machine Learning, covering Deep Learning, Reinforcement Learning, and Generative Adversarial Networks.
Introduction

Algorithmic trading is a form of automated trading that uses electronic platforms for the entry of stock orders, letting an algorithm decide the different aspects of the order, such as the opening or closing time, the price, or the volume of the order, most of the time without the slightest human intervention. Since algorithmic trading is used for order placement and execution, increasingly intelligent and complex algorithms compete to optimize the placement and execution of these orders in a market that is becoming better and better understood by the developers of these algorithms.
∗ Department of Industrial Engineering and Operations Research, University of California, Berkeley. Email: [email protected]

1 Key problems of algorithmic trading

The study of algorithmic trading first requires an understanding of its fundamental issues, namely what is the best quantity for an order to be executed (optimal execution), how to place orders in a time range (optimal placement), and what the consequences of these orders are on the price of the stock on which we place an order (price impact). In this section, we explore these three fundamental issues using key references that address these problems.
1.1 Optimal execution

Optimal execution is the best-known problem in algorithmic trading and was addressed by Bertsimas and Lo [1] in the case of a discrete random walk model and by Almgren and Chriss [2] in the case of a Brownian motion model. Optimal execution consists of buying or selling a large amount of stock in a short period, which then has an impact on the price of the stock. Such execution is optimal in the sense that one seeks to minimize the price impact or execution costs, or to maximize the expectation of a predefined utility function.
1.1.1 Optimal execution in Bertsimas and Lo

In Bertsimas and Lo [1], optimal execution is presented as an execution cost minimization problem, under the constraint of acquiring a quantity S̄ of shares over the entire period of length T. Mathematically, defining for each instant t ∈ {1, 2, ..., T} the number of shares S_t acquired in period t at price P_t, this optimal execution problem is written:

min_{S_t, t ∈ {1,...,T}}  E[ ∑_{t=1}^{T} P_t S_t ]   s.t.   ∑_{t=1}^{T} S_t = S̄

The simplest evolution of the price P_t proposed by Bertsimas and Lo [1] defines P_t as the sum of the previous price P_{t−1}, a linear price impact term θS_t (θ > 0) depending on the number of shares S_t acquired at time t, and a white noise ε_t (ε_t ∼ WN(0, σ²)):

P_t = P_{t−1} + θS_t + ε_t   with   E[ε_t | P_{t−1}, S_t] = 0

In their paper, Bertsimas and Lo [1] show that for such an evolution of the price P_t, the solution of the optimal execution problem is obtained recursively through dynamic programming and is S*_1 = S*_2 = · · · = S*_T = S̄/T.

Then, they deal with the case of linear price impact with information, by considering an additional term γX_t in the evolution of the price, such that X_t = ρX_{t−1} + η_t with ρ ∈ (−1, 1) and η_t ∼ WN(0, σ_η²), as well as the general case where P_t = f_t(P_{t−1}, X_t, S_t, ε_t) and X_t = g_t(X_{t−1}, η_t). In both cases, the solution to the optimal execution problem can still be obtained recursively through dynamic programming, resulting in a more complex formulation of the optimal execution strategy.

1.1.2 Optimal execution in Almgren and Chriss

In Almgren and Chriss [2], the optimal execution problem is presented as the minimization of a utility function U, defined as the sum of the expectation E and a linear term in the variance V of the implementation shortfall:

U(x) = E(x) + λV(x)

Before defining further the expectation and the variance of the implementation shortfall, we need to define the evolution of the price and the price impact. In Almgren and Chriss [2], we define X as the number of shares to liquidate before time T, t_k = kτ with τ = T/N a discretization of the time interval [0, T] into N intervals, x_k the number of remaining shares at time t_k, and n_k = x_{k−1} − x_k.
The evolution of the price P_k is defined as the sum of the previous price P_{k−1}, a noise term στ^{1/2} ξ_k (where the ξ_k are i.i.d. draws with zero mean and unit variance), and a permanent linear price impact depending on n_k:

P_k = P_{k−1} + στ^{1/2} ξ_k − τ g(n_k/τ)

where σ is the volatility of the asset and g(v) = γv.

Almgren and Chriss [2] also consider a temporary price impact; however, this temporary price impact only influences the price per share P̃_k received, not the actual price P_k:

P̃_k = P_{k−1} − h(n_k/τ)

where h(n_k/τ) = ε sgn(n_k) + η n_k/τ, with "a reasonable estimate for ε being the fixed costs of selling" [2] and η depending on the market microstructure.

The framework of the Almgren and Chriss [2] optimal execution being defined, we can now define the expectation and the variance of the implementation shortfall:

E(x) = ∑_{k=1}^{N} τ x_k g(n_k/τ) + ∑_{k=1}^{N} n_k h(n_k/τ)

V(x) = σ² ∑_{k=1}^{N} τ x_k²

In this framework, optimal execution strategies can also be computed explicitly and are illustrated in the form of an efficient frontier in the two-dimensional variance-expectation space. Almgren and Chriss [2] also stress the importance of considering the risk/reward tradeoff in the calculation of optimal execution strategies, through the use of a risk-aversion parameter λ, to create optimal execution strategies adapted to the risk profile of the executor.

1.1.3 Further work on optimal execution

The Bertsimas and Lo [1] and Almgren and Chriss [2] models support most of the work done on optimal execution since 2000.

On the one hand, recent work uses the Bertsimas and Lo [1] model to show that there is no significant improvement in moving from static optimal execution strategies to adaptive ones for the benchmark models studied [3].

On the other hand, while Almgren and Chriss [2] deal with the problem of optimal execution under price uncertainty, recent work uses this model to consider the problem of optimal execution under volume uncertainty, i.e., when the volume of shares that can be executed is not known in advance [4]. The authors show that a "risk-averse trader has benefit in delaying their trades" [4] and that under both price and volume uncertainty, "the optimal strategy is a trade-off between early and late trades to balance the risk associated with both price and volume" [4].
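To make the risk/reward tradeoff concrete, recall that the continuous-time limit of the Almgren and Chriss [2] model admits the closed-form liquidation trajectory x(t) = X sinh(κ(T − t))/sinh(κT), with κ = (λσ²/η)^{1/2}. The following sketch uses this continuous-time approximation with purely illustrative parameter values (not calibrated to any market) to show how a positive risk aversion λ front-loads the schedule relative to the risk-neutral linear one:

```python
import numpy as np

def ac_trajectory(X, T, sigma, eta, lam, n_steps=10):
    """Shares remaining at grid points t_k = k * T / n_steps under the
    continuous-time Almgren-Chriss optimal liquidation trajectory."""
    t = np.linspace(0.0, T, n_steps + 1)
    if lam == 0.0:  # risk-neutral limit: linear (TWAP-like) schedule
        return X * (1.0 - t / T)
    kappa = np.sqrt(lam * sigma**2 / eta)
    return X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)

# Illustrative parameters: 1M shares over one unit of time.
x_risk_averse = ac_trajectory(X=1e6, T=1.0, sigma=0.3, eta=1e-6, lam=1e-6)
x_neutral = ac_trajectory(X=1e6, T=1.0, sigma=0.3, eta=1e-6, lam=0.0)

# A risk-averse trader front-loads: fewer shares remain at mid-horizon.
print(x_risk_averse[5] < x_neutral[5])
```

The same comparison underlies the efficient frontier: larger λ trades off a higher expected cost for a lower variance of the shortfall.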
1.2 Optimal placement

The optimal placement problem is a much less studied algorithmic trading problem than the optimal execution one. This problem consists of determining how to split the orders into the different levels of the limit order book at each period, to minimize the total expected cost. It is summarized in Guo et al. [5] as a problem where "one needs to buy N orders before time T > 0" [5] and where N_{k,t} is the number of orders at the k-th best bid (N_{0,t} being the number of orders at the market price), and is solved in the case of a correlated random walk model. We refer to Section 2.2 of Guo et al. [5] for the complete formulation of the optimal placement problem.

1.3 Price impact

The price impact is mainly studied in the case of the optimal execution problem. Gatheral and Schied [6] present an overview of the main price impact models. They distinguish three distinct types of price impact: permanent, temporary, and transient.
1.3.1 Permanent and temporary price impact

Permanent and temporary price impact are usually studied together as two consequences of the same cause. Whereas the permanent price impact affects the stock price and therefore all subsequent orders, the temporary price impact only affects the price of the executed order and does not influence the stock price. Both the Almgren and Chriss [2] and Bertsimas and Lo [1] models present permanent and temporary price impact components.

Gatheral and Schied [6] recall in their paper the notion of a "price manipulation strategy" [6], defined as an order execution strategy with strictly positive expected revenues. They then show, on the one hand, that an Almgren and Chriss [2] model which does not admit price manipulation must have a linear permanent price impact, and on the other hand, that a Bertsimas and Lo [1] model with linear permanent price impact does not admit bounded price manipulation strategies.

An estimation of the permanent and temporary price impact in the equity market is studied in Almgren et al. [7], showing that while the linear permanent price impact hypothesis cannot be rejected on equity markets, the hypothesis of a square-root model for temporary impact is rejected, in favor of a power law with coefficient 0.6 [7].

However, while many articles studying the permanent price impact do so under the hypothesis of a linear impact, some research articles question this linear hypothesis, stating that permanent market impact can sometimes be nonlinear [8].

Recent work pushes further the case of nonlinear permanent and temporary price impact, by considering a continuous-time price impact model close to the Almgren and Chriss [2] model but where the parameters of the price impact are stochastic [9]. The authors show that their stochastic optimal liquidation problem still admits optimal strategy approximations, depending on the stochastic behavior of the price impact parameters.
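The distinction between the two impact types can be illustrated with a short numerical sketch in the spirit of the Bertsimas and Lo / Almgren and Chriss linear models; all parameter values are hypothetical, and noise is omitted since it does not affect expected costs. A permanent impact θ raises the price paid by every subsequent trade, whereas a temporary impact η of the same magnitude only worsens each trade's own execution price:

```python
import numpy as np

P0, theta, eta = 100.0, 0.01, 0.01   # illustrative values
schedule = np.full(10, 10.0)          # buy 100 shares in 10 equal slices

# Permanent impact: each trade pays the price shifted by ALL shares
# bought so far (including its own), P0 + theta * cumulative volume.
cum = np.cumsum(schedule)
cost_permanent = np.sum(schedule * (P0 + theta * cum))

# Temporary impact: each trade only pays for its own size; the market
# price reverts to P0 before the next slice.
cost_temporary = np.sum(schedule * (P0 + eta * schedule))

# Permanent impact compounds across trades, so it costs more here.
print(cost_permanent > cost_temporary)
```

With θ = η, the permanent-impact cost always dominates over several slices, since each slice also pays for the cumulative displacement left by its predecessors.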
1.3.2 Transient price impact

As explained in Gatheral and Schied [6], "transience of price impact means that this price impact will decay over time" [6]. Transient price impact challenges classical models of permanent and temporary price impact, especially because permanent market impact must be linear to avoid dynamic arbitrage [10].

Obizhaeva and Wang [11] propose a linear transient price impact model with exponential decay, and additional research papers also deal with the study of linear transient price impact. However, other papers show the limits of the linear hypothesis in the transient price impact model by studying the slow decay of impact in equity markets [12]. Recent work presents a portfolio liquidation problem of 100 NASDAQ stocks under transient price impact [13].
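A minimal sketch of the exponential-decay mechanism conveys the idea: each trade n_i at time t_i displaces the price by a linear amount γn_i, and that displacement decays as exp(−ρ(t − t_i)). The parameter values and trade schedule below are illustrative, not taken from [11]:

```python
import numpy as np

def displacement(t, trade_times, trade_sizes, gamma=0.1, rho=2.0):
    """Total transient price displacement at time t: the sum of the
    exponentially decayed impacts of all past trades."""
    trade_times = np.asarray(trade_times, dtype=float)
    trade_sizes = np.asarray(trade_sizes, dtype=float)
    past = trade_times <= t
    return np.sum(gamma * trade_sizes[past]
                  * np.exp(-rho * (t - trade_times[past])))

times, sizes = [0.0, 0.5, 1.0], [100, 100, 100]   # three equal buys
d_end = displacement(1.0, times, sizes)    # right after the last trade
d_late = displacement(3.0, times, sizes)   # two units of time later

print(d_late < d_end)  # the impact has decayed toward zero
```

In the permanent/temporary dichotomy, the permanent component corresponds to ρ → 0 (no decay) and the temporary one to ρ → ∞ (instantaneous decay); transient impact interpolates between the two.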
2 Recent advances in algorithmic trading through Machine Learning

The rise of Machine Learning over the last few years has shaken up many areas, including the field of algorithmic trading. Machine Learning, Deep Learning, and Reinforcement Learning can be used in algorithmic trading to develop intelligent algorithms, capable of learning by themselves the evolution of a stock's price or the best action to take for the execution or placement of an order.

In this section, we discuss the most recent advances in algorithmic trading through the use of Machine Learning, assuming that the reader already has prior knowledge in this area. First, we present applications of Deep Learning in financial engineering, then the use of Reinforcement Learning agents for optimal order execution or placement, and finally, we briefly mention applications of Generative Adversarial Networks (GANs) for financial data and time series generation.
2.1 Deep Learning

Applications of Deep Learning in financial engineering are numerous. Deep Learning is generally used for estimating or predicting financial data, such as price trends for financial products. In this subsection, we consider the class of neural networks known as Recurrent Neural Networks (RNNs) and their applications, as well as the notion of transfer learning.
2.1.1 Recurrent Neural Networks (RNNs)

RNNs are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. RNNs are used for sequences of data, which can correspond, for example, to sequences of temporal or textual data. They aim to learn a sequential scheme of the data provided as input to the network, the output of each cell depending on the outputs of the previous cells.

In financial engineering, RNNs are commonly used for stock price prediction or asset pricing [14]. Other applications include predictions of cash flows or consumer default [15], but also more original applications as part of the analysis of alternative data, with, for example, the study of textual data or of the evolution of satellite images to acquire information on the health of a company.
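As a minimal illustration of the recurrence underlying these models, the following NumPy sketch runs a vanilla RNN forward pass, h_t = tanh(W_x x_t + W_h h_{t−1} + b), over a toy feature sequence. The weights are random, so this shows the mechanics rather than a trained price predictor; practical models use LSTM or GRU cells and learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 4, 8, 20   # e.g. 4 features per time step

W_x = rng.normal(scale=0.3, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
w_out = rng.normal(scale=0.3, size=n_hidden)  # scalar readout layer

x_seq = rng.normal(size=(n_steps, n_in))      # toy feature sequence
h = np.zeros(n_hidden)
for x_t in x_seq:
    # The hidden state h carries information from all previous steps.
    h = np.tanh(W_x @ x_t + W_h @ h + b)

prediction = w_out @ h   # e.g. a next-step return forecast
print(prediction)
```

The key point is that the same weights are applied at every step, so the network can process sequences of arbitrary length while its hidden state summarizes the past.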
2.1.2 Transfer learning

Transfer learning focuses on storing the knowledge gained by solving a problem and applying it to a different but related problem [16]. Transfer learning is generally studied in the Deep Learning framework. It consists of training a neural network on a huge dataset so that the network successfully learns the requested task, and then fine-tuning this network on the data of the new task we want it to perform, this new task generally having too few training data to train a model on alone [17].

A very recent paper applied this concept to the transfer of systematic trading strategies [18]. The idea proposed in this paper is to build a neural network architecture, called QuantNet, based on two market-specific layers with a market-agnostic layer between them. Transfer learning is then carried out with the market-agnostic layer. The authors of the paper claim an improvement of the Sharpe ratio of 15% across 3103 assets in 58 equity markets across the world, in comparison with trading strategies not based on transfer learning [18].
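The pretrain-then-fine-tune idea can be sketched in a few lines. Here the "pretrained" shared representation is simply a fixed random feature map and the tasks are synthetic, so this is only a conceptual analogue of QuantNet's trained market-agnostic layer, not its actual architecture: the point is that only a small output layer is refitted on the data-poor target task while the shared representation is kept frozen:

```python
import numpy as np

rng = np.random.default_rng(1)

def features(X, W):
    """Shared nonlinear representation (frozen during fine-tuning)."""
    return np.tanh(X @ W)

W_shared = rng.normal(size=(5, 50))   # stands in for pretrained weights
X_target = rng.normal(size=(30, 5))   # only 30 samples on the new task
y_target = np.sin(X_target[:, 0])     # toy target relationship

# Fine-tuning = refitting the output layer only, by least squares.
Phi = features(X_target, W_shared)
w_out, *_ = np.linalg.lstsq(Phi, y_target, rcond=None)

y_hat = Phi @ w_out
print(np.mean((y_hat - y_target) ** 2))  # in-sample error of the head
```

Because the shared layer stays fixed, the target task only has to estimate the small head, which is feasible even with few samples; this is the statistical rationale behind transfer learning.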
2.2 Reinforcement Learning (RL)

RL is one of the three kinds of Machine Learning (along with supervised and unsupervised learning) and consists of training an agent to take actions based on an observed state and the rewards obtained by performing an action in a given state. By definition of RL, we can see algorithmic trading as an RL problem where a trading agent aims to maximize its profit from buying or selling actions taken in a market.

In this subsection, we first describe the challenges of using RL in algorithmic trading. We then discuss the framework of Multi-Agent Reinforcement Learning (MARL), where many trading agents compete. Finally, we explain the importance of developing a realistic simulation environment for the training of trading agents and the recent work done on this topic.
2.2.1 RL for optimal execution

One of the first papers on RL for optimal execution was released in 2006 [19], showing a significant improvement over the methods used for optimal execution, with results "based on 1.5 years of millisecond time-scale limit order data from NASDAQ" [19].

The state of an RL algorithm can have as many features as we want; however, too many features cause the curse of dimensionality, so the features of an RL algorithm for algorithmic trading should be chosen appropriately. Some common features used in an RL optimal execution problem are, among others, the time remaining, the number of shares remaining, the spread, the volume imbalance, and the current price.

The development of high-frequency trading has made it necessary to develop RL algorithms that act quickly and optimally on the market. In recent work, the most commonly used RL algorithms for optimal execution are Q-Learning algorithms. Two papers published in 2018 use Q-Learning in the case of temporal-difference RL [20] and risk-sensitive RL [21].

Another paper released in 2018 presents the use of Double Deep Q-Learning for optimal execution [22]. While Deep Q-Learning uses a neural network to approximate the Q-value function [23], Double Deep Q-Learning uses two neural networks to avoid the overestimation that can happen when only one neural network is used [24].
2.2.2 Multi-Agent Reinforcement Learning (MARL)

All previously mentioned papers study the optimal execution problem as a single-agent RL problem, i.e., only one agent is trained on the market data and there are no other competing agents. Such an approach is not representative of the reality of the high-frequency trading market, where not only do millions of agents train on the market and compete, but each agent is likely to adapt its strategy to the strategies of the other agents. MARL is intended to address this issue by having multiple agents at the same time (who may or may not train) to better capture the reality of financial markets.

The use of MARL for market making has been addressed in a recent paper showing that "the reinforcement learning agent is able to learn about its competitor's pricing policy" [25]. Another recent paper discusses further the need for a MARL framework for the evaluation of trading strategies [26].

In particular, the latter reminds us that we can assess a trading strategy through two major methods: Market Replay and Interactive Agent-Based Simulation (IABS) [26]. Whereas in Market Replay the simulation does not respond to the implementation of the RL strategy, IABS aims to simulate the responses of the market or of other agents, although an IABS simulation may still not be realistic with respect to real financial market conditions.
2.2.3 Realistic market simulation

After having stated the need for a MARL simulation for the training of RL trading agents, one of the key issues is to build a simulation close to the reality of the high-frequency trading market.

The Agent-Based Interactive Discrete Event Simulation (ABIDES) environment [27] aims to be such a realistic financial environment, by considering the "market physics" of the real high-frequency trading world, including a nanosecond resolution, agent computation delays, and communication between agents through standardized message protocols [27]. ABIDES also enables MARL between thousands of agents interacting through an exchange agent.

A recent paper studies the realism of the ABIDES environment through the use of stylized facts on limit order books [28]. After a review of most of the stylized facts known for limit order books, this paper shows that the two multi-agent simulations run in ABIDES verify most of the tested stylized facts. However, the paper acknowledges that further improvement is needed for all the stylized facts to be verified.

2.3 Generative Adversarial Networks (GANs)
GANs were introduced by Goodfellow et al. [29]. The main idea of GANs is to train two models simultaneously: the first, called the generative model, must reproduce the distribution of the data to generate, while the second, called the discriminative model, must test whether a sample comes from the training data or from the data created by the generative model. This training process is usually presented as a two-player minimax game on a value function V(D, G):

min_G max_D V(D, G) = E_{x ∼ p_data(x)}[log D(x)] + E_{z ∼ p_z(z)}[log(1 − D(G(z)))]

A recent paper presents the use of GANs for the generation of financial time series, with an architecture called Quant GANs [30]. The key idea and innovation of this paper is the use of Temporal Convolutional Networks (TCNs) for the generator and the discriminator, in order to "capture long-range dependencies such as the presence of volatility clusters" [30]. The authors of the paper have been able to successfully generate financial time series with behavior similar to that of S&P 500 stock prices.

Ideally, financial data generated through GANs could be used by RL agents such as those described in the previous section to improve their performance. Other applications of GANs for the generation of financial data are credit card fraud detection [31], credit scoring [32], and deep hedging [33].

3 Conclusion

Optimal execution is probably the best-known problem in algorithmic trading. In this paper, we recalled the framework of the optimal execution problem in Bertsimas and Lo [1] and in Almgren and Chriss [2]. We also mentioned the problem of optimal placement and discussed the distinct types of price impact: permanent, temporary, and transient. We then described recent progress in algorithmic trading through the use of Machine Learning.
Whereas the use of Deep Learning for stock prediction has already been widely explored, there is room for improvement in the use of Reinforcement Learning for algorithmic trading, and even more in the use of Generative Adversarial Networks.

References

[1] Dimitris Bertsimas and Andrew W. Lo. Optimal control of execution costs.
Journal of Financial Markets, 1(1):1–50, 1998.

[2] Robert Almgren and Neil Chriss. Optimal execution of portfolio transactions. Journal of Risk, 3:5–40, 2001.

[3] Damiano Brigo, Clément Piat, et al. Static versus adapted optimal execution strategies in two benchmark trading models. World Scientific Book Chapters, pages 239–273, 2018.

[4] Julien Vaes and Raphael Hauser. Optimal execution strategy under price and volume uncertainty. arXiv preprint arXiv:1810.11454, 2018.

[5] Xin Guo, Adrien De Larrard, and Zhao Ruan. Optimal placement in a limit order book: an analytical approach. Mathematics and Financial Economics, 11(2):189–213, 2017.

[6] Jim Gatheral and Alexander Schied. Dynamical models of market impact and algorithms for order execution. Handbook on Systemic Risk, Jean-Pierre Fouque, Joseph A. Langsam, eds., pages 579–599, 2013.

[7] Robert Almgren, Chee Thum, Emmanuel Hauptmann, and Hong Li. Direct estimation of equity market impact. Risk, 18(7):58–62, 2005.

[8] Olivier Guéant. Permanent market impact can be nonlinear. arXiv preprint arXiv:1305.0413, 2013.

[9] Weston Barger and Matthew Lorig. Optimal liquidation under stochastic price impact. International Journal of Theoretical and Applied Finance, 22(02):1850059, 2019.

[10] Gur Huberman and Werner Stanzl. Price manipulation and quasi-arbitrage. Econometrica, 72(4):1247–1275, 2004.

[11] Anna A. Obizhaeva and Jiang Wang. Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, 16(1):1–32, 2013.

[12] Xavier Brokmann, Emmanuel Serie, Julien Kockelkoren, and J.-P. Bouchaud. Slow decay of impact in equity markets. Market Microstructure and Liquidity, 1(02):1550007, 2015.

[13] Ying Chen, Ulrich Horst, and Hoang Hai Tran. Portfolio liquidation under transient price impact: theoretical solution and implementation with 100 NASDAQ stocks. Available at SSRN 3504133, 2019.

[14] Luyang Chen, Markus Pelger, and Jason Zhu. Deep learning in asset pricing. Available at SSRN 3350138, 2019.

[15] Stefania Albanesi and Domonkos F. Vamossy. Predicting consumer default: A deep learning approach. Technical report, National Bureau of Economic Research, 2019.

[16] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning, 2019.

[17] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. A survey on deep transfer learning, 2018.

[18] Adriano Koshiyama, Sebastian Flennerhag, Stefano B. Blumberg, Nick Firoozye, and Philip Treleaven. QuantNet: Transferring learning across systematic trading strategies. arXiv preprint arXiv:2004.03445, 2020.

[19] Yuriy Nevmyvaka, Yi Feng, and Michael Kearns. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning, pages 673–680, 2006.

[20] Thomas Spooner, John Fearnley, Rahul Savani, and Andreas Koukorinis. Market making via reinforcement learning, 2018.

[21] Svitlana Vyetrenko and Shaojie Xu. Risk-sensitive compact decision trees for autonomous execution in presence of simulated market response, 2019.

[22] Brian Ning, Franco Ho Ting Ling, and Sebastian Jaimungal. Double deep Q-learning for optimal execution. arXiv preprint arXiv:1812.06600, 2018.

[23] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

[24] Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning, 2015.

[25] Sumitra Ganesh, Nelson Vadori, Mengda Xu, Hua Zheng, Prashant Reddy, and Manuela Veloso. Reinforcement learning for market making in a multi-agent dealer market, 2019.

[26] Tucker Hybinette Balch, Mahmoud Mahfouz, Joshua Lockhart, Maria Hybinette, and David Byrd. How to evaluate trading strategies: Single agent market replay or multiple agent interactive simulation? arXiv preprint arXiv:1906.12010, 2019.

[27] David Byrd, Maria Hybinette, and Tucker Hybinette Balch. ABIDES: Towards high-fidelity market simulation for AI research. arXiv preprint arXiv:1904.12066, 2019.

[28] Svitlana Vyetrenko, David Byrd, Nick Petosa, Mahmoud Mahfouz, Danial Dervovic, Manuela Veloso, and Tucker Hybinette Balch. Get real: Realism metrics for robust limit order book market simulations, 2019.

[29] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[30] Magnus Wiese, Robert Knobloch, Ralf Korn, and Peter Kretschmer. Quant GANs: deep generation of financial time series. Quantitative Finance, pages 1–22, 2020.

[31] Dmitry Efimov, Di Xu, Luyang Kong, Alexey Nefedov, and Archana Anandakrishnan. Using generative adversarial networks to synthesize artificial financial datasets. arXiv preprint arXiv:2002.02271, 2020.

[32] Rogelio A. Mancisidor, Michael Kampffmeyer, Kjersti Aas, and Robert Jenssen. Deep generative models for reject inference in credit scoring. Knowledge-Based Systems, page 105758, 2020.

[33] Magnus Wiese, Lianjun Bai, Ben Wood, and Hans Buehler. Deep hedging: learning to simulate equity option markets.