FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
∗Equal contribution. Electrical Engineering, Department of Statistics, and Computer Science, Columbia University; AI4Finance LLC., USA; Ion Media Networks, USA; Department of Computing, Imperial College; New York University (Shanghai). Emails: {XL2427, HY2500, QC2231, LY2335}@columbia.edu, info@ai4finance.net, [email protected], [email protected]

Deep Reinforcement Learning Workshop, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
Abstract
As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, getting hands-on experience is attractive to beginners. However, training a practical DRL trading agent that decides where to trade, at what price, and in what quantity involves error-prone and arduous development and debugging. In this paper, we introduce a DRL library, FinRL, that facilitates beginners' exposure to quantitative finance and the development of their own stock trading strategies. Along with easily reproducible tutorials, the FinRL library allows users to streamline their own developments and to compare with existing schemes easily. Within FinRL, virtual environments are configured with stock market datasets, trading agents are trained with neural networks, and extensive backtesting is analyzed via trading performance. Moreover, it incorporates important trading constraints such as transaction costs, market liquidity and the investor's degree of risk-aversion. FinRL features completeness, hands-on tutorials and reproducibility that favor beginners: (i) at multiple levels of time granularity, FinRL simulates trading environments across various stock markets, including NASDAQ-100, DJIA, S&P 500, HSI, SSE 50, and CSI 300; (ii) organized in a layered architecture with a modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN, DDPG, PPO, SAC, A2C, TD3, etc.), commonly used reward functions and standard evaluation baselines to alleviate the debugging workload and promote reproducibility; and (iii) being highly extendable, FinRL reserves a complete set of user-import interfaces. Furthermore, we incorporate three application demonstrations, namely single stock trading, multiple stock trading, and portfolio allocation. The FinRL library is available on GitHub at https://github.com/AI4Finance-LLC/FinRL-Library.
1 Introduction

Deep reinforcement learning (DRL), which balances exploration (of uncharted territory) and exploitation (of current knowledge), has been recognized as an advantageous approach for automated stock trading. The DRL framework is powerful in solving dynamic decision-making problems by learning through interaction with an unknown environment, thus providing two major advantages: portfolio scalability and market model independence [5]. In quantitative finance, stock trading is essentially the making of dynamic decisions, namely deciding where to trade, at what price, and in what quantity, in a highly stochastic and complex stock market. As a result, DRL provides useful toolkits for stock trading [21, 44, 48, 45, 10, 8, 26]. Taking many complex financial factors into account, DRL trading agents build a multi-factor model and provide algorithmic trading strategies, which are difficult for human traders [3, 47, 24, 22].

Preceding DRL, conventional reinforcement learning (RL) [43] has been applied to complex financial problems [31], including option pricing, portfolio optimization and risk management. Moody and Saffell [36] utilized policy search and direct RL for stock trading. Deng et al. [12] showed that applying deep neural networks profits more. Industry practitioners have also explored trading strategies fueled by DRL, since deep neural networks are significantly powerful at approximating the expected return at a state given a certain action. With the development of more robust models and strategies, general machine learning approaches, and DRL methods in particular, are becoming more reliable. For example, DRL has been applied to sentiment analysis for portfolio allocation [27, 22] and to liquidation strategy analysis [2], showing the potential of DRL on various financial tasks.

However, implementing a DRL- or RL-driven trading strategy is nowhere near as easy. The development and debugging processes are arduous and error-prone. Setting up training environments, managing intermediate trading states, organizing training-related data and standardizing outputs for evaluation metrics: these steps are standard in implementation yet time-consuming, especially for beginners. Therefore, we come up with a beginner-friendly library with fine-tuned standard DRL algorithms. It has been developed under three primary principles:
• Completeness: Our library shall cover components of the DRL framework completely, which is a fundamental requirement;
• Hands-on tutorials: We aim for a library that is friendly to beginners. Tutorials with detailed walk-throughs will help users explore the functionalities of our library;
• Reproducibility: Our library shall guarantee reproducibility to ensure transparency and to give users confidence in what they have done.
In this paper, we present a three-layered FinRL library that streamlines the development of stock trading strategies. FinRL provides common building blocks that allow strategy builders to configure stock market datasets as virtual environments, to train deep neural networks as trading agents, to analyze trading performance via extensive backtesting, and to incorporate important market frictions. The lowest layer is the environment, which simulates the financial market using actual historical data from six major indices, with various environment attributes such as closing price, shares, trading volume, technical indicators, etc. In the middle is the agent layer, which provides fine-tuned standard DRL algorithms (DQN [29][34], DDPG [29], Adaptive DDPG [27], Multi-Agent DDPG [30], PPO [40], SAC [18], A2C [33], TD3 [11], etc.), commonly used reward functions and standard evaluation baselines to alleviate the debugging workload and promote reproducibility. The agent interacts with the environment through properly defined reward functions on the state space and action space. The top layer includes applications in automated stock trading, where we demonstrate three use cases, namely single stock trading, multiple stock trading and portfolio allocation.

The contributions of this paper are summarized as follows:

• FinRL is an open source library specifically designed and implemented for quantitative finance. Trading environments incorporating market frictions are used and provided.
• Trading tasks accompanied by hands-on tutorials with built-in DRL agents are available in a beginner-friendly and reproducible fashion using Jupyter notebooks. Customization of trading time steps is feasible.
• FinRL has good scalability, with a broad range of fine-tuned state-of-the-art DRL algorithms. Adjusting the implementations to the rapidly changing stock market is well supported.
• Typical use cases are selected and used to establish a benchmark for the quantitative finance community. Standard backtesting and evaluation metrics are also provided for easy and effective performance evaluation.

The remainder of this paper is organized as follows. Section 2 reviews related works. Section 3 presents the FinRL library. Section 4 provides evaluation support for analyzing stock trading performance. We conclude our work in Section 5.
2 Related Works

We review related works on relevant open source libraries and existing applications of DRL in finance.
Recent works can be categorized into three approaches: value-based algorithms, policy-based algorithms, and actor-critic-based algorithms. FinRL has consolidated and elaborated upon those algorithms to build financial DRL models. A number of machine learning libraries share similar features with our FinRL library:
• OpenAI Gym [4] is a popular open source library that provides a standardized set of task environments. OpenAI Baselines [13] implements high-quality deep reinforcement learning algorithms using Gym environments. Stable Baselines [19] is a fork of OpenAI Baselines with code cleanup and user-friendly examples.
• Google Dopamine [7] is a research framework for fast prototyping of reinforcement learning algorithms. It features pluggability and reusability.
• RLlib [28] provides high scalability with reinforcement learning algorithms. It has a modular framework and is very well maintained.
• Horizon [17] is a DRL-focused framework dominated by PyTorch, whose main use case is to train RL models in the batch setting.
Recent works show that DRL has many applications in quantitative finance [14]. Stock trading is usually considered one of the most challenging applications due to its noisy and volatile features. Many researchers have explored various approaches using DRL [38, 37, 10, 9, 48, 16]. Volatility scaling can be incorporated with DRL to trade futures contracts [48]: by adding a market volatility term to the reward function, trade shares are scaled up when volatility is low, and vice versa. News headline sentiments and knowledge graphs can also be combined with time-series stock data to train an optimal policy using DRL [37]. High frequency trading with DRL is also a hot topic [16]. Deep Hedging [5, 6] represents hedging strategies with neural networks learned by modern DRL policy search. This application has shown two key advantages of the DRL approach in quantitative finance: scalability and model independence. It uses DRL to manage the risk of liquid derivatives, which indicates further extension of our library into other asset classes and topics.
3 The FinRL Library

The FinRL library consists of three layers: environments, agents and applications. We first describe the overall architecture, and then present each layer.
3.1 Architecture of the FinRL Library

The architecture of the FinRL library is shown in Fig. 1, and its features are summarized as follows:

Figure 1: An overview of our FinRL library. It consists of three layers: application layer, DRL agent layer, and finance market environment layer.

• Three-layer architecture: The three layers of the FinRL library are the stock market environment, the DRL trading agent, and the stock trading applications. The agent layer interacts with the environment layer in an exploration-exploitation manner: whether to repeat prior well-working decisions or to try new actions hoping for greater rewards. The lower layer provides APIs for the upper layer, making the lower layer transparent to the upper layer.
• Modularity: Each layer includes several modules, and each module defines a separate function. One can select certain modules from any layer to implement his/her stock trading task. Furthermore, updating existing modules is possible.
• Simplicity, Applicability and Extendibility: Specifically designed for automated stock trading, FinRL presents DRL algorithms as modules. In this way, FinRL is made accessible yet not demanding. FinRL provides three trading tasks as use cases that can be easily reproduced. Each layer includes reserved interfaces that allow users to develop new modules.
• Better Market Environment Modeling: We build a trading simulator that replicates the live stock market and provides backtesting support incorporating important market frictions such as transaction costs, market liquidity and the investor's degree of risk-aversion, all of which are among the key determinants of net returns.
3.2 Environment: Time-Driven Trading Simulator

Considering the stochastic and interactive nature of automated stock trading tasks, a financial task is modeled as a Markov Decision Process (MDP). The training process involves observing stock price changes, taking an action and calculating the reward, so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards as time proceeds.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation [4]. The FinRL library strives to provide trading environments constructed from six datasets across five stock exchanges.
We give definitions of the state space, action space and reward function.
State space $\mathcal{S}$. The state space describes the observations that the agent receives from the environment. Just as a human trader analyzes various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment. We provide various features for users:

• Balance $b_t \in \mathbb{R}_+$: the amount of money left in the account at the current time step $t$.
• Shares owned $h_t \in \mathbb{Z}_+^n$: current shares for each stock, where $n$ represents the number of stocks.
• Closing price $p_t \in \mathbb{R}_+^n$: one of the most commonly used features.
• Opening/high/low prices $o_t, h_t, l_t \in \mathbb{R}_+^n$: used to track stock price changes.
• Trading volume $v_t \in \mathbb{R}_+^n$: total quantity of shares traded during a trading slot.
• Technical indicators: Moving Average Convergence Divergence (MACD) $M_t \in \mathbb{R}^n$ and Relative Strength Index (RSI) $R_t \in \mathbb{R}_+^n$, etc.
• Multiple levels of granularity: we allow the data frequency of the above features to be daily, hourly or on a minute basis.

Action space $\mathcal{A}$. The action space describes the allowed actions through which the agent interacts with the environment. Normally, $a \in \mathcal{A}$ includes three actions, $a \in \{-1, 0, 1\}$, where $-1, 0, 1$ represent selling, holding, and buying one stock. Also, an action can be carried out upon multiple shares. We use an action space $\{-k, \ldots, -1, 0, 1, \ldots, k\}$, where $k$ denotes the number of shares. For example, "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are $10$ and $-10$, respectively.

Reward function $r(s, a, s')$ is the incentive mechanism for an agent to learn a better action. There are many forms of reward functions. We provide commonly used ones [14] as follows:

• The change of the portfolio value when action $a$ is taken at state $s$, arriving at new state $s'$ [12, 44, 10, 37, 45], i.e., $r(s, a, s') = v' - v$, where $v'$ and $v$ represent the portfolio values at states $s'$ and $s$, respectively.
• The portfolio log return when action $a$ is taken at state $s$, arriving at new state $s'$ [20], i.e., $r(s, a, s') = \log(v'/v)$.
• The Sharpe ratio for periods $t = \{1, \ldots, T\}$ [23, 35], i.e., $S_T = \mathrm{mean}(R_t)/\mathrm{std}(R_t)$, where $R_t = v_t - v_{t-1}$.
• FinRL also supports user-defined reward functions that include a risk factor or transaction cost term, as in [12, 48, 5].

The application of DRL in finance is different from that in other fields, such as playing chess and card games [42, 46]; the latter inherently have clearly defined rules for environments. Different financial markets require different DRL algorithms to obtain the most appropriate automated trading agent. Realizing that setting up a training environment is time-consuming and laborious work, FinRL provides six environments based on representative listings, including NASDAQ-100, DJIA, S&P 500, SSE 50, CSI 300, and HSI, plus one user-defined environment. With these efforts, this library frees users from tedious and time-consuming data pre-processing workloads.

We are well aware that users may want to train trading agents on their own data sets. The FinRL library provides convenient support for user-imported data and for adjusting the granularity of time steps. We specify the data format for each use case; users only need to pre-process their data sets according to our data format instructions.
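To make these definitions concrete, below is a minimal, self-contained Gym environment for single stock trading following the state, action and reward definitions above. It is an illustrative sketch, not FinRL's actual environment class: the state is reduced to [balance, shares, price], and transaction costs are omitted.

```python
import gym
import numpy as np
from gym import spaces


class SingleStockEnv(gym.Env):
    """Toy single-stock MDP: state = [balance, shares, closing price],
    action in {-k, ..., k} shares, reward r(s, a, s') = v' - v."""

    def __init__(self, prices, k_max=10, initial_balance=10_000.0):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float64)
        self.k_max = k_max
        self.initial_balance = initial_balance
        # Encode the action set {-k, ..., -1, 0, 1, ..., k} as {0, ..., 2k}.
        self.action_space = spaces.Discrete(2 * k_max + 1)
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(3,), dtype=np.float64)

    def _obs(self):
        return np.array([self.balance, self.shares, self.prices[self.t]])

    def reset(self):
        self.t, self.balance, self.shares = 0, self.initial_balance, 0
        return self._obs()

    def step(self, action):
        delta = int(action) - self.k_max        # decode signed share count
        price = self.prices[self.t]
        # No shorting or over-spending in this simplified sketch.
        delta = int(np.clip(delta, -self.shares, self.balance // price))
        self.balance -= delta * price           # buy (+) or sell (-) shares
        self.shares += delta
        v = self.balance + self.shares * price  # portfolio value at state s
        self.t += 1
        done = self.t == len(self.prices) - 1
        v_new = self.balance + self.shares * self.prices[self.t]
        return self._obs(), v_new - v, done, {}  # reward is v' - v
```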
3.3 Deep Reinforcement Learning Agents

The FinRL library includes fine-tuned standard DRL algorithms, namely DQN [29][34], DDPG [29], Multi-Agent DDPG [30], PPO [40], SAC [18], A2C [33] and TD3 [11]. We also allow users to design their own DRL algorithms by adapting these algorithms, e.g., Adaptive DDPG [27], or by employing ensemble methods [45]. A comparison of the DRL algorithms is shown in Fig. 2.

Figure 2: Comparison of DRL algorithms.

The implementation of the DRL algorithms is based on OpenAI Baselines [13] and Stable Baselines [19].
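As a minimal sketch of training an agent with Stable Baselines [19] on such an environment, the snippet below trains PPO on the toy SingleStockEnv from the previous sketch; the synthetic price series and the hyperparameters are placeholders, not FinRL's defaults.

```python
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Synthetic price series; in practice this would be historical market data.
prices = np.maximum(100.0 + np.cumsum(np.random.randn(1_000)), 1.0)
env = DummyVecEnv([lambda: SingleStockEnv(prices)])  # env sketched above

model = PPO2("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)       # train the trading agent

obs = env.reset()
action, _ = model.predict(obs)            # trading decision for the state
```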
4 Evaluation of Trading Performance

Standard metrics and baseline trading strategies are provided to support trading performance analysis. The FinRL library follows a training-validation-testing flow to design a trading strategy.
4.1 Standard Performance Metrics

FinRL provides five evaluation metrics to help users evaluate stock trading performance directly: final portfolio value, annualized return, annualized standard deviation, maximum drawdown ratio, and Sharpe ratio.
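The sketch below computes these five metrics from a daily account-value series. It assumes 252 trading days per year and a zero risk-free rate, which are simplifying assumptions rather than FinRL's exact conventions.

```python
import numpy as np

def performance_metrics(account_value):
    """Five standard metrics from a series of daily account values."""
    v = np.asarray(account_value, dtype=np.float64)
    returns = v[1:] / v[:-1] - 1.0
    n_years = len(returns) / 252.0

    running_max = np.maximum.accumulate(v)
    return {
        "final_portfolio_value": v[-1],
        "annualized_return": (v[-1] / v[0]) ** (1.0 / n_years) - 1.0,
        "annualized_std": returns.std(ddof=1) * np.sqrt(252.0),
        "max_drawdown": ((v - running_max) / running_max).min(),
        # Sharpe ratio with zero risk-free rate, annualized.
        "sharpe_ratio": np.sqrt(252.0) * returns.mean() / returns.std(ddof=1),
    }
```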
4.2 Baseline Trading Strategies

Baseline trading strategies should be well chosen and follow industrial standards. Such strategies are universal to measure, standard to compare with, and easy to implement. In the FinRL library, traditional trading strategies serve as baselines for comparison with DRL strategies. Investors usually have two objectives in their decisions: the highest possible profit and the lowest possible risk of uncertainty [41]. FinRL uses five conventional strategies, namely the passive buy-and-hold strategy [32], the mean-variance strategy [1], the min-variance strategy [1], the momentum trading strategy [15], and the equal-weighted strategy, to address these two mutually limiting objectives and the industrial standards.
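For illustration, the snippet below sketches two of these baselines, buy-and-hold and equal-weighted, on raw price data; it ignores transaction costs and rebalancing and is not FinRL's baseline implementation.

```python
import numpy as np

def buy_and_hold(prices, capital=1_000_000.0):
    """Passive strategy: buy one asset on day 0 and hold it."""
    p = np.asarray(prices, dtype=np.float64)
    return capital * p / p[0]                   # account value over time

def equal_weighted(price_matrix, capital=1_000_000.0):
    """Split capital equally across all stocks on day 0 and hold."""
    p = np.asarray(price_matrix, dtype=np.float64)  # rows: days, cols: stocks
    shares = (capital / p.shape[1]) / p[0]          # shares bought per stock
    return (p * shares).sum(axis=1)                 # daily portfolio value
```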
4.3 Training-Validation-Testing Flow

With our use cases as instances, the stock market data are divided into three phases, as shown in Fig. 3. The training dataset is the sample of data used to fit the DRL model; the model sees and learns from this dataset. The validation dataset is used for parameter tuning and to avoid overfitting. The testing (trading) dataset is the sample of data used to provide an unbiased evaluation of the fine-tuned model.

Figure 3: Data splitting.

A rolling window is usually associated with the training-validation-testing flow in stock trading, because investors and portfolio managers may need to rebalance the portfolio and retrain the model periodically. FinRL provides flexible rolling window selection, such as on a daily, monthly, quarterly or yearly basis, or as specified by the user.
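A date-based split of this kind might look like the following sketch; the data_split helper, the 'date' column and the boundary dates are illustrative assumptions, not FinRL's exact interface.

```python
import pandas as pd

def data_split(df, start, end):
    """Select rows of a dataframe with a 'date' column within [start, end)."""
    mask = (df["date"] >= start) & (df["date"] < end)
    return df.loc[mask].sort_values("date").reset_index(drop=True)

df = pd.read_csv("stock_data.csv")  # hypothetical dataset with a 'date' column
train = data_split(df, "2009-01-01", "2016-01-01")
validation = data_split(df, "2016-01-01", "2017-01-01")
test = data_split(df, "2017-01-01", "2021-01-01")
```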
4.4 Backtesting with Trading Constraints

In order to better simulate practical trading, we incorporate trading constraints, risk-aversion and automated backtesting tools.
Automated Backtesting. Backtesting plays a key role in performance evaluation. An automated backtesting tool is preferable because it reduces human error. In the FinRL library, we use the Quantopian pyfolio package [39] to backtest our trading strategies. This package is easy to use and consists of various individual plots that together provide a comprehensive image of a trading strategy's performance.
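A minimal backtesting call with pyfolio might look as follows; the returns file is a hypothetical input, while create_full_tear_sheet is the package's standard entry point.

```python
import pandas as pd
import pyfolio

# Hypothetical CSV of daily strategy returns indexed by date.
returns = pd.read_csv("strategy_returns.csv",
                      index_col=0, parse_dates=True).iloc[:, 0]
# One call produces the full set of performance plots and statistics.
pyfolio.create_full_tear_sheet(returns)
```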
Incorporating Trading Constraints. Transaction costs are incurred when executing a trade. There are many types of transaction costs, such as broker commissions and SEC fees. We allow users to treat transaction costs as a parameter in our environments:

• Flat fee: a fixed dollar amount per trade, regardless of how many shares are traded.
• Per-share percentage: a rate applied to every share traded; for example, 1/1000 and 2/1000 are the most commonly used transaction cost rates per trade.

Moreover, we need to consider market liquidity for stock trading, such as the bid-ask spread. The bid-ask spread is the difference between the prices quoted for an immediate sell and an immediate buy of a stock. In our environments, the bid-ask spread can be added as a parameter to the stock closing price to simulate real-world trading.
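The two cost models and the spread adjustment can be expressed as simple functions, as in the sketch below; the function names and the 0.1% default rate are illustrative, not FinRL's parameter names.

```python
def transaction_cost(shares_traded, price, flat_fee=0.0, cost_pct=0.001):
    """Cost of one trade: a flat fee plus a percentage of traded value."""
    return flat_fee + cost_pct * abs(shares_traded) * price

def effective_price(close_price, bid_ask_spread, side):
    """Shift the close price by half the spread: buys pay more, sells less."""
    half = bid_ask_spread / 2.0
    return close_price + half if side == "buy" else close_price - half
```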
Risk-aversion. Risk-aversion reflects whether an investor prefers to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007-2008, FinRL employs the financial turbulence index $\mathrm{turbulence}_t$ that measures extreme asset price fluctuation [25]:

$$\mathrm{turbulence}_t = (y_t - \mu)\, \Sigma^{-1} (y_t - \mu)' \in \mathbb{R}, \qquad (1)$$

where $y_t \in \mathbb{R}^n$ denotes the stock returns for the current period $t$, $\mu \in \mathbb{R}^n$ denotes the average of historical returns, and $\Sigma \in \mathbb{R}^{n \times n}$ denotes the covariance of historical returns. The index is used as a parameter that controls buying or selling actions; for example, if the turbulence index reaches a pre-defined threshold, the agent halts buying and starts selling its holding shares gradually.
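Equation (1) translates directly into a few lines of numpy, as sketched below; the synthetic data and the threshold value are illustrative only.

```python
import numpy as np

def turbulence(y_t, hist_returns):
    """Turbulence index of Eq. (1): (y_t - mu) Sigma^-1 (y_t - mu)'."""
    mu = hist_returns.mean(axis=0)               # average historical return
    sigma = np.cov(hist_returns, rowvar=False)   # covariance of returns
    d = y_t - mu
    return float(d @ np.linalg.inv(sigma) @ d)

# Example: halt buying when turbulence exceeds a pre-defined threshold.
rng = np.random.default_rng(0)
hist = rng.normal(0.0, 0.01, size=(250, 5))      # synthetic daily returns
today = rng.normal(0.0, 0.05, size=5)            # unusually volatile day
halt_buying = turbulence(today, hist) > 100.0    # threshold is illustrative
```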
4.5 Demonstration of Three Use Cases

We demonstrate with three use cases: single stock trading [10, 8, 26, 48], multiple stock trading [44, 45], and portfolio allocation [22, 27]. The FinRL library provides practical and reproducible solutions for each use case, with online walk-through tutorials in Jupyter notebooks (e.g., the configurations of the running environment and commands). We select three use cases and reproduce the results using FinRL to establish a benchmark for the quantitative finance community.

Fig. 4 and Table 1 demonstrate the performance evaluation of single stock trading. We pick large-cap ETFs such as the SPDR S&P 500 ETF Trust (SPY) and the Invesco QQQ Trust Series 1 (QQQ), and stocks such as Google (GOOGL), Amazon (AMZN), Apple (AAPL), and Microsoft (MSFT). We use the PPO algorithm in FinRL to train a trading agent. The maximum drawdown in Table 1 is large due to the Covid-19 market crash.

Figure 4: Performance of single stock trading using PPO in the FinRL library.

Table 1: Performance of single stock trading using PPO in the FinRL library. The Sharpe ratios of all the ETFs and stocks outperform the market, namely the S&P 500 index.

Fig. 5 and Table 2 show the performance of multiple stock trading and portfolio allocation over the Dow Jones 30 constituents. We use DDPG and TD3 to trade multiple stocks and to allocate portfolios.

Figure 5: Performance of multiple stock trading and portfolio allocation using the FinRL library.

Table 2: Performance of multiple stock trading and portfolio allocation over the DJIA constituent stocks using the FinRL library. The Sharpe ratios of TD3 and DDPG exceed those of the DJIA index and the traditional min-variance portfolio allocation strategy.
5 Conclusions

In this paper, we have presented the FinRL library, a DRL library designed specifically for automated stock trading, with an effort toward educational and demonstrative purposes. FinRL is characterized by its extendability, its more-than-basic market environment and its extensive performance evaluation tools, which also serve quantitative investors and strategy builders. Customization is easily accessible on all layers, from the market simulator and the trading agents' learning algorithms up to profitable strategies. In designing a trading strategy, FinRL follows a training-validation-testing flow and provides automated backtesting as well as benchmark tests. Through walk-through tutorials in Jupyter notebook format, we demonstrate easily reproducible profitable strategies under different scenarios using FinRL: (i) single stock trading; (ii) multiple stock trading; (iii) incorporating the mechanism of stock information penetration. With the FinRL library, the implementation of powerful DRL-driven trading strategies becomes an accessible, efficient and delightful experience.

References
[1] Andrew Ang. Mean-variance investing. Columbia Business School Research Paper No. 12/49, August 10, 2012.
[2] Wenhang Bao and Xiao-Yang Liu. Multi-agent deep reinforcement learning for liquidation strategy analysis. ICML Workshop on Applications and Infrastructure for Multi-Agent Learning, 2019.
[3] Stelios D Bekiros. Fuzzy adaptive decision-making for boundedly rational traders in speculative stock markets. European Journal of Operational Research, 202(1):285–293, 2010.
[4] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
[5] Hans Buehler, Lukas Gonon, Josef Teichmann, Ben Wood, Baranidharan Mohan, and Jonathan Kochems. Deep hedging: Hedging derivatives under generic market frictions using reinforcement learning. Swiss Finance Institute Research Paper, 2019.
[6] Jay Cao, J. Chen, John C. Hull, and Zissis Poulos. Deep hedging of derivatives using reinforcement learning. Risk Management & Analysis in Financial Institutions eJournal, 2019.
[7] Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, and Marc G. Bellemare. Dopamine: A research framework for deep reinforcement learning. http://arxiv.org/abs/1812.06110, 2018.
[8] Lin Chen and Qiang Gao. Application of deep reinforcement learning on automated stock trading. In , pages 29–33, 2019.
[9] Marco Corazza and Francesco Bertoluzzo. Q-learning-based financial trading systems with applications. Econometric Modeling: International Financial Markets - Developed Markets eJournal, 2014.
[10] Quang-Vinh Dang. Reinforcement learning in stock trading. In ICCSAMA, 2019.
[11] Stephen Dankwa and Wenfeng Zheng. Twin-delayed DDPG: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent. Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, 2019.
[12] Yue Deng, F. Bao, Youyong Kong, Zhiquan Ren, and Q. Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28:653–664, 2017.
[13] Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. OpenAI Baselines. https://github.com/openai/baselines, 2017.
[14] Thomas G. Fischer. Reinforcement learning in financial markets - a survey. FAU Discussion Papers in Economics, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics, 2018.
[15] Bryan Foltice and T. Langer. Profitable momentum trading strategies for individual investors. Financial Markets and Portfolio Management, 29:85–113, 2015.
[16] Prakhar Ganesh and Puneet Rakheja. Deep reinforcement learning in high frequency trading. ArXiv, abs/1809.01506, 2018.
[17] Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Zhengxing Chen, Yuchen He, Zachary Kaden, Vivek Narayanan, and Xiaohui Ye. Horizon: Facebook's open source applied reinforcement learning platform. arXiv preprint arXiv:1811.00260, 2018.
[18] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International Conference on Machine Learning, 2018.
[19] Ashley Hill, Antonin Raffin, Maximilian Ernestus, Adam Gleave, Anssi Kanervisto, Rene Traore, Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, and Yuhuai Wu. Stable Baselines. https://github.com/hill-a/stable-baselines, 2018.
[20] Chien Yi Huang. Financial trading as a game: A deep reinforcement learning approach. arXiv preprint arXiv:1807.02787, 2018.
[21] John Hull et al. Options, Futures and Other Derivatives. Upper Saddle River, NJ: Prentice Hall, 2009.
[22] Zhengyao Jiang, Dixing Xu, and J. Liang. A deep reinforcement learning framework for the financial portfolio management problem. ArXiv, abs/1706.10059, 2017.
[23] Olivier Jin and Hamza El-Saawy. Portfolio management using reinforcement learning. Stanford University, 2016.
[24] Youngmin Kim, Wonbin Ahn, Kyong Joo Oh, and David Enke. An intelligent hybrid trading system for discovering trading rules for the futures market using rough sets and genetic algorithms. Applied Soft Computing, 55:127–140, 2017.
[25] Mark Kritzman and Yuanzhen Li. Skulls, financial turbulence, and risk management. Financial Analysts Journal, 66, October 2010.
[26] Jinke Li, Ruonan Rao, and Jun Shi. Learning to trade with deep actor critic methods. , 02:66–71, 2018.
[27] Xinyi Li, Yinchuan Li, Yuancheng Zhan, and Xiao-Yang Liu. Optimistic bull or pessimistic bear: Adaptive deep reinforcement learning for stock portfolio allocation. ICML Workshop on Applications and Infrastructure for Multi-Agent Learning, 2019.
[28] Eric Liang, Richard Liaw, Robert Nishihara, Philipp Moritz, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, and Ion Stoica. RLlib: Abstractions for distributed reinforcement learning. In International Conference on Machine Learning (ICML), 2018.
[29] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. ICLR, 2016.
[30] Ryan Lowe, Yi I Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pages 6379–6390, 2017.
[31] David G Luenberger et al. Investment Science. OUP Catalogue, 1997.
[32] B. G. Malkiel. Passive investment strategies and efficient markets. European Financial Management, 9:1–10, 2003.
[33] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
[34] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[35] J. Moody, L. Wu, Y. Liao, and M. Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17:441–470, 1998.
[36] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4):875–889, 2001.
[37] Abhishek Nan, Anandh Perumal, and Osmar R Zaiane. Sentiment and knowledge based algorithmic trading with deep reinforcement learning. ArXiv, abs/2001.09403, 2020.
[38] PG Nechchi. Reinforcement learning for automated trading. Mathematical Engineering, Politecnico di Milano: Milano, Italy, 2016.
[39] Quantopian. Pyfolio: A toolkit for portfolio and risk analytics in Python. https://github.com/quantopian/pyfolio, 2019.
[40] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[41] William F Sharpe. Portfolio Theory and Capital Markets. McGraw-Hill College, 1970.
[42] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484, 2016.
[43] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[44] Zhuoran Xiong, Xiao-Yang Liu, Shan Zhong, Hongyang Yang, and Anwar Walid. Practical deep reinforcement learning approach for stock trading. NeurIPS Workshop on Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy, 2018.
[45] Hongyang Yang, Xiao-Yang Liu, Shan Zhong, and Anwar Walid. Deep reinforcement learning for automated stock trading: An ensemble strategy. ACM International Conference on AI in Finance (ICAIF), 2020.
[46] Daochen Zha, Kwei-Herng Lai, Kaixiong Zhou, and X. X. Hu. Experience replay optimization. In IJCAI, 2019.
[47] Yong Zhang and Xingyu Yang. Online portfolio selection strategy based on combining experts' advice. Computational Economics, 50(1):141–159, 2017.
[48] Zihao Zhang, Stefan Zohren, and Stephen Roberts. Deep reinforcement learning for trading. The Journal of Financial Data Science, 2(2):25–40, 2020.