Machine Learning Classification of Price Extrema Based on Market Microstructure and Price Action Features. A Case Study of S&P500 E-mini Futures
Artur Sokolovsky∗, Luca Arnaboldi†
Newcastle University, School of Computing, Newcastle Upon Tyne, UK
Abstract
The study introduces an automated trading system for S&P500 E-mini futures (ES) based on state-of-the-art machine learning. Concretely, we extract a set of scenarios from the tick market data to train the model and further use the predictions to model trading. We define the scenarios from the local extrema of the price action. Price extrema are a commonly traded pattern; however, to the best of our knowledge, there is no study presenting a pipeline for automated classification and profitability evaluation. Our study fills this gap by presenting a broad evaluation of the approach, showing a resulting average Sharpe ratio of 6.32. However, we do not take into account order execution queues, which of course affect the result in a live-trading setting. The obtained performance results give us confidence that this approach is worthwhile.
As machine learning (ML) changes and takes over virtually every aspect of our lives, we are now able to automate tasks that previously were only possible with human intervention. A field in which it has quickly gained traction and popularity is finance [12]. This field, often dominated by organisations with extreme expertise, knowledge and assets, is frequently considered out of reach to individuals, due to the complex decision making and high risks. However, if one sets aside the financial risks, the people, emotions, and the many other aspects involved, the core process of trading can be simplified to decision making under pre-defined rules and contexts, making it a perfect ML scenario.

Most current day trading is done electronically, through various available applications. Market data is propagated by the trading exchanges and handled by specialised trading feeds to keep track of trades, bids and asks by the participants of the exchange. Different exchanges provide data in different formats following predetermined protocols and data structures. Finally, the dataset is relayed back to a trading algorithm or human to make trading decisions. Decisions are then relayed back to the exchange, through a gateway, normally by means of a broker, which informs the exchange about the wish to buy or sell specific assets. This series of actions relies on the understanding of a predetermined protocol which allows communication between the various parties. Several software tools exist to ensure that almost all these steps are done for you, with the decisions made being the single point that may be uniquely done by the individual. After a match is made (either bid to ask or ask to bid) with another market participant, the match is conveyed back to the software platform and the transaction is completed.

∗Email: [email protected], ORCID: 0000-0001-8080-1331
†Email: [email protected], ORCID: 0000-0002-0808-2456
In this context, the main goal of ML is to automate the decision making in this pipeline. When constructing algorithmic trading software, or an Automatic Trading Pipeline (ATP), each of the components of the exchange protocol needs to be included. Speed is often a key factor in these exchanges, as a full round of the protocol may take as little as milliseconds; so to construct a robust ATP, time is an important factor. This extra layer adds further complexity to the machine learning problem. A diagram of what an ATP looks like in practice is presented in Fig. 1.

[Figure 1 components: Exchange, Feed Handler (market data, preprocess), Electronic Trading Platform, Machine Learning, Historic Data, Trading Strategy, Order Entry Gateway, bid/ask order book, decision, live data.]
Figure 1: Full overview of Automated Trading Platform components

In Fig. 1 it can be observed that the main ML component is focused on the training of the decision making and the strategy. This is by no means a straightforward feat, as successful strategies are often jealously guarded secrets, as a consequence of potential financial profits. Several different components are required, not least analysing the market to establish components of interest. Historical raw market data contains unstructured information, allowing one to reconstruct all the trading activity; however, that is usually not enough to establish persistent price-action patterns due to market non-stationarity. This characterisation is a complex process, which requires guidance and domain understanding. While traditional approaches have focused on trying to learn from the full market profile, over a whole year or potentially across dozens of years, more recent work has proposed the usage of data manipulation to identify key events in the market [23]; this advanced categorization can then become the focus of the machine learning input to improve performance.

This methodology focuses on identifying states of a financial market, which can then be used to identify points of drastic change in the correlation structure, whether positive or negative. Previous approaches have used these states to correlate them to worldwide events and general market values to categorize interesting scenarios [5], showing that with these techniques the training of the strategy can be greatly optimized. In this work we build on these foundations, proposing an approach for the extraction and classification of financial market patterns based on price action and market microstructure.
We then implement a custom ATP in which we show that our approach is capable of generating consistent profits, with an average Sharpe ratio of 6.32 for S&P500 E-mini futures as the asset of interest.

The contributions of this paper are as follows: 1) a methodology to construct an automated trading platform using state-of-the-art machine learning techniques; 2) an automated market profiling technique based on machine learning; and 3) we propose and evaluate the performance of a futures trading strategy based on market profiles, which is shown to perform profitably in an automated trading platform.

The remainder of the paper is structured as follows: Section 2 provides the financial and machine learning background needed to understand our approach; Section 3 describes related work; Section 4 describes the data; Section 5 presents our methodology for the automated market profiling approach and its assessment; Section 6 details the results of our constructed ATP; Section 7 discusses the implications and potential limitations of the study; and Section 8 concludes the work.
There are several different types of financial data, and each of these has a different role in financial trading. They are widely classified into four categories:

i. Fundamental Data: a set of documents, for example financial accounts, that a company has to send to the organization that regulates its activities; this is most commonly accounting data of the business.

ii. Market Data: all trading activities that take place, allowing one to reconstruct a trading book.

iii. Analytics: often derivative data acquired by analysing the raw data to find patterns; it can take the form of fundamental or market analytics.

iv. Alternate Data: extra domain knowledge that might help with the understanding of the other data, such as world events, social media, Twitter and any other external sources.

In this work we analyse type ii data to construct profiles of the market, and therefore focus our background on this datatype; a more comprehensive review of different data types is available in Prado & Lopez [8].
In order to prepare data for processing, the raw data is structured into predetermined formats to make it easier for machine learning algorithms to digest. There are several ways to group data, and various different features may be aggregated. The main idea is to identify a window of interest based on some heuristic, and then aggregate the features of that window to get a representation, called a Bar. Bars may contain several features, and it is up to the individual to decide which features to select. Common features include: bar start time, bar end time, sum of volume, open price, close price, and min and max (usually called low and high) prices, plus any other features that might help characterise the trading performed within this window.

The decision of how to select this window may make or break the algorithm, as it determines whether one has useful data or data that is not representative of the market. An example of this would be the choice of using time as the metric for the bar window, e.g. taking two-hour snapshots. However, given that there are active and non-active trading periods, one might find that only some bars are actually useful under this methodology. In practice, the widely considered best way to construct bars is based on the number of transactions that have taken place. This allows for the construction of informative bars which are independent of timings and gives a good sampling of the market, as it is done as a function of trading activity. There are of course many other ways to select a bar [8], so it is up to the prospective user to select one that works for them.
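As a concrete illustration, transaction-count bars as described above can be sketched as follows (a minimal sketch; the `Bar` fields and the `build_tick_bars` helper are our own naming, not from the original study):

```python
from dataclasses import dataclass

@dataclass
class Bar:
    start_time: float
    end_time: float
    volume: float
    open_price: float
    close_price: float
    high: float
    low: float

def build_tick_bars(trades, trades_per_bar=100):
    """Aggregate (timestamp, price, volume) trades into bars of a fixed
    number of transactions, sampling as a function of trading activity."""
    bars = []
    for i in range(0, len(trades) - trades_per_bar + 1, trades_per_bar):
        chunk = trades[i:i + trades_per_bar]
        prices = [p for _, p, _ in chunk]
        bars.append(Bar(
            start_time=chunk[0][0],
            end_time=chunk[-1][0],
            volume=sum(v for _, _, v in chunk),
            open_price=prices[0],
            close_price=prices[-1],
            high=max(prices),
            low=min(prices),
        ))
    return bars
```

Because each bar closes after a fixed number of trades, active periods produce many bars and quiet periods few, which is exactly the activity-driven sampling argued for above.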
In mathematics, an extremum is any point at which the value of a function is largest (a maximum) or smallest (a minimum). These can be either local or global extrema. At a local extremum the value is larger (or smaller) than at the immediately adjacent points, while at a global extremum the value of the function is larger than its value at any other point in the interval of interest. If one wanted to maximise profits theoretically, the intent would be to identify an extremum and trade at that point of optimality, i.e. the peak. This is, generally speaking, a non-trivial problem, as changes in the market may be random. As most strategies perform several trades, local extrema are sought out instead.

As far as the algorithms for an ATP are concerned, they will often perform active trading, so finding a global extremum serves little purpose. Consequently, local extrema within a pre-selected window are instead chosen. Several complex algorithms exist for this, with use cases in many fields such as biology [10]. However, the objective is actually quite simple: identify a sample for which the neighbours on each side have a lower amplitude for maxima, and a higher amplitude for minima. This approach is very straightforward and can be implemented with a linear search. In the case of flat peaks, where several samples are of equal amplitude, the middle sample is selected. Two further metrics of interest are the prominence and the width of a peak. The prominence of a peak measures how much the peak stands out from the surrounding baseline of the near samples, and is defined as the vertical distance between the peak and its lowest contour point. The width of the peak is the distance between its lower bounds, signifying the peak's duration. In the case of peak classification, these measures can aid a machine learning estimator in relating the obtained features to the discovered peaks; this avoids attempts to directly relate properties of small peaks with large peaks and vice versa.
These three measures allow for the classification of good trading points, with the prominence and width also giving insight into what led to the classification.
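The linear-search detection of local maxima with flat-peak handling, plus a simplified prominence measure, can be sketched in a few lines (a sketch under our own naming; production code would typically use a library routine such as SciPy's find_peaks):

```python
def find_local_maxima(xs):
    """Linear search for local maxima; for a flat peak (a run of equal
    samples flanked by lower ones) the middle sample's index is returned."""
    peaks = []
    i, n = 1, len(xs)
    while i < n - 1:
        if xs[i] > xs[i - 1]:
            j = i
            while j < n - 1 and xs[j + 1] == xs[j]:
                j += 1                      # walk across a flat top
            if j < n - 1 and xs[j + 1] < xs[j]:
                peaks.append((i + j) // 2)  # middle of the flat region
            i = j + 1
        else:
            i += 1
    return peaks

def prominence(xs, p):
    """Simplified prominence: height of the peak above the higher of its two
    flanking minima, searching each side until a higher sample or the edge."""
    lmin = xs[p]
    for v in reversed(xs[:p]):
        if v > xs[p]:
            break
        lmin = min(lmin, v)
    rmin = xs[p]
    for v in xs[p + 1:]:
        if v > xs[p]:
            break
        rmin = min(rmin, v)
    return xs[p] - max(lmin, rmin)
```

Minima can be found symmetrically by negating the series before the search.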
Market microstructure is the study of financial markets and how they operate. Its features represent the way the market operates: how decisions are made about trades, the price discovery process and more [19]. The process of market microstructure analysis is the identification of why and how market prices will change, in order to trade profitably. Features may include: 1) the time between trades, as it is usually an indicator of trading intensity [3]; 2) volatility, which might distinguish good and bad trading scenarios, as high volatility may indicate an unsuitable market state [15]; 3) volume, which may directly correlate with trade duration, as high volume might represent informed trading rather than ordinary active trading [21]; and 4) trade duration: high trading activity is related to a greater price impact of trades and faster price adjustment to trade-related events, whilst slower trades may indicate informed single entities [11]. Whilst several other options are available, they are often instrument-related and require expert domain knowledge. In general, it is important to tailor and evaluate the features to the specific scenario identified.

One important scenario to consider when analysing prices is the aggressiveness of buyers and sellers. In an order book, a match implies a trade, which occurs whenever a bid matches an ask and conversely; however, the trade is only ever initiated by one party. In order to determine who the aggressor is in this scenario, the tick rule is used [1]. The rule labels a buyer-initiated trade as 1 and a seller-initiated trade as -1. The logic is the following: an initial label l is assigned an arbitrary value of 1; if a trade occurs and the price change is positive, then l = 1; if the price change is negative, then l = -1; and if there is no price change, l retains its previous value. This has been shown to identify the aggressor with a high degree of accuracy.
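The tick rule as stated can be sketched directly (a minimal sketch; the function name is ours):

```python
def tick_rule(prices):
    """Label each trade by the tick rule: +1 buyer-initiated, -1
    seller-initiated. Upticks give +1, downticks give -1, and zero
    ticks carry the previous label forward; the first label is an
    arbitrary +1."""
    labels = []
    last = 1                    # arbitrary initial label
    prev_price = None
    for p in prices:
        if prev_price is not None:
            if p > prev_price:
                last = 1
            elif p < prev_price:
                last = -1
            # unchanged price: keep the previous label
        labels.append(last)
        prev_price = p
    return labels
```

For ES, consecutive prices differ by multiples of the $0.25 tick, so every change is cleanly classified as an uptick or downtick.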
Machine learning is a field that has come to pervade almost every aspect of our lives, from personal voice assistants to healthcare, so it comes as no surprise that it is also gaining popularity in the field of algorithmic trading. There is a wide range of machine learning techniques, from very simple ones such as regression to deep learning techniques such as neural networks. Consequently, it is important to choose an algorithm which is suited to the problem one wishes to tackle. In ATPs, one of the possible roles of machine learning is to identify situations in which it is profitable to trade, depending on the strategy; for example, when using a flat strategy the intent is to identify when the market is flat. In these circumstances, and due to the potentially high financial implications of false negatives, understanding the prediction is key. Understanding the prediction involves being able to explain why the algorithm made a given decision. This is a non-trivial issue and very difficult for a wide range of techniques, neural networks being the prime example (although advances are currently being made [29, 13]).

Perhaps one of the simplest yet highly effective techniques is Support Vector Machines (SVMs). SVMs identify the hyperplane that best separates a binary sample. If we imagine a set of points mapped onto a 2D plane, the SVM will find the best line that separates the two classes. This technique can easily be expanded to work on higher-dimensional data, and since it is so simple, it is intuitive to see the reason behind a classification. However, whilst popular for financial forecasting [28, 17], this technique suffers from the drawback that it is very sensitive to parameter tuning, making it harder to use, and it does not work with categorical features directly, making it less suited to complex analysis.

Another popular approach is decision trees.
The reason tree-based approaches are hugely popular is that they are directly interpretable. To improve the efficacy of this technique, several different trees are trained and used in unison to produce the result. The most popular case of this is random forests. Random forests operate by constructing a multitude of decision trees at training time and aggregating the classifications of the individual trees. However, this suffers from the fact that the different trees are not weighted and contribute equally, which might lead to inaccurate results. One class of algorithms which has seen mass popularity for its robustness, effectiveness and clarity is boosting algorithms. Boosters create "bins" of classifications that can be combined to reduce overfitting and improve the prediction. The data is split into N samples, either randomly or by some heuristic, and each tree is trained using one of the samples. The results of each tree are then fed into the other trees to reduce overfitting and come to a combined result. Finally, the features of each tree are internally evaluated, leading to a weakness measure which dictates the overall contribution to the result.

The first usage of boosting with a notion of weakness was introduced by AdaBoost [14]; this work presented the concept of combining the output of the boosters into a weighted sum that represents the final output of the boosted classifier. This allows for adaptive analysis, as subsequent weak learners are tweaked in favour of those instances misclassified by previous classifiers. Following on from this technique, two further techniques were introduced, XGBoost [7] and LightGBM [18]; both libraries gained a lot of traction in the machine learning community for their efficacy and are widely used. In this category, the most recent algorithm is Catboost. Catboost [25] is highly efficient and less prone to bias than its predecessors, and it is quickly becoming one of the most used approaches, in part due to its high flexibility.
Catboost was specifically proposed to address issues in the previous approaches which led to target leakage, which sometimes caused overfitting. This was achieved by using ordered boosting, a new technique allowing independent training and avoiding leakage, while also giving better performance on categorical features.
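The core boosting idea of AdaBoost, a weighted sum of weak learners where each round re-weights the samples the previous learners got wrong, can be illustrated with a toy pure-Python version on 1-D data with threshold stumps (this is only an illustration of the weighting mechanism, not the Catboost algorithm or the estimator used in this study):

```python
import math

def stump_predict(x, threshold, polarity):
    # Weak learner: sign decision at a single threshold.
    return polarity if x >= threshold else -polarity

def adaboost_train(xs, ys, rounds=5):
    """Toy AdaBoost: each round fits the stump minimising weighted error,
    assigns it a weight alpha, then up-weights misclassified samples."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                               # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in sorted(set(xs)):
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, t, pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: mistakes grow, correct answers shrink.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(x, t, pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

The weighted sum over weak learners is exactly the "final output of the boosted classifier" described above; gradient boosters such as XGBoost, LightGBM and Catboost refine how each successive learner is fitted.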
Feature analysis is the evaluation of the input data to assess the features' effectiveness and contribution to the prediction. This may also take the form of creating new features using domain knowledge to improve the data. The features in the data directly influence the predictive models used and the results that can be achieved. Intuitively, the better the features that are prepared and chosen, the better the results that can be achieved. However, this may not always be the case, as too large a dimensionality of the data may lead to overfitting. The process of evaluating and selecting the best features is referred to as feature engineering. The first component of this process is feature importance evaluation. The simplest way to achieve this is feature ranking [16]: in essence, a heuristic is chosen, each feature is assigned a score based on this heuristic, and the features are ordered in descending order of score. This approach, however, may be problem-specific and require prior domain knowledge. Another common approach is the usage of correlations to evaluate how the features relate to the output. The intent of this approach is to evaluate the dependency between each feature and the result, on the intuition that a more dependent feature contributes more to the output [4]. However, these approaches evaluate each feature as a single component, in relation to the output, independently of the other features. Realistically, one would want to understand the features as a whole and see how they contribute to a prediction as a group.

Beyond the initial understanding of the features, it is important to get an understanding of the prediction.
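The correlation-based ranking described above can be sketched in a few lines (a minimal sketch using absolute Pearson correlation as the scoring heuristic; the function names and the dict-of-columns layout are our own assumptions):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_features_by_correlation(features, target):
    """Score each named feature column by |Pearson correlation| with the
    target and return the names in descending order of score."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

As noted above, such a ranking treats each feature in isolation; it says nothing about how features interact within a model, which is what per-prediction attribution methods address.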
Compared to the previously discussed approaches, this starts from the result of the model and goes back to the features to see which ones contributed to the prediction. This has advantages over pure feature analysis approaches, as it can be applied to all the different predictors individually and gives insights into the workings of the predictor. Recent advances in this direction, namely SHAP (SHapley Additive exPlanations) [20], are able to provide a per-prediction scoring of each feature. This innovative technique allows a step-through assessment of features across the different predictions, providing guided insight which can also be averaged for an overall assessment. This is very useful for debugging an algorithm, assessing the features and understanding the market classifications, making it particularly effective for this case study.

Trading strategy
In this context we only refer to common strategies for active trading. Active trading seeks to gain profit by exploiting price variations, to beat the market over short holding periods. Perhaps the most common type of strategy is the trend-based strategy. These strategies aim to identify shifts in the market towards a rise or fall in price and trade at the point where they are likely to gain profit. The second very popular strategy is called the flat strategy. Unlike trending markets, a flat market is a stable state in which the broader market does not move either higher or lower, but instead trades within the boundaries of recent highs and lows. This makes it easier to understand changes in the market and make a profit within a known market range. The role of machine learning in these strategies is to predict whether the market is entering a trending or flat state, respectively.
In order to test a trading strategy, an evaluation is performed to assess profitability. Whilst it is possible to do so on real market data, it is generally more favourable to do so on historical data to get a risk-free estimation of performance. The notion is that a strategy that would have worked poorly in the past will probably work poorly in the future, and conversely. A key caveat of backtesting, however, is the risky assumption that past performance predicts future performance.

Several approaches exist to perform backtesting, and different things can be assessed. Beyond the testing of trading strategies, backtesting can show how positions are opened and the likelihood of certain scenarios taking place within a trading period. The more common technique is to implement the backtesting within the trading platform, which has the advantage that the same code as live trading can be used. Almost all platforms allow for simulations on historical data, although it may differ in form from the raw data one may have used for training. For more flexibility, one can implement one's own backtesting system in languages such as Python or R. This approach enables the same code pipeline that trains the classifier to also test it, allowing for much smoother testing. Whilst this ensures the same data that is used for training may be used for testing, it may suffer from differences to the trading software that might skew the results. Another limitation of this approach is that there is no connection to the exchange or the broker: there are limitations on how order queues are modelled, as well as on the simulation of the latency that will be present during live trading. This complicates the identification of slippage, the difference between the price at which the trade is activated by the algorithm and the price at which it actually enters the market through the order gateway, which will differ and impact the order of trades.
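A custom backtesting loop of the kind described can be sketched minimally as follows (a sketch under our own naming and simplifications: fills at the signal price, a flat per-trade cost, and none of the queue or latency modelling discussed above):

```python
def backtest(prices, signals, cost_per_trade=0.0):
    """Walk forward through historical prices: signal +1 opens a long,
    -1 opens a short, 0 closes any open position. Returns total PnL.
    Fills are assumed at the signal's price, i.e. slippage is ignored."""
    position = 0        # +1 long, -1 short, 0 flat
    entry = 0.0
    pnl = 0.0
    for price, signal in zip(prices, signals):
        if signal != position:
            if position != 0:                   # close the open position
                pnl += position * (price - entry) - cost_per_trade
            if signal != 0:                     # open a new one
                entry = price
            position = signal
    if position != 0:                           # mark-to-market at the end
        pnl += position * (prices[-1] - entry) - cost_per_trade
    return pnl
```

Precisely because fills here are frictionless, results from such a loop are optimistic relative to live trading, which is the limitation the paragraph above warns about.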
In this section we break down related work in this area, including market characterisation, price extrema and optimal trading points, and automated trading systems. Each of the previous works is compared to our approach.

In their seminal work, Munnix et al. [24] first proposed the characterisation of market structures based on correlation. Through this they were able to detect key states of market crises from raw market data. The same technique also allowed the mapping of drastic changes in the market, which corresponded to key points of interest for trading. By using k-means clustering, the authors were able to predict whether the market was approaching a crisis, allowing them to react accordingly and construct a resilient strategy. Whilst this approach was a seminal work in the understanding of market dynamics, it was still based on statistical dependencies and correlations, which are not quite as advanced as more modern machine learning approaches. Nonetheless, their successful results initiated much more research in this area. Their way of analysing a market as a series of states proved to be a winning strategy, allowing for more focused decision making and a better understanding of the market. Following the same strategy, we seek to characterise the market as a series of peaks of interest to understand whether the market structure fits our desired trading window. This constitutes the initial, preprocessing stage of the ATP, for which their approach still requires several steps of manual intervention.

Historically, there has been an intuition that changes in market price are random. By this it is understood that whilst volatility is due to certain events, it is not possible to extract them from raw data. However, volatility is still one of the core metrics for trading, despite this assumed unpredictability.
In an effort to statistically analyse price changes and break down key events in the market, Caginalp & Caginalp [6] propose a method to find peaks in the volatility, representing price extrema. A price extremum represents the optimal point at which the price is traded before a large fluctuation. This strategy depends on exploiting a shift away from the optimal point to either sell high or buy low. The authors describe the supply and demand of a single asset via a stochastic equation where the peak is found when maximum variance is achieved. This approach is heavily market-specific and requires some assumptions to hold true in order to converge on a price maximum. However, the implied relationship of supply and demand is something that holds true for any exchange, making it a great fit for various different instruments. In a rather different context, Miller et al. [22] analyse Bitcoin data to find profitable trading bounds. Bitcoin, unlike more traditional exchanges, is decentralised and traded 24 hours a day, making the data much sparser and with less concentrated trading periods. This makes the trends much harder to analyse. Their approach smooths the data using splines, manipulating the curves to make their points more closely related. With this technique they are able to remove outliers and find clearer points of fluctuation as well as peaks. The authors then construct a bounded trading strategy which proves to perform quite well against unbounded strategies. Since Bitcoin is more decentralised, and by the very nature of those investing in it, automated trading is much more common.
This means that techniques to identify bounds and points of interest in the market are also more favoured and widely used.

An automated trading system is a piece of code that autonomously trades in the market. The goal of such machine learning efforts is the identification of a market state in which a trade is profitable, and the automatic execution of the transaction at that stage. Such a system is normally specialised for a specific instrument, analysing unique patterns to improve the characterisation. One such effort, focusing on FX markets, is Dempster & Leemans [9]. In this work, a technique using reinforcement learning is proposed to learn market behaviours. Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximise a notion of cumulative reward. This is achieved by assigning higher rewards to positive actions and negative rewards to negative actions, leading to an optimisation towards actions that increase rewards; in financial markets this naturally corresponds to profitable trades. Using this approach, the authors were able to characterise when to trade, perform analysis of the associated risks, and automatically make decisions based on these factors. In more recent work, Booth et al. [5] describe a model for seasonal stock trading using an ensemble of random forests. This leverages variability in seasonal data to predict the price return based on these events taking place. Their random-forest-based approach reduces the drawdown risk of peak-to-trough events. Their approach is based on domain knowledge of well-known seasonality events; usual approaches following this technique find that whilst the event is predictable, the volatility is not, so their characterisation allows them to predict which events will lead to profits. The random trees are used to characterise features of interest in a window of time, and multiple of these are aggregated to inform the decision process.
These ensembles are then weighted based on how effective they are and used to inform decisions, with higher weights having more input. Results across fifteen DRX assets show increases in profitability and in prediction precision.

As can be seen, statistical and machine learning techniques have been successfully applied in a variety of scenarios, proving effective as the basis of automatic trading and the identification of profitable events. This makes further investigation into more advanced machine learning techniques a desirable and interesting area. Our work expands on these previous concepts to improve performance and seek new ways to characterise market profiles.
In this study we use the S&P500 E-mini futures contracts ESH2007, ESM2007, ES(H-Z)2017, ES(H-Z)2018 and ES(H-U)2019, which correspond to ES futures contracts with expiration in March (H), June (M), September (U) and December (Z). Year 2007 data was used for preliminary tests and debugging. In our trading simulations we only consider the next expiring futures contract with the conventionally accepted rollover dates, rolling to the next contract on the second Thursday of the expiration month. This decision ensures the highest liquidity and, due to the double-auction nature of the financial markets, minimum bid-ask spreads. Since ES is a very popular trading instrument, its bid-ask spread is usually 1 tick; however, this does not hold during extraordinary market events, scheduled news, and session starts and ends. Spreads should be taken into account during backtesting, which is described in more detail in the following section.
The data can be obtained from CME Group DataMine (the Top-of-Book dataset), containing all the electronic trades and their volumes as well as best bid and ask prices and sizes. Alternatively, one can collect it live from a trading platform or purchase it from a third-party data feed. The data is available in at least two forms: market records and ticks (or lower-resolution data). Of course, ticks and lower-resolution data can be obtained by aggregating market records.
In our study we demonstrate that off-the-shelf machine learning methods can be successfully applied to financial markets analysis and trading in particular. Taking into account the non-stationary nature of financial markets, we achieve this goal by considering only a subset of the time series. We propose a price-action-based way of defining the subsets of interest and perform their classification. Concretely, we identify local price extrema and predict whether the price will reverse (or 'rebound') or continue its movement (also called 'crossing'). For demonstration purposes, we set up a simplistic trading strategy, where we trade a price reversal after a discovered local extremum is reached, as shown in Fig. 4. The section is organized in the following way:

1. data pre-processing is described;
2. detecting the price extrema;
3. obtaining a set of features for each detected extremum;
4. classification of the extrema based on the obtained features; this step involves feature selection, model parameter optimization, training, testing and analyzing the model;
5. setting up a simple trading strategy based on the obtained labels for the test contract and performing the backtesting.
In the current study we use tick data with bid and ask traded volumes, which indicate aggressive sellers and buyers, respectively. Additionally, as complementary data, we use best bid-ask order book (OB) records (also called L1) to get basic OB per-tick statistics. We consider a tick to incorporate all the market events between two price changes by a minimum price step. For the considered financial instrument it is $0.25. When designing the per-tick features we pursue two goals: i) consider different aspects of the market, and ii) limit the number of features to avoid overfitting. The following features are collected on a per-tick basis:
• Volume (bid and ask);
• Number of trades (bid and ask);
• Number of order book (OB) changes (bid and ask);
• Maximum OB size (bid and ask);
• Minimum OB size (bid and ask);
• Largest trade size (bid and ask);
• Time stamp when the tick started;
• Time stamp when the tick ended.
From these we later construct the price-level features.
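A minimal sketch of collecting several of the per-tick statistics listed above (bid/ask volumes, trade counts, largest trades and tick time stamps) for a single tick. OB-derived features are omitted for brevity, and all names and values are illustrative, not the study's exact implementation.

```python
import pandas as pd

# One tick's worth of trade prints, split by aggressor side (illustrative values).
tick = pd.DataFrame({
    "side": ["bid", "bid", "ask", "ask", "ask"],
    "size": [2, 5, 1, 4, 3],
    "ts":   pd.to_datetime([
        "2019-06-03 09:30:00.100", "2019-06-03 09:30:00.250",
        "2019-06-03 09:30:00.300", "2019-06-03 09:30:00.450",
        "2019-06-03 09:30:00.700",
    ]),
})

grouped = tick.groupby("side")["size"]
features = {
    "volume_bid": grouped.sum().get("bid", 0),        # V_b
    "volume_ask": grouped.sum().get("ask", 0),        # V_a
    "trades_bid": grouped.count().get("bid", 0),      # T_b
    "trades_ask": grouped.count().get("ask", 0),      # T_a
    "largest_trade_bid": grouped.max().get("bid", 0),
    "largest_trade_ask": grouped.max().get("ask", 0),
    "ts_start": tick["ts"].min(),
    "ts_end": tick["ts"].max(),
}
print(features)
```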
As we aim to trade price reversals from extrema, these should be detected first. When detecting the extrema we use a sliding window approach on ticks with a window size of 500. We introduce limits on the peak widths - from 100 to 400 ticks. This serves three purposes: i) it ensures that we do not consider high-frequency trading (HFT) scenarios, which require more modelling assumptions and a different backtesting engine; ii) it allows us to stay in intraday trading time frames and have a large enough number of trades for analysis; iii) it makes the price level feature values comparable across all the entries.
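Width-constrained peak detection of this kind can be sketched with `scipy.signal.find_peaks`, whose width, prominence and width-height outputs correspond to the peak properties used later as features. The synthetic price series below is our own illustration, not the study's data; minima are found by negating the series.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
# Synthetic tick-price series: a slow oscillation plus noise, on a $0.25 grid.
n = 3000
price = 2900 + 0.25 * np.round(
    20 * np.sin(np.arange(n) / 120) + rng.normal(0, 1.5, n)
)

# Maxima with widths between 100 and 400 ticks, as in the study; a small
# prominence threshold filters out single-tick noise spikes.
maxima, props = find_peaks(price, width=(100, 400), prominence=1.0)
minima, _ = find_peaks(-price, width=(100, 400), prominence=1.0)

# find_peaks also returns the width, prominence and width-height properties
# that the study later uses as peak features.
print(len(maxima), len(minima), sorted(props.keys()))
```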
To perform the extrema classification, we obtain two types of features: i) features designed from the price level ticks (called price level (PL) features), and ii) features obtained from the ticks right before the extremum is approached (called market shift (MS) features), as we illustrate in Fig. 2. We think it is essential to perform the two-step collection since the PL features contain properties of the extremum, while the MS features allow us to spot any market changes that happened between the time when the extremum was formed and the time we are trading it.
Figure 2: The figure illustrates what data we use to design the two types of features: price level (PL) and market shift (MS) features.

Considering the varying extrema widths, the varying dimensionality of the data does not allow using it directly for classification - most algorithms take fixed-dimensional vectors as input. We ensure the fixed dimensionality of the classifier input by aggregating per-tick features by price. We perform the aggregation for the price range of 10 ticks below (or above, in the case of a minimum) the extremum. This price range is flexible - 10 ticks are often not available within the ticks associated with the price level (red dashed rectangle in Fig. 2); in this case we fill the empty price features with zeros. We assume that the further the price is from the extremum, the less information relevant for the classification it contains. Considering the intraday volatility of ES, we expect that the information beyond 10 ticks from the extremum is unlikely to improve the predictions. If one considers larger time frames (peak widths), this number might need increasing.

PL features are obtained from per-tick features by grouping by price with sum, max or count statistics. For instance, if one is considering volumes, it is reasonable to sum all the aggressive buyers and sellers before comparing them. Of course, one can also compute the mean or consider the max and min volumes per tick. Following this line of reasoning, the feature space could be increased to very large dimensions. We empirically choose the feature space described in Tab. 1. Here we shrink the feature space in order to make the feature selection step computationally feasible. As a result, the feature space is quite shallow, yet sufficient to demonstrate the benefits based on the presented results.

To track the market changes, for the MS feature set we use 237- and 21-tick windows and compare statistics obtained from these two periods.
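The fixed-dimensionality aggregation described above (grouping per-tick features by price and zero-filling up to 10 ticks from the extremum) can be sketched as follows; the values and names are illustrative, not the study's exact implementation.

```python
import pandas as pd

TICK = 0.25  # minimum price step of ES

# Per-tick features for ticks belonging to one detected maximum (illustrative).
level_ticks = pd.DataFrame({
    "price":  [2900.00, 2899.75, 2899.75, 2899.50, 2899.00],
    "volume": [10, 4, 6, 3, 2],
})
extremum_price = 2900.00  # the detected local maximum

# Aggregate by distance from the extremum in ticks; always emit 10 slots,
# zero-filled where no trades happened at that price (e.g. 2899.25 here),
# so the classifier input has a fixed dimension regardless of peak width.
dist = ((extremum_price - level_ticks["price"]) / TICK).round().astype(int)
by_dist = level_ticks.groupby(dist)["volume"].sum()
pl_volume = by_dist.reindex(range(10), fill_value=0).to_numpy()
print(pl_volume)
```

For a minimum, the distance would be computed in the opposite direction (prices above the extremum).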
Non-round numbers help avoid interference with the majority of manual market participants, who use round numbers [8].

Table 1: Feature space used in the study. The features are obtained in two steps - after the price level is formed, and right before it is approached. When discussed, features are referred to by the codes in the square brackets at the end of the descriptions. Unless marked "total", sums run over the ticks t at a given price p (a neighbour of the price level PL up to distance N).

Equation : Description

Price level (PL) features
Σ_t (V_b + V_a) : Bid and ask volumes summed across all the ticks at price P [PL0]
Σ_t V_b : Bid volumes summed across all the ticks at price P [PL1]
Σ_t V_a : Ask volumes summed across all the ticks at price P [PL2]
Σ_t T_b : Number of bid trades summed across all the ticks at price P [PL3]
Σ_t T_a : Number of ask trades summed across all the ticks at price P [PL4]
Σ_t M(O)_b : Sum of maximum bid order book quotes across all the ticks at price P [PL5]
Σ_t M(O)_a : Sum of maximum ask OB quotes across the ticks at price P [PL6]
Σ_t 1 : Number of ticks at price P [PL7]
Σ_t V_b / Σ_t V_a : PL1 divided by PL2, at price P [PL8]
Σ_t T_b / Σ_t T_a : Feature PL3 divided by feature PL4, at price P [PL9]
Σ_t M(O)_b / Σ_t M(O)_a : Feature PL5 divided by feature PL6, at price P [PL10]
Σ_t (V_b + V_a) / Σ_t 1 : Total volume at price P divided by the number of ticks [PL11]
Σ (over all ticks) V_a : Total ask volume [PL12]
Σ (over all ticks) V_b : Total bid volume [PL13]
Σ (over all ticks) T_a : Total ask trades [PL14]
Σ (over all ticks) T_b : Total bid trades [PL15]
Σ (over all ticks) (V_a + V_b) : Total volume [PL16]
- : Peak extremum - minimum or maximum [PL17]
- : Peak width in ticks, described in the Background section [PL18]
- : Peak prominence, described in the Background section [PL19]
- : Peak width height, described in the Background section [PL20]

Market shift (MS) features
Σ_{w=237} V_b / Σ_{w=237} V_a : Fraction of bid over ask volume for the last 237 ticks [MS0]
Σ_{w=237} T_b / Σ_{w=237} T_a : Fraction of bid over ask trades for the last 237 ticks [MS1]
Σ_{w=237} V_b / Σ_{w=237} V_a − Σ_{w=21} V_b / Σ_{w=21} V_a : Fraction of bid/ask volumes for the long minus the short period [MS2]
Σ_{w=237} T_b / Σ_{w=237} T_a − Σ_{w=21} T_b / Σ_{w=21} T_a : Fraction of bid/ask trades for the long minus the short period [MS3]
Σ_{w=237} M(O)_b / Σ_{w=237} M(O)_a − Σ_{w=21} M(O)_b / Σ_{w=21} M(O)_a : Fraction of sums of max OB bid/ask quotes for the long minus the short period [MS4]

Key: OB - order book; T - trades; t - ticks; N - total ticks; p - price; w - tick window; PL - price level price; V - volume; b - bid; a - ask; PN - price level neighbours until distance N; M(O) - maximum value in the order book.
We also choose the values to be comparable to our expected trading time frames. No optimization was made on them. We obtain the MS features 2 ticks away from the price level to ensure that our modelling does not lead to any time-related bias where one could not physically send the order fast enough. When labeling the extrema as crossed or rebounded, we assume that the level is crossed if the price makes at least 3 ticks above the level (for a maximum; below for a minimum), and rebounded if there is a reversal and movement in the opposite direction for 5-15 ticks. We report a range of experiments for different rebounds.
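The crossed/rebounded labeling rule can be sketched as follows. The function and variable names are ours, and the threshold defaults mirror the 3-tick crossing and 5-15-tick rebound definitions above; the study's exact labeling code is not released.

```python
def label_extremum(prices_after, level, kind="max", cross_ticks=3,
                   rebound_ticks=5, tick=0.25):
    """Label a price level as 'crossed', 'rebounded' or undecided (None).

    A maximum is crossed once price trades >= cross_ticks above the level,
    and rebounded once price moves >= rebound_ticks below it; the first
    event to occur wins. Minima mirror the logic. (A sketch of the paper's
    definition; the names are ours.)
    """
    sign = 1 if kind == "max" else -1
    for p in prices_after:
        if sign * (p - level) >= cross_ticks * tick:
            return "crossed"
        if sign * (level - p) >= rebound_ticks * tick:
            return "rebounded"
    return None

# A maximum at 2900.00: price pulls back 5 ticks before ever breaking above,
# so the level is labeled as rebounded.
path = [2899.75, 2899.50, 2899.25, 2899.00, 2898.75, 2900.75]
print(label_extremum(path, 2900.00))
```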
In order to trade at the price level, a price reversal (or rebound) must have been predicted at it. After the input features are designed and collected, a machine learning model is used to perform the classification. For that purpose we choose the CatBoost classifier. We feel that CatBoost is a good fit for the task since it is resistant to overfitting, stable in terms of parameter tuning, efficient, and one of the best-performing boosting algorithms. Finally, being based on decision trees, it is capable of correctly processing zero-padded feature values when no data at a price is available. Other types of estimators might be comparable in one of these aspects but require much more focus in the other ones. For instance, neural networks might offer better performance, but are very demanding in terms of architecture and parameter optimization.

We perform a set of classifier optimization steps commonly accepted in the ML community. It involves feature selection and model parameter tuning. In this study we use precision as a scoring function (S):

S = TP / (TP + FP),    (1)

where TP is the number of true positives and FP the number of false positives. This was chosen over other metrics since in trading every FP might lead to losses, while a false negative (FN) means only a lost opportunity, but does not lead to any financial loss. In order to avoid a large bias in the base classifier probability, at all stages we balance the classes by introducing class weights into the model.

Firstly, we perform the feature selection step using the Recursive Feature Elimination with Cross-Validation (RFECV) method. The method is based on removing features from the model one by one, starting from the least important ones according to the model's feature importance, and measuring the performance on a cross-validation dataset. This way we ensure an optimal subset of features for the trading challenge, where the objective is to maximize the fraction of profitable trades.
Cross-validation allows avoiding overfitting by checking how the model performance generalizes to unseen data. Secondly, we optimize the parameters of the model in a grid-search fashion. For the parameter optimization we use a cross-validation dataset as well. We perform training and cross-validation within a single contract and the backtesting of the strategy on the subsequent one to ensure relevance of the optimized model. For the test data we report precision (1) as the metric of interest for optimization. Considering the take-profit and stop-loss sizes of the trading strategy, based on precision one can already see whether the approach is potentially profitable.

We provide the line of reasoning of the model analysis on the example of the model trained on the ESM2019 and tested on the ESU2019 contract for the best-performing rebound definition - it can be applied to the rest of the models in the same fashion using the provided code base and data. The analysis is done on a test dataset using the SHAP approach, where for each entry we obtain the contributions of the feature values to the final output. We use this approach to investigate how the classification decisions were made, spot interesting cases and see whether the decisions agree with our expectations.
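The two optimization steps described above - RFECV feature selection followed by grid-search parameter tuning, both scored by precision - can be sketched with scikit-learn. To keep the sketch dependency-light we use `GradientBoostingClassifier` as a stand-in for CatBoost (the study's choice), with synthetic data; the class weighting used in the paper is omitted here, and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced dataset standing in for the extrema feature table.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

# Step 1: recursive feature elimination, scored by precision on CV folds --
# false positives are losing trades, false negatives only missed opportunities.
base = GradientBoostingClassifier(random_state=0)
selector = RFECV(base, step=1, cv=3, scoring="precision").fit(X, y)
X_sel = selector.transform(X)

# Step 2: grid search over commonly tuned parameters, also scored by precision.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [100, 500], "max_depth": [3, 6]},
    cv=3, scoring="precision",
).fit(X_sel, y)
print(selector.n_features_, grid.best_params_)
```

In the study the cross-validation data comes from the training contract, and the tuned model is then backtested on the subsequent contract.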
The trading strategy is defined based on our definition of the crossed and rebounded price levels, and is schematically illustrated in Fig. 3. It is a flat market strategy, where we expect a price reversal from the price level. The Backtrader Python package is used for backtesting the strategy. Backtrader does not allow taking bid-ask spreads into account, which is why we minimize their effects by excluding HFT trading opportunities (by limiting peak widths) and limiting ourselves to the actively traded contracts only.

Figure 3: The block diagram illustrates the steps of the trading strategy.

Anonymous. (2020, September 18). Machine Learning Classification of Price Extrema Based on Market Microstructure Features. A Case Study of S&P500 E-mini Futures. (Version 1.0.0). Zenodo. http://doi.org/10.5281/zenodo.4036851.

Results
The results of each component of the methodology are presented separately in different subsections. An evaluation of these results is presented in Sec. 7.
The considered sample of the data is described in Tab. 2. We provide the numbers of ticks per contract without any filtering. For the classification we limit ourselves to the actively-traded times. Additionally, we illustrate the numbers of entries in the classification tasks in Fig. 5.

Table 2: Numbers of reconstructed ticks per contract used in the study.
Contract Number of ticks
ESH2017 1271810
ESM2017 1407792
ESU2017 1243120
ESZ2017 1137427
ESH2018 2946336
ESM2018 2919757
ESU2018 1825417
ESZ2018 3633969
ESH2019 3066530
ESM2019 2591000
ESU2019 2537197
As an example of the peak detection we show a sample of data from March 14, 2007, ESH2007, in Fig. 4. We provide some basic relationships between the numbers of ticks, total numbers of price levels and numbers of rebounded price levels per contract in Fig. 5 to give an idea of the class imbalances and the density of peaks per tick.

Feature selection and model optimization are usually done sequentially, and sometimes a number of iterations can be made. In our preliminary tests on year 2007 data we observed that shrinking the feature space often improves the model performance (not reported). Since CatBoost is quite stable with respect to parameter tuning, we choose to perform the feature selection as the first step and the model optimization as the second. For the feature selection step only the class weights are set to 'balanced' in order to avoid a large bias in the base probability; the rest of the parameters of the model are left at their defaults. Even though CatBoost has a very wide range of parameters which can be optimized, we choose the most commonly tweaked parameters for the sake of feasibility of the optimization. The following parameters are optimized: 1) number of iterations, 2) maximum depth of trees, 3) has time parameter set to True or False, and 4) l2 regularisation. A full description of the parameter tuning is available in Tab. 3. Since we balance the class weights, when labeling the entries we use a default confidence (or class probability) of 0.5 for the output. Additionally, for potential future model improvements, we illustrate the impact of the confidence threshold in Fig. 6. The results obtained per contract are provided in Fig. 7.

Figure 4: A sample of the ESH2007 contract with peaks and peak widths annotated.

Table 3: Experiment Configuration A. Parameter optimization for the fifteen-tick rebounds. Contract - the training data used; Depth - maximum depth of the tree; Has time - indicates whether the temporal scale is used for training (always optimized to True - not presented in the table); Iterations - number of training iterations; l2 leaf reg - L2 regularization factor of the cost function; Learning rate - learning rate of the estimator.
Contract Depth Iterations l2 leaf reg Learning rate
ESH2007 6 100 4 0.30
ESM2007 10 1000 7 0.03
ESH2017 10 1000 1 0.03
ESM2017 6 100 1 0.03
ESU2017 5 1000 1 0.30
ESZ2017 6 1000 1 0.03
ESH2018 10 500 7 0.03
ESM2018 5 1000 1 0.30
ESU2018 10 500 1 0.03
ESZ2018 10 500 1 0.03
ESH2019 6 1000 1 0.03
ESM2019 5 1000 1 0.30
Figure 5: The plot illustrates a basic relation between the numbers of ticks and numbers of rebounds per contract, as well as the overall number of price levels used in the study.

Figure 6: Test precision of the model trained on ESM2019, 15-tick rebound, with a varying confidence threshold.
In Tab. 1 we explain all the features chosen for the model during the RFECV feature selection step. There are two points at which the features are collected - right after the price level is
Figure 7: Per-contract classification precision measured on the test data (the contract following the training one). One can see that the rebound size does not change the performance much. At the same time there is a trend towards performance decrease with time, which is expected.

formed, and right before it is approached for crossing or rebounding. Firstly, we obtained SHAP values for the model and plotted them in Fig. 8 to compare feature contributions. One can see that all 7 features have comparable contributions. The closest-to-linear relation between the feature values and the contribution is observed for MS1, where large feature values often contribute to a positive label output and vice versa. We can observe that there is no single feature having a large contribution towards a positive label, meaning that either detecting rebounds is a more complex task requiring smaller impacts from multiple features, or it is a result of penalizing only false positives in the feature selection and parameter optimization phases. The positive contribution towards the misclassified positively labeled entry is an outlier point on the right in Fig. 8, feature PL8.

We also wanted to investigate the prediction paths with the largest feature contributions. For that we took the top 25 entries with the largest impact among all the features - the results are provided in Fig. 9 (a). Most of the paths end up at a confident negative output (corresponding to 'crossing'). In most of the cases the largest contribution comes from PL8 and PL11. In the cases where the model is uncertain, i.e. outliers crossing the base probability line (grey) after PL11, one can see that the MS features contribute significantly to the output. This is an indication of a changed market situation, where the model comes up with a positive label based on the price-level-formation features and then changes the prediction right before the price level is approached.
This supports our hypothesis that MS features indicate the most recent changes of the market, which can significantly impact the output of the model. Surprisingly, there is only one positively labeled entry and it is misclassified. It can be concluded that the model is more often confident about the negative labels than the positive ones. To further gain understanding of the prediction paths, we took the same top 25 entries
Figure 8: The figure illustrates the feature contributions to the output on a per-entry basis. The X axis shows the strength of the contribution, either towards the positive label (when > 0) or towards the negative one. Colors indicate the feature value: blue corresponds to small values and red to large values.

from a random sample of 200 entries - the result is shown in Fig. 9 (b). Here we saw more entries where the model was not confident. Also, there are more positively labeled misclassified entries. One can see a common prediction path for feature PL8, where it pushes the probability above 0.5, but then there is often a huge impact in the opposite direction from PL11. Also, one can see that different offsets (t's) of PL11 often contribute in the same direction. There are a couple of misclassified entries where these two features contributed differently - as future work, it might be worth investigating the detection of such potentially alarming feature behaviour.

When backtesting the strategy we enter the market at the price level with a limit order, with a 3-tick stop-loss and a 5-15-tick take-profit depending on the experiment configuration (Fig. 3). The resulting profit curves and Sharpe ratios are shown in Fig. 10. When computing the net outcomes of the trades, we add $4.2 per-contract trading costs, based on our assessment of the current lowest broker fees. We do not set any slippage in the backtesting engine, since ES liquidity is large. However, we execute the stop-losses and take-profits on the tick following the close-position signal, to account for order execution delays and slippage at the same time. This allows taking into account the large volatilities and gaps happening on market events, which might work in both directions if the position is closed with a market order. The backtesting is done on tick data; therefore there are no bar-backtesting assumptions in the results. We provide the trade statistics for the model trained on ESM2019 and tested on ESU2019, 15-tick rebound, in Tab. 4.
(a) Full Dataset

(b) 200 Random Samples

Figure 9: Top 25 entries by feature contribution strength from the full dataset (a) and a 200-entry random sample of the test dataset (b). The figure should be viewed from the bottom to the top, where each dashed horizontal line accounts for the feature on the left, and curved lines of colors between blue and red correspond to classification cases. They root from the base probability at the bottom and approach the output probability at the top. Misclassified entries are drawn with dashed lines.

(a) Sharpe ratios
(b) Profit curves

Figure 10: Profit curves for all the rebound configurations for years 2017-2019 with the corresponding annual rolling Sharpe values (computed for zero risk-free income).

Table 4: Trades Statistics. Results for Experiment Configuration A (Tab. 3). The results are provided for a maximum position size of 1 contract.
Winners Losers All
Total Trades (Rounds): 1570 | 4808 | 6378
Total Commission [$]: 6594 | 20193.6 | 26787.6
Max NET Profit (Loss) [$]: 683.3 | -4.2 | 683.3
Min NET Profit (Loss) [$]: 8.3 | -741.7 | -741.7
Total NET Profit (Loss) [$]: 256568 | -164569 | 91999
Avg. NET Profit per Trade [$]: 163.419 | -34.2281 | 14.4246
Longest Trade: 3 days 01:00:01 | 2 days 01:28:45 | 3 days 01:00:01
Total Time in the Market: 51 days 19:36:38 | 16 days 23:59:30 | 68 days 19:36:08
Avg. Time in a Trade: 0 days 00:47:31 | 0 days 00:05:05 | 0 days 00:15:32
Trades Longer than 1h: 233 | 26 | 259
% of Time in the Market: 0.068
% of Profitable Trades: 0.25
STD of Daily Return [$]: 320
Avg. Daily Return [$]: 153
Max Drawdown [$]: 2209
Avg. Consecutive No. of Winners: 1.41
Max Consecutive No. of Winners: 5
Avg. Consecutive No. of Losers: 4.31
Max Consecutive No. of Losers: 25
This section breaks down and analyses the results presented in Sec. 6. Firstly, the results are discussed in relation to performance and profitability, as well as the effectiveness of the machine learning approach. Secondly, we discuss several limitations with regard to our results and how they can be addressed. Thirdly, the trading strategy is discussed; although it was a rather simple strategy to showcase the approach, we discuss its profitability and potential extensions. Finally, we present our intuition on potential advancements and future work for this area.
One can see that the number of price levels closely follows the number of ticks per contract (Fig. 5). Since the number of peaks is proportionally related to the number of ticks, the density of the peaks considered in the study does not change with time. Consequently, the peak pattern we are interested in can be considered relatively stationary, not disappearing in various market conditions, but rather adjusting its time horizon to the market.

Interestingly, we observed that the rebound size does not change the performance of the model significantly (Fig. 7). After the feature selection and parameter tuning steps, the compared models have different parameters and operate on different feature subsets; consequently, performance differences within a single contract might have different sources. At the same time, the relative performance across contracts clearly follows the same pattern for at least 5 contracts - from ESZ2017 to ESZ2018. It might be the case that the used features tend to work better for a particular market state, and the prevalence of this market state defines the success of the classification. Moreover, larger rebounds do not necessarily lead to lower precision.

When studying the model confidence vs precision (Fig. 6) we find that the number of entries decreases rapidly, and a confidence threshold above 0.55 becomes impractical to use due to the low number of entries classified as rebounds. At the same time, based on this plot we can conclude that optimization of the threshold does not lead to a significant performance change. Taking into account the non-stationarity of financial markets, this observation is expected. Our attempts to find a relatively stationary pattern in the data worked to some extent but, of course, do not make the data consistently and confidently predictable.
Firstly, in the backtesting we use tick data; however, when modeling order execution we do not consider the per-tick volumes coming from aggressive buyers and sellers (bid and ask). It might be the case that for some ticks only aggressive buyers were present, and our algorithm executed a long limit order. This potentially leads to uncertainty in opening positions - in reality some of the orders may not have been filled exactly at the moment they are filled in the backtesting engine. Secondly, we do not model queues when placing limit orders and, consequently, cannot guarantee that our orders would have been filled every time had we submitted them live, even if both bid and ask volumes were present in the tick. This is crucial for high-frequency trading (HFT), where thousands of trades can be performed per day with tiny take-profits and stop-losses, but it has much less influence for the trade intervals considered in the study. Finally, there is an assumption that our entering the market does not change its state significantly. We believe this is a valid assumption for the considered financial instrument when one trades one contract at a time.
We publicly share the models and the code for analysing them. The exact code for feature extraction and labeling is not provided, to avoid a rapid devaluation of the trading approach. Also, we release only samples of the training data to comply with the requirements of the data licence.

Strategy
The objective of the study was not to provide a ready-to-trade strategy, but rather to demonstrate a proof of concept, which we have successfully accomplished. We believe that the demonstrated approach can be used for any strategy trading the considered scenario. In terms of improving the strategy, there are a couple of things that can be done. For instance, the take-profit and stop-loss offsets might be linked to the volatility instead of being kept constant. Also, flat strategies usually work better at certain times of the day - it would be wise to interrupt trading before US and EU session starts and ends, as well as around scheduled news and political events. Finally, all the mentioned parameters we have chosen can be looked into and optimized to the needs of the market participant.
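As an illustration of the volatility-linked offsets suggested above, one could scale them by an ATR-style volatility estimate. This is our own sketch, not part of the study: the ATR choice, the multipliers and all names are illustrative assumptions.

```python
import numpy as np

def atr(highs, lows, closes, period=14):
    """Average True Range: a common bar-based volatility measure."""
    prev_close = np.roll(closes, 1)
    tr = np.maximum.reduce([
        highs - lows,
        np.abs(highs - prev_close),
        np.abs(lows - prev_close),
    ])
    # Drop the first element (its prev_close wraps around) and average the tail.
    return tr[1:][-period:].mean()

def offsets_from_volatility(a, tick=0.25, sl_mult=1.0, tp_mult=3.0):
    """Scale stop-loss / take-profit to volatility, rounded to whole ticks."""
    sl = max(1, round(sl_mult * a / tick))
    tp = max(1, round(tp_mult * a / tick))
    return sl, tp

# Synthetic bar data standing in for recent ES prices.
rng = np.random.default_rng(1)
closes = 2900 + np.cumsum(rng.normal(0, 0.5, 100))
highs, lows = closes + 0.5, closes - 0.5
a = atr(highs, lows, closes)
print(offsets_from_volatility(a))
```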
In the current study we have proposed an approach for extracting certain scenarios from the market and classifying them. The next step would be to propose a more holistic scenario extraction. For instance, one can notice that there are no price-related features among those selected for the classifier. As the next step we would aim to define the scenarios using other market properties instead of the price. Also, the approach can be validated for trading trends - in this case one would aim to classify price level crossings with high precision. Furthermore, it would be interesting to compare the CatBoost classifier to the DA-RNN [26] model, a recurrent neural network making use of the attention-based architecture designed on the basis of the recent breakthrough in the area of natural language processing [27]. Finally, less conservative scoring metrics can be investigated when fitting the classifier. For instance, the F-beta score, a more general version of the F1 score, allows tweaking the contribution of recall to the metric.
Our work showcased an end-to-end approach to performing automated trading using price extrema. Whilst extrema have been discussed as a potentially high-performance basis for trading decisions, there has been no work proposing means to automatically extract them from data and create a successful strategy. Our work demonstrated an automated pipeline using this approach, and our evaluation showed some very promising results. Whilst we acknowledge that the results may be skewed by some assumptions in the backtesting strategy, we still show high precision and profitability. Furthermore, this paper has presented every single aspect of data processing, feature extraction, feature evaluation and selection, machine learning estimator optimization and training, as well as details of the trading strategy. We hope that by providing every single step of the ATP, it will enable further research in this area and be useful to a varied audience. We conclude by providing samples of our code online at [2].
Acknowledgements
The authors would like to thank Roberto Metere for his drawing contributions.
References

[1] Aitken, M., Frino, A.: The determinants of market bid ask spreads on the Australian Stock Exchange: Cross-sectional analysis. Accounting & Finance (1), 51–63 (1996)
[2] Anonymous: Machine learning classification of price extrema based on market microstructure features. A case study of S&P 500 E-mini futures. (2020). https://doi.org/10.5281/ZENODO.4036850, https://zenodo.org/record/4036850
[3] Bauwens, L., Giot, P., Grammig, J., Veredas, D.: A comparison of financial duration models via density forecasts. International Journal of Forecasting (4), 589–609 (2004)
[4] Blessie, E.C., Karthikeyan, E.: Sigmis: A feature selection algorithm using correlation based method. Journal of Algorithms & Computational Technology (3), 385–394 (2012)
[5] Booth, A., Gerding, E., Mcgroarty, F.: Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications (8), 3651–3661 (2014)
[6] Caginalp, C., Caginalp, G.: Asset price volatility and price extrema. arXiv preprint arXiv:1802.04774 (2018)
[7] Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794 (2016)
[8] De Prado, M.L.: Advances in financial machine learning. John Wiley & Sons (2018)
[9] Dempster, M.A., Leemans, V.: An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications (3), 543–552 (2006)
[10] Du, P., Kibbe, W.A., Lin, S.M.: Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics (17), 2059–2065 (2006)
[11] Dufour, A., Engle, R.F.: Time and the price impact of a trade. The Journal of Finance (6), 2467–2498 (2000)
[12] Easley, D., de Prado, M.L., O'Hara, M., Zhang, Z.: Microstructure in the machine age. Available at SSRN 3345183 (2019)
[13] Fan, F., Xiong, J., Wang, G.: On interpretability of artificial neural networks. arXiv preprint arXiv:2001.02522 (2020)
[14] Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences (1), 119–139 (1997)
[15] Grammig, J., Wellner, M.: Modeling the interdependence of volatility and inter-transaction duration processes. Journal of Econometrics (2), 369–400 (2002)
[16] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research (Mar), 1157–1182 (2003)
[17] Huang, W., Nakamori, Y., Wang, S.Y.: Forecasting stock market movement direction with support vector machine. Computers & Operations Research (10), 2513–2522 (2005)
[18] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y.: LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems. pp. 3146–3154 (2017)
[19] Kissell, R.L.: The science of algorithmic trading and portfolio management. Academic Press (2013)
[20] Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems. pp. 4765–4774 (2017)
[21] Manganelli, S.: Duration, volume and volatility impact of trades. Journal of Financial Markets (4), 377–399 (2005)
[22] Miller, N., Yang, Y., Sun, B., Zhang, G.: Identification of technical analysis patterns with smoothing splines for bitcoin prices. Journal of Applied Statistics (12), 2289–2297 (2019)
[23] Münnix, M.C., Shimada, T., Schäfer, R., Leyvraz, F., Seligman, T.H., Guhr, T., Stanley, H.E.: Identifying states of a financial market. Scientific Reports, 1–9 (2012). https://doi.org/10.1038/srep00644, http://arxiv.org/abs/1202.1623
[24] Münnix, M.C., Shimada, T., Schäfer, R., Leyvraz, F., Seligman, T.H., Guhr, T., Stanley, H.E.: Identifying states of a financial market. Scientific Reports, 644 (2012)
[25] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems. pp. 6638–6648 (2018)
[26] Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., Cottrell, G.W.: A dual-stage attention-based recurrent neural network for time series prediction. IJCAI International Joint Conference on Artificial Intelligence, 2627–2633 (2017). https://doi.org/10.24963/ijcai.2017/366
[27] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)
[28] Tabak, B.M., Feitosa, M.A.: An analysis of the yield spread as a predictor of inflation in Brazil: Evidence from a wavelets approach. Expert Systems with Applications (3), 7129–7134 (2009)
[29] Zhou, B., Bau, D., Oliva, A., Torralba, A.: Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence 41