Machine Learning Classification of Price Extrema Based on Market Microstructure and Price Action Features. A Case Study of S&P500 E-mini Futures
Artur Sokolovsky∗, Luca Arnaboldi†
Newcastle University, School of Computing, Newcastle Upon Tyne, UK
Abstract
The study introduces an automated trading system for S&P500 E-mini futures (ES) based on state-of-the-art machine learning. Concretely, we extract a set of scenarios from the tick market data to train the model and further use the predictions to model trading. We define the scenarios from the local extrema of the price action. Price extrema are a commonly traded pattern; however, to the best of our knowledge, there is no study presenting a pipeline for automated classification and profitability evaluation. Our study fills this gap by presenting a broad evaluation of the approach, showing a resulting average Sharpe ratio of 6.32. However, we do not take into account order execution queues, which of course affect the result in a live-trading setting. The obtained performance results give us confidence that this approach is worthwhile.
As machine learning (ML) changes and takes over virtually every aspect of our lives, we are now able to automate tasks that previously were only possible with human intervention. A field in which it has quickly gained traction and popularity is finance [12]. This field, often dominated by organisations with extreme expertise, knowledge and assets, is frequently considered out of reach to individuals, due to the complex decision making and high risks. However, if one sets aside the financial risks, the people, emotions, and the many other aspects involved, the core process of trading can be simplified to decision making under pre-defined rules and contexts, making it a perfect ML scenario.

Most current day trading is done electronically, through various available applications. Market data is propagated by the trading exchanges and handled by specialised trading feeds to keep track of trades, bids and asks by the participants of the exchange. Different exchanges provide data in different formats following predetermined protocols and data structures. Finally, the dataset is relayed back to a trading algorithm or human to make trading decisions. Decisions are then relayed back to the exchange, through a gateway, normally by means of a broker, which informs the exchange about the wish to buy or sell specific assets. This series of actions relies on the understanding of a predetermined protocol which allows communication between the various parties. Several software tools exist to ensure that almost all these steps are done for you, with the decisions made being the single point that may be uniquely done by the individual. After a match is made (either bid to ask or ask to bid) with another market participant, the match is conveyed back to the software platform and the transaction is completed.

∗Email: [email protected], ORCID: 0000-0001-8080-1331
†Email: [email protected], ORCID: 0000-0002-0808-2456
In this context, the main goal of ML is to automate the decision making in this pipeline. When constructing algorithmic trading software, or an Automatic Trading Pipeline (ATP), each of the components of the exchange protocol needs to be included. Speed is often a key factor in these exchanges, as a full round of the protocol may take as little as milliseconds; so to construct a robust ATP, time is an important factor. This extra layer adds further complexity to the machine learning problem. A diagram of what an ATP looks like in practice is presented in Fig. 1.

[Figure 1 components: Exchange, Feed Handler (market data, preprocess), Electronic Trading Platform, Machine Learning, Historic Data, Trading Strategy, Order Entry Gateway, bid/ask order book, decision, live data.]
Figure 1: Full overview of Automated Trading Platform components

In Fig. 1 it can be observed that the main ML component is focused on the training of the decision making and the strategy. This is by no means a straightforward feat, as successful strategies are often jealously guarded secrets, as a consequence of potential financial profits. Several different components are required, not least analysing the market to establish components of interest. Historical raw market data contains unstructured information, allowing one to reconstruct all the trading activity; however, that is usually not enough to establish persistent price-action patterns due to market non-stationarity. This characterisation is a complex process, which requires guidance and domain understanding. While traditional approaches have focused on trying to learn from the full market profile, over a whole year or potentially across dozens of years, more recent work has proposed the usage of data manipulation to identify key events in the market [23]; this advanced categorization can then become the focus of the machine learning input to improve performance.

This methodology focuses on identifying states of a financial market, which can then be used to identify points of drastic change in the correlation structure, whether positive or negative. Previous approaches have used these states to correlate them to worldwide events and general market values to categorize interesting scenarios [5], showing that with these techniques the training of the strategy can be greatly optimized. In this work we build on these foundations, proposing an approach for the extraction and classification of financial market patterns based on price action and market microstructure.
We then implement a custom ATP in which we show that our approach is capable of generating consistent profits, with an average Sharpe ratio of 6.32 for S&P500 E-mini futures as the asset of interest.

The contributions of this paper are as follows: 1) a methodology to construct an automated trading platform using state-of-the-art machine learning techniques; 2) an automated market profiling technique based on machine learning; and 3) we propose and evaluate the performance of a futures trading strategy based on market profiles, which is shown to perform profitably in an automated trading platform.

The remainder of the paper is structured as follows: Section 2 provides the financial and machine learning background needed to understand our approach; Section 3 describes related work; Section 4 describes the data; Section 5 presents our methodology for the automated market profiling approach and its assessment; Section 6 details the results of our constructed ATP; Section 7 discusses the implications and potential limitations of the study; and Section 8 concludes the work.
There are several different types of financial data, and each of these has a different role in financial trading. They are widely classified into four categories:

i. Fundamental Data: a set of documents, for example financial accounts, that a company has to send to the organization that regulates its activities; this is most commonly accounting data of the business.

ii. Market Data: all trading activities that take place, allowing one to reconstruct a trading book.

iii. Analytics: often derivative data acquired by analysing the raw data to find patterns; it can take the form of fundamental or market analytics.

iv. Alternate Data: extra domain knowledge that might help with the understanding of the other data, such as world events, social media, Twitter and any other external sources.

In this work we analyse type ii data to construct profiles of the market, and therefore focus our background on this datatype; a more comprehensive review of different data types is available in Prado & Lopez [8].
In order to prepare data for processing, the raw data is structured into predetermined formats to make it easier for machine learning algorithms to digest. There are several ways to group data, and various different features may be aggregated. The main idea is to identify a window of interest based on some heuristic, and then aggregate the features of that window to get a representation, called a Bar. Bars may contain several features, and it is up to the individual to decide which features to select. Common features include: bar start time, bar end time, sum of volume, open price, close price, and min and max (usually called low and high) prices, plus any other features that might help characterise the trading performed within this window.

The decision of how to select this window may make or break the algorithm, as it determines whether one has useful data or data that is not representative of the market. An example of this would be the choice of using time as the metric for the bar window, e.g. taking two-hour snapshots. However, given that there are active and non-active trading periods, one might find that only some bars are actually useful under this methodology. In practice, the widely considered best way to construct bars is based on the number of transactions that have taken place. This allows for the construction of informative bars which are independent of timings and gives a good sampling of the market, as it is done as a function of trading activity. There are of course many other ways to select a bar [8], so it is up to the prospective user to select one that works for them.
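As a concrete illustration, transaction-count bars as described above can be sketched as follows (a minimal sketch; the `Bar` fields and the `build_tick_bars` helper are our own naming, not from the original study):

```python
from dataclasses import dataclass

@dataclass
class Bar:
    start_time: float
    end_time: float
    volume: float
    open_price: float
    close_price: float
    high: float
    low: float

def build_tick_bars(trades, trades_per_bar=100):
    """Aggregate (timestamp, price, volume) trades into bars of a fixed
    number of transactions, sampling as a function of trading activity."""
    bars = []
    for i in range(0, len(trades) - trades_per_bar + 1, trades_per_bar):
        chunk = trades[i:i + trades_per_bar]
        prices = [p for _, p, _ in chunk]
        bars.append(Bar(
            start_time=chunk[0][0],
            end_time=chunk[-1][0],
            volume=sum(v for _, _, v in chunk),
            open_price=prices[0],
            close_price=prices[-1],
            high=max(prices),
            low=min(prices),
        ))
    return bars
```

Because each bar closes after a fixed number of trades, active periods produce many bars and quiet periods few, which is exactly the activity-driven sampling argued for above.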
In mathematics, an extremum is any point at which the value of a function is largest (a maximum) or smallest (a minimum). These can be either local or global extrema. At a local extremum the value is larger (or smaller) than at the immediately adjacent points, while at a global extremum the value of the function is larger than its value at any other point in the interval of interest. If one wanted to maximise profits theoretically, the intent would be to identify an extremum and trade at that point of optimality, i.e. the peak. This is, generally speaking, a non-trivial problem, as changes in the market may be random. As most strategies perform several trades, local extrema are sought out instead.

As far as the algorithms for an ATP are concerned, they will often perform active trading, so finding a global extremum serves little purpose. Consequently, local extrema within a pre-selected window are instead chosen. Several complex algorithms exist for this, with use cases in many fields such as biology [10]. However, the objective is actually quite simple: identify a sample for which the neighbours on each side have a lower amplitude for maxima, and a higher amplitude for minima. This approach is very straightforward and can be implemented with a linear search. In the case of flat peaks, where several samples are of equal amplitude, the middle sample is selected. Two further metrics of interest are the prominence and the width of a peak. The prominence of a peak measures how much the peak stands out from the surrounding baseline of the near samples, and is defined as the vertical distance between the peak and its lowest contour point. The width of the peak is the distance between its lower bounds, signifying the peak's duration. In the case of peak classification, these measures can aid a machine learning estimator in relating the obtained features to the discovered peaks; this avoids attempts to directly relate properties of small peaks with large peaks and vice versa.
These three measures allow for the classification of good trading points, with the prominence and width also giving insight into what led to the classification.
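The linear-search detection of local maxima with flat-peak handling, plus a simplified prominence measure, can be sketched in a few lines (a sketch under our own naming; production code would typically use a library routine such as SciPy's find_peaks):

```python
def find_local_maxima(xs):
    """Linear search for local maxima; for a flat peak (a run of equal
    samples flanked by lower ones) the middle sample's index is returned."""
    peaks = []
    i, n = 1, len(xs)
    while i < n - 1:
        if xs[i] > xs[i - 1]:
            j = i
            while j < n - 1 and xs[j + 1] == xs[j]:
                j += 1                      # walk across a flat top
            if j < n - 1 and xs[j + 1] < xs[j]:
                peaks.append((i + j) // 2)  # middle of the flat region
            i = j + 1
        else:
            i += 1
    return peaks

def prominence(xs, p):
    """Simplified prominence: height of the peak above the higher of its two
    flanking minima, searching each side until a higher sample or the edge."""
    lmin = xs[p]
    for v in reversed(xs[:p]):
        if v > xs[p]:
            break
        lmin = min(lmin, v)
    rmin = xs[p]
    for v in xs[p + 1:]:
        if v > xs[p]:
            break
        rmin = min(rmin, v)
    return xs[p] - max(lmin, rmin)
```

Minima can be found symmetrically by negating the series before the search.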
Market microstructure is the study of financial markets and how they operate. Its features represent the way the market operates: how decisions are made about trades, the price discovery process and more [19]. The process of market microstructure analysis is the identification of why and how market prices will change, in order to trade profitably. Features may include: 1) the time between trades, as it is usually an indicator of trading intensity [3]; 2) volatility, which might distinguish good and bad trading scenarios, as high volatility may indicate an unsuitable market state [15]; 3) volume, which may directly correlate with trade duration, as high volume might represent informed trading rather than ordinary active trading [21]; and 4) trade duration: high trading activity is related to a greater price impact of trades and faster price adjustment to trade-related events, whilst slower trades may indicate informed single entities [11]. Whilst several other options are available, they are often instrument-related and require expert domain knowledge. In general, it is important to tailor and evaluate the features to the specific scenario identified.

One important scenario to consider when analysing prices is the aggressiveness of buyers and sellers. In an order book, a match implies a trade, which occurs whenever a bid matches an ask and conversely; however, the trade is only ever initiated by one party. In order to determine who the aggressor is in this scenario, the tick rule is used [1]. The rule labels a buyer-initiated trade as 1 and a seller-initiated trade as -1. The logic is the following: an initial label l is assigned an arbitrary value of 1; if a trade occurs and the price change is positive, then l = 1; if the price change is negative, then l = -1; and if there is no price change, l retains its previous value. This has been shown to identify the aggressor with a high degree of accuracy.
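The tick rule as stated can be sketched directly (a minimal sketch; the function name is ours):

```python
def tick_rule(prices):
    """Label each trade by the tick rule: +1 buyer-initiated, -1
    seller-initiated. Upticks give +1, downticks give -1, and zero
    ticks carry the previous label forward; the first label is an
    arbitrary +1."""
    labels = []
    last = 1                    # arbitrary initial label
    prev_price = None
    for p in prices:
        if prev_price is not None:
            if p > prev_price:
                last = 1
            elif p < prev_price:
                last = -1
            # unchanged price: keep the previous label
        labels.append(last)
        prev_price = p
    return labels
```

For ES, consecutive prices differ by multiples of the $0.25 tick, so every change is cleanly classified as an uptick or downtick.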
Machine learning is a field that has come to pervade almost every aspect of our lives, from personal voice assistants to healthcare, so it comes as no surprise that it is also gaining popularity in the field of algorithmic trading. There is a wide range of machine learning techniques, from very simple ones such as regression to deep learning techniques such as neural networks. Consequently, it is important to choose an algorithm which is suited to the problem one wishes to tackle. In ATPs, one of the possible roles of machine learning is to identify situations in which it is profitable to trade, depending on the strategy; for example, when using a flat strategy the intent is to identify when the market is flat. In these circumstances, and due to the potentially high financial implications of false negatives, understanding the prediction is key. Understanding the prediction involves being able to explain why the algorithm made a given decision. This is a non-trivial issue and very difficult for a wide range of techniques, neural networks being the prime example (although advances are currently being made [29, 13]).

Perhaps one of the simplest yet highly effective techniques is Support Vector Machines (SVMs). SVMs identify the hyperplane that best separates a binary sample. If we imagine a set of points mapped onto a 2D plane, the SVM will find the best line that separates the two classes. This technique can easily be expanded to work on higher-dimensional data, and since it is so simple, it is intuitive to see the reason behind a classification. However, whilst popular for financial forecasting [28, 17], this technique suffers from the drawback that it is very sensitive to parameter tuning, making it harder to use, and it does not work with categorical features directly, making it less suited to complex analysis.

Another popular approach is decision trees.
The reason tree-based approaches are hugely popular is that they are directly interpretable. To improve the efficacy of this technique, several different trees are trained and used in unison to produce the result. The most popular case of this is random forests. Random forests operate by constructing a multitude of decision trees at training time and aggregating the classifications of the individual trees. However, this suffers from the fact that the different trees are not weighted and contribute equally, which might lead to inaccurate results. One class of algorithms which has seen mass popularity for its robustness, effectiveness and clarity is boosting algorithms. Boosters create "bins" of classifications that can be combined to reduce overfitting and improve the prediction. The data is split into N samples, either randomly or by some heuristic, and each tree is trained using one of the samples. The results of each tree are then fed into the other trees to reduce overfitting and come to a combined result. Finally, the features of each tree are internally evaluated, leading to a weakness measure which dictates the overall contribution to the result.

The first usage of boosting with a notion of weakness was introduced by AdaBoost [14]; this work presented the concept of combining the output of the boosters into a weighted sum that represents the final output of the boosted classifier. This allows for adaptive analysis, as subsequent weak learners are tweaked in favour of those instances misclassified by previous classifiers. Following on from this technique, two further techniques were introduced, XGBoost [7] and LightGBM [18]; both libraries gained a lot of traction in the machine learning community for their efficacy and are widely used. In this category, the most recent algorithm is Catboost. Catboost [25] is highly efficient and less prone to bias than its predecessors, and it is quickly becoming one of the most used approaches, in part due to its high flexibility.
Catboost was specifically proposed to address issues in the previous approaches which led to target leakage, which sometimes caused overfitting. This was achieved by using ordered boosting, a new technique allowing independent training and avoiding leakage, while also giving better performance on categorical features.
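The core boosting idea of AdaBoost, a weighted sum of weak learners where each round re-weights the samples the previous learners got wrong, can be illustrated with a toy pure-Python version on 1-D data with threshold stumps (this is only an illustration of the weighting mechanism, not the Catboost algorithm or the estimator used in this study):

```python
import math

def stump_predict(x, threshold, polarity):
    # Weak learner: sign decision at a single threshold.
    return polarity if x >= threshold else -polarity

def adaboost_train(xs, ys, rounds=5):
    """Toy AdaBoost: each round fits the stump minimising weighted error,
    assigns it a weight alpha, then up-weights misclassified samples."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                               # (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in sorted(set(xs)):
            for pol in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, t, pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: mistakes grow, correct answers shrink.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(x, t, pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

The weighted sum over weak learners is exactly the "final output of the boosted classifier" described above; gradient boosters such as XGBoost, LightGBM and Catboost refine how each successive learner is fitted.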
Feature analysis is the evaluation of the input data to assess the features' effectiveness and contribution to the prediction. This may also take the form of creating new features using domain knowledge to improve the data. The features in the data directly influence the predictive models used and the results that can be achieved. Intuitively, the better the features that are prepared and chosen, the better the results that can be achieved. However, this may not always be the case, as too large a dimensionality of the data may lead to overfitting. The process of evaluating and selecting the best features is referred to as feature engineering. The first component of this process is feature importance evaluation. The simplest way to achieve this is feature ranking [16]: in essence, a heuristic is chosen, each feature is assigned a score based on this heuristic, and the features are ordered in descending order of score. This approach, however, may be problem-specific and require prior domain knowledge. Another common approach is the usage of correlations to evaluate how the features relate to the output. The intent of this approach is to evaluate the dependency between each feature and the result, on the intuition that a more dependent feature contributes more to the output [4]. However, these approaches evaluate each feature as a single component, in relation to the output, independently of the other features. Realistically, one would want to understand the features as a whole and see how they contribute to a prediction as a group.

Beyond the initial understanding of the features, it is important to get an understanding of the prediction.
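The correlation-based ranking described above can be sketched in a few lines (a minimal sketch using absolute Pearson correlation as the scoring heuristic; the function names and the dict-of-columns layout are our own assumptions):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_features_by_correlation(features, target):
    """Score each named feature column by |Pearson correlation| with the
    target and return the names in descending order of score."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

As noted above, such a ranking treats each feature in isolation; it says nothing about how features interact within a model, which is what per-prediction attribution methods address.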
Compared to the previously discussed approaches, this starts from the result of the model and goes back to the features to see which ones contributed to the prediction. This has advantages over pure feature analysis approaches, as it can be applied to all the different predictors individually and gives insights into the workings of the predictor. Recent advances in this direction, namely SHAP (SHapley Additive exPlanations) [20], are able to provide a per-prediction scoring of each feature. This innovative technique allows a step-through assessment of features across the different predictions, providing guided insight which can also be averaged for an overall assessment. This is very useful for debugging an algorithm, assessing the features and understanding the market classifications, making it particularly effective for this case study.

Trading strategy
In this context we only refer to common strategies for active trading. Active trading seeks to gain profit by exploiting price variations, to beat the market over short holding periods. Perhaps the most common type of strategy is the trend-based strategy. These strategies aim to identify shifts in the market towards a rise or fall in price and trade at the point where they are likely to gain profit. The second very popular strategy is called the flat strategy. Unlike trending markets, a flat market is a stable state in which the broader market does not move either higher or lower, but instead trades within the boundaries of recent highs and lows. This makes it easier to understand changes in the market and make a profit within a known market range. The role of machine learning in these strategies is to predict whether the market is entering a trending or flat state, respectively.
In order to test a trading strategy, an evaluation is performed to assess profitability. Whilst it is possible to do so on real market data, it is generally more favourable to do so on historical data to get a risk-free estimation of performance. The notion is that a strategy that would have worked poorly in the past will probably work poorly in the future, and conversely. A key caveat of backtesting, however, is the risky assumption that past performance predicts future performance.

Several approaches exist to perform backtesting, and different things can be assessed. Beyond the testing of trading strategies, backtesting can show how positions are opened and the likelihood of certain scenarios taking place within a trading period. The more common technique is to implement the backtesting within the trading platform, which has the advantage that the same code as live trading can be used. Almost all platforms allow for simulations on historical data, although it may differ in form from the raw data one may have used for training. For more flexibility, one can implement one's own backtesting system in languages such as Python or R. This approach enables the same code pipeline that trains the classifier to also test it, allowing for much smoother testing. Whilst this ensures the same data that is used for training may be used for testing, it may suffer from differences to the trading software that might skew the results. Another limitation of this approach is that there is no connection to the exchange or the broker: there are limitations on how order queues are modelled, as well as on the simulation of the latency that will be present during live trading. This complicates the identification of slippage, the difference between the price at which the trade is activated by the algorithm and the price at which it actually enters the market through the order gateway, which will differ and impact the order of trades.
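A custom backtesting loop of the kind described can be sketched minimally as follows (a sketch under our own naming and simplifications: fills at the signal price, a flat per-trade cost, and none of the queue or latency modelling discussed above):

```python
def backtest(prices, signals, cost_per_trade=0.0):
    """Walk forward through historical prices: signal +1 opens a long,
    -1 opens a short, 0 closes any open position. Returns total PnL.
    Fills are assumed at the signal's price, i.e. slippage is ignored."""
    position = 0        # +1 long, -1 short, 0 flat
    entry = 0.0
    pnl = 0.0
    for price, signal in zip(prices, signals):
        if signal != position:
            if position != 0:                   # close the open position
                pnl += position * (price - entry) - cost_per_trade
            if signal != 0:                     # open a new one
                entry = price
            position = signal
    if position != 0:                           # mark-to-market at the end
        pnl += position * (prices[-1] - entry) - cost_per_trade
    return pnl
```

Precisely because fills here are frictionless, results from such a loop are optimistic relative to live trading, which is the limitation the paragraph above warns about.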
In this section we break down related work in this area, including market characterisation, price extrema and optimal trading points, and automated trading systems. Each of the previous works is compared to our approach.

In their seminal work, Munnix et al. [24] first proposed the characterisation of market structures based on correlation. Through this they were able to detect key states of market crises from raw market data. The same technique also allowed the mapping of drastic changes in the market, which corresponded to key points of interest for trading. By using k-means clustering, the authors were able to predict whether the market was approaching a crisis, allowing them to react accordingly and construct a resilient strategy. Whilst this approach was a seminal work in the understanding of market dynamics, it was still based on statistical dependencies and correlations, which are not quite as advanced as more modern machine learning approaches. Nonetheless, their successful results initiated much more research in this area. Their way of analysing a market as a series of states proved to be a winning strategy, allowing for more focused decision making and a better understanding of the market. Following the same strategy, we seek to characterise the market as a series of peaks of interest to understand whether the market structure fits our desired trading window. This constitutes the initial, preprocessing stage of the ATP, for which their approach still requires several steps of manual intervention.

Historically, there has been an intuition that changes in market price are random. By this it is understood that whilst volatility is due to certain events, it is not possible to extract them from raw data. However, volatility is still one of the core metrics for trading, despite this assumed unpredictability.
In an effort to statistically analyse price changes and break down key events in the market, Caginalp & Caginalp [6] propose a method to find peaks in the volatility, representing price extrema. A price extremum represents the optimal point at which the price is traded before a large fluctuation. This strategy depends on exploiting a shift away from the optimal point to either sell high or buy low. The authors describe the supply and demand of a single asset via a stochastic equation where the peak is found when maximum variance is achieved. This approach is heavily market-specific and requires some assumptions to hold true in order to converge on a price maximum. However, the implied relationship of supply and demand is something that holds true for any exchange, making it a great fit for various different instruments. In a rather different context, Miller et al. [22] analyse Bitcoin data to find profitable trading bounds. Bitcoin, unlike more traditional exchanges, is decentralised and traded 24 hours a day, making the data much sparser and with less concentrated trading periods. This makes the trends much harder to analyse. Their approach smooths the data using splines, manipulating the curves to make their points more closely related. With this technique they are able to remove outliers and find clearer points of fluctuation as well as peaks. The authors then construct a bounded trading strategy which proves to perform quite well against unbounded strategies. Since Bitcoin is more decentralised, and by the very nature of those investing in it, automated trading is much more common.
This means that techniques to identify bounds and points of interest in the market are also more favoured and widely used.

An automated trading system is a piece of code that autonomously trades in the market. The goal of such machine learning efforts is the identification of a market state in which a trade is profitable, and the automatic execution of the transaction at that stage. Such a system is normally specialised for a specific instrument, analysing unique patterns to improve the characterisation. One such effort, focusing on FX markets, is Dempster & Leemans [9]. In this work, a technique using reinforcement learning is proposed to learn market behaviours. Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximise a notion of cumulative reward. This is achieved by assigning higher rewards to positive actions and negative rewards to negative actions, leading to an optimisation towards actions that increase rewards; in financial markets this naturally corresponds to profitable trades. Using this approach, the authors were able to characterise when to trade, perform analysis of the associated risks, and automatically make decisions based on these factors. In more recent work, Booth et al. [5] describe a model for seasonal stock trading using an ensemble of random forests. This leverages variability in seasonal data to predict the price return based on these events taking place. Their random-forest-based approach reduces the drawdown risk of peak-to-trough events. Their approach is based on domain knowledge of well-known seasonality events; usual approaches following this technique find that whilst the event is predictable, the volatility is not, so their characterisation allows them to predict which events will lead to profits. The random trees are used to characterise features of interest in a window of time, and multiple of these are aggregated to inform the decision process.
These ensembles are then weighted based on how effective they are and used to inform decisions, with higher weights having more input. Results across fifteen DRX assets show increases in profitability and in prediction precision.

As can be seen, statistical and machine learning techniques have been successfully applied in a variety of scenarios, proving effective as the basis of automatic trading and the identification of profitable events. This makes further investigation into more advanced machine learning techniques a desirable and interesting area. Our work expands on these previous concepts to improve performance and seek new ways to characterise market profiles.
In this study we use the S&P500 E-mini futures contracts ESH2007, ESM2007, ES(H-Z)2017, ES(H-Z)2018 and ES(H-U)2019, which correspond to ES futures contracts with expiration in March (H), June (M), September (U) and December (Z). Year 2007 data was used for preliminary tests and debugging. In our trading simulations we only consider the next expiring futures contract with the conventionally accepted rollover dates, rolling to the next contract on the second Thursday of the expiration month. This decision ensures the highest liquidity and, due to the double-auction nature of the financial markets, minimum bid-ask spreads. Since ES is a very popular trading instrument, its bid-ask spread is usually 1 tick; however, this does not hold during extraordinary market events, scheduled news, and session starts and ends. Spreads should be taken into account during backtesting, which is described in more detail in the following section.
The data can be obtained from CME Group DataMine (the Top-of-Book dataset), containing all the electronic trades and their volumes as well as best bid and ask prices and sizes. Alternatively, one can collect it live from a trading platform or purchase it from a third-party data feed. The data is available in at least two forms: market records and ticks (or lower-resolution data). Of course, ticks and lower-resolution data can be obtained by aggregating market records.
In our study we demonstrate that off-the-shelf machine learning methods can be successfully applied to financial markets analysis and trading in particular. Taking into account the non-stationary nature of financial markets, we achieve this goal by considering only a subset of the time series. We propose a price-action-based way of defining the subsets of interest and perform their classification. Concretely, we identify local price extrema and predict whether the price will reverse (or 'rebound') or continue its movement (also called 'crossing'). For demonstration purposes, we set up a simplistic trading strategy, where we trade a price reversal after a discovered local extremum is reached, as shown in Fig. 4. The section is organized in the following way:

1. data pre-processing is described;
2. detecting the price extrema;
3. obtaining a set of features for each detected extremum;
4. classification of the extrema based on the obtained features; this step involves feature selection, model parameter optimization, training, testing and analyzing the model;
5. setting up a simple trading strategy based on the obtained labels for the test contract and performing the backtesting.
In the current study we use tick data with bid and ask traded volumes, which indicate aggressive sellers and buyers, respectively. Additionally, as complementary data, we use best bid-ask order book (OB) records (also called L1) to get basic OB per-tick statistics. We consider a tick to incorporate all the market events between two price changes by a minimum price step. For the considered financial instrument it is $0.25. When designing the per-tick features we pursue two goals: i) consider different aspects of the market, and ii) limit the number of features to avoid overfitting. The following features are collected on a per-tick basis:
• Volume (bid and ask);
• Number of trades (bid and ask);
• Number of order book (OB) changes (bid and ask);
• Maximum OB size (bid and ask);
• Minimum OB size (bid and ask);
• Largest trade size (bid and ask);
• Time stamp when the tick started;
• Time stamp when the tick ended.
From these we later construct the price-level features.
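A minimal sketch of collecting several of the per-tick statistics listed above (bid/ask volumes, trade counts, largest trades and tick time stamps) for a single tick. OB-derived features are omitted for brevity, and all names and values are illustrative, not the study's exact implementation.

```python
import pandas as pd

# One tick's worth of trade prints, split by aggressor side (illustrative values).
tick = pd.DataFrame({
    "side": ["bid", "bid", "ask", "ask", "ask"],
    "size": [2, 5, 1, 4, 3],
    "ts":   pd.to_datetime([
        "2019-06-03 09:30:00.100", "2019-06-03 09:30:00.250",
        "2019-06-03 09:30:00.300", "2019-06-03 09:30:00.450",
        "2019-06-03 09:30:00.700",
    ]),
})

grouped = tick.groupby("side")["size"]
features = {
    "volume_bid": grouped.sum().get("bid", 0),        # V_b
    "volume_ask": grouped.sum().get("ask", 0),        # V_a
    "trades_bid": grouped.count().get("bid", 0),      # T_b
    "trades_ask": grouped.count().get("ask", 0),      # T_a
    "largest_trade_bid": grouped.max().get("bid", 0),
    "largest_trade_ask": grouped.max().get("ask", 0),
    "ts_start": tick["ts"].min(),
    "ts_end": tick["ts"].max(),
}
print(features)
```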
As we aim to trade price reversals from extrema, these should be detected first. When detecting the extrema we use a sliding window approach on ticks with a window size of 500. We introduce limits on the peak widths - from 100 to 400 ticks. This serves three purposes: i) it ensures that we do not consider high-frequency trading (HFT) scenarios, which require more modelling assumptions and a different backtesting engine; ii) it allows us to stay in intraday trading time frames and have a large enough number of trades for analysis; iii) it makes the price level feature values comparable across all the entries.
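Width-constrained peak detection of this kind can be sketched with `scipy.signal.find_peaks`, whose width, prominence and width-height outputs correspond to the peak properties used later as features. The synthetic price series below is our own illustration, not the study's data; minima are found by negating the series.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
# Synthetic tick-price series: a slow oscillation plus noise, on a $0.25 grid.
n = 3000
price = 2900 + 0.25 * np.round(
    20 * np.sin(np.arange(n) / 120) + rng.normal(0, 1.5, n)
)

# Maxima with widths between 100 and 400 ticks, as in the study; a small
# prominence threshold filters out single-tick noise spikes.
maxima, props = find_peaks(price, width=(100, 400), prominence=1.0)
minima, _ = find_peaks(-price, width=(100, 400), prominence=1.0)

# find_peaks also returns the width, prominence and width-height properties
# that the study later uses as peak features.
print(len(maxima), len(minima), sorted(props.keys()))
```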
To perform the extrema classification, we obtain two types of features: i) features designed from the price level ticks (called price level (PL) features), and ii) features obtained from the ticks right before the extremum is approached (called market shift (MS) features), as we illustrate in Fig. 2. We think it is essential to perform the two-step collection since the PL features contain properties of the extremum, while the MS features allow us to spot any market changes that happened between the time when the extremum was formed and the time we are trading it.
Figure 2: The figure illustrates what data we use to design the two types of features: price level (PL) and market shift (MS) features.

Considering the varying extrema widths, the varying dimensionality of the data does not allow using it directly for classification - most algorithms take fixed-dimensional vectors as input. We ensure the fixed dimensionality of the classifier input by aggregating per-tick features by price. We perform the aggregation for the price range of 10 ticks below (or above, in the case of a minimum) the extremum. This price range is flexible - 10 ticks are often not available within the ticks associated with the price level (red dashed rectangle in Fig. 2); in this case we fill the empty price features with zeros. We assume that the further the price is from the extremum, the less information relevant for the classification it contains. Considering the intraday volatility of ES, we expect that the information beyond 10 ticks from the extremum is unlikely to improve the predictions. If one considers larger time frames (peak widths), this number might need increasing.

PL features are obtained from per-tick features by grouping by price with sum, max or count statistics. For instance, if one is considering volumes, it is reasonable to sum all the aggressive buyers and sellers before comparing them. Of course, one can also compute the mean or consider the max and min volumes per tick. Following this line of reasoning, the feature space could be increased to very large dimensions. We empirically choose the feature space described in Tab. 1. Here we shrink the feature space in order to make the feature selection step computationally feasible. As a result, the feature space is quite shallow, yet sufficient to demonstrate the benefits based on the presented results.

To track the market changes, for the MS feature set we use 237- and 21-tick windows and compare statistics obtained from these two periods.
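The fixed-dimensionality aggregation described above (grouping per-tick features by price and zero-filling up to 10 ticks from the extremum) can be sketched as follows; the values and names are illustrative, not the study's exact implementation.

```python
import pandas as pd

TICK = 0.25  # minimum price step of ES

# Per-tick features for ticks belonging to one detected maximum (illustrative).
level_ticks = pd.DataFrame({
    "price":  [2900.00, 2899.75, 2899.75, 2899.50, 2899.00],
    "volume": [10, 4, 6, 3, 2],
})
extremum_price = 2900.00  # the detected local maximum

# Aggregate by distance from the extremum in ticks; always emit 10 slots,
# zero-filled where no trades happened at that price (e.g. 2899.25 here),
# so the classifier input has a fixed dimension regardless of peak width.
dist = ((extremum_price - level_ticks["price"]) / TICK).round().astype(int)
by_dist = level_ticks.groupby(dist)["volume"].sum()
pl_volume = by_dist.reindex(range(10), fill_value=0).to_numpy()
print(pl_volume)
```

For a minimum, the distance would be computed in the opposite direction (prices above the extremum).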
Non-round numbers help avoid interference with the majority of manual market participants, who use round numbers [8].

Table 1: Feature space used in the study. The features are obtained in two steps - after the price level is formed, and right before it is approached. When discussed, features are referred to by the codes in the square brackets at the end of the descriptions. Unless marked "total", sums run over the ticks t at a given price p (a neighbour of the price level PL up to distance N).

Equation : Description

Price level (PL) features
Σ_t (V_b + V_a) : Bid and ask volumes summed across all the ticks at price P [PL0]
Σ_t V_b : Bid volumes summed across all the ticks at price P [PL1]
Σ_t V_a : Ask volumes summed across all the ticks at price P [PL2]
Σ_t T_b : Number of bid trades summed across all the ticks at price P [PL3]
Σ_t T_a : Number of ask trades summed across all the ticks at price P [PL4]
Σ_t M(O)_b : Sum of maximum bid order book quotes across all the ticks at price P [PL5]
Σ_t M(O)_a : Sum of maximum ask OB quotes across the ticks at price P [PL6]
Σ_t 1 : Number of ticks at price P [PL7]
Σ_t V_b / Σ_t V_a : PL1 divided by PL2, at price P [PL8]
Σ_t T_b / Σ_t T_a : Feature PL3 divided by feature PL4, at price P [PL9]
Σ_t M(O)_b / Σ_t M(O)_a : Feature PL5 divided by feature PL6, at price P [PL10]
Σ_t (V_b + V_a) / Σ_t 1 : Total volume at price P divided by the number of ticks [PL11]
Σ (over all ticks) V_a : Total ask volume [PL12]
Σ (over all ticks) V_b : Total bid volume [PL13]
Σ (over all ticks) T_a : Total ask trades [PL14]
Σ (over all ticks) T_b : Total bid trades [PL15]
Σ (over all ticks) (V_a + V_b) : Total volume [PL16]
- : Peak extremum - minimum or maximum [PL17]
- : Peak width in ticks, described in the Background section [PL18]
- : Peak prominence, described in the Background section [PL19]
- : Peak width height, described in the Background section [PL20]

Market shift (MS) features
Σ_{w=237} V_b / Σ_{w=237} V_a : Fraction of bid over ask volume for the last 237 ticks [MS0]
Σ_{w=237} T_b / Σ_{w=237} T_a : Fraction of bid over ask trades for the last 237 ticks [MS1]
Σ_{w=237} V_b / Σ_{w=237} V_a − Σ_{w=21} V_b / Σ_{w=21} V_a : Fraction of bid/ask volumes for the long minus the short period [MS2]
Σ_{w=237} T_b / Σ_{w=237} T_a − Σ_{w=21} T_b / Σ_{w=21} T_a : Fraction of bid/ask trades for the long minus the short period [MS3]
Σ_{w=237} M(O)_b / Σ_{w=237} M(O)_a − Σ_{w=21} M(O)_b / Σ_{w=21} M(O)_a : Fraction of sums of max OB bid/ask quotes for the long minus the short period [MS4]

Key: OB - order book; T - trades; t - ticks; N - total ticks; p - price; w - tick window; PL - price level price; V - volume; b - bid; a - ask; PN - price level neighbours until distance N; M(O) - maximum value in the order book.
We also choose the values to be comparable to our expected trading time frames. No optimization was made on them. We obtain the MS features 2 ticks away from the price level to ensure that our modelling does not lead to any time-related bias where one could not physically send the order fast enough. When labeling the extrema as crossed or rebounded, we assume that the level is crossed if the price makes at least 3 ticks above the level (for a maximum; below for a minimum), and rebounded if there is a reversal and movement in the opposite direction for 5-15 ticks. We report a range of experiments for different rebounds.
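The crossed/rebounded labeling rule can be sketched as follows. The function and variable names are ours, and the threshold defaults mirror the 3-tick crossing and 5-15-tick rebound definitions above; the study's exact labeling code is not released.

```python
def label_extremum(prices_after, level, kind="max", cross_ticks=3,
                   rebound_ticks=5, tick=0.25):
    """Label a price level as 'crossed', 'rebounded' or undecided (None).

    A maximum is crossed once price trades >= cross_ticks above the level,
    and rebounded once price moves >= rebound_ticks below it; the first
    event to occur wins. Minima mirror the logic. (A sketch of the paper's
    definition; the names are ours.)
    """
    sign = 1 if kind == "max" else -1
    for p in prices_after:
        if sign * (p - level) >= cross_ticks * tick:
            return "crossed"
        if sign * (level - p) >= rebound_ticks * tick:
            return "rebounded"
    return None

# A maximum at 2900.00: price pulls back 5 ticks before ever breaking above,
# so the level is labeled as rebounded.
path = [2899.75, 2899.50, 2899.25, 2899.00, 2898.75, 2900.75]
print(label_extremum(path, 2900.00))
```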
In order to trade at the price level, a price reversal (or rebound) must have been predicted at it. After the input features are designed and collected, a machine learning model is used to perform the classification. For that purpose we choose the CatBoost classifier. We feel that CatBoost is a good fit for the task since it is resistant to overfitting, stable in terms of parameter tuning, efficient, and one of the best-performing boosting algorithms. Finally, being based on decision trees, it is capable of correctly processing zero-padded feature values when no data at a price is available. Other types of estimators might be comparable in one of these aspects but require much more focus in the other ones. For instance, neural networks might offer better performance, but are very demanding in terms of architecture and parameter optimization.

We perform a set of classifier optimization steps commonly accepted in the ML community. It involves feature selection and model parameter tuning. In this study we use precision as a scoring function (S):

S = TP / (TP + FP),    (1)

where TP is the number of true positives and FP the number of false positives. This was chosen over other metrics since in trading every FP might lead to losses, while a false negative (FN) means only a lost opportunity, but does not lead to any financial loss. In order to avoid a large bias in the base classifier probability, at all stages we balance the classes by introducing class weights into the model.

Firstly, we perform the feature selection step using the Recursive Feature Elimination with Cross-Validation (RFECV) method. The method is based on removing features from the model one by one, starting from the least important ones according to the model's feature importance, and measuring the performance on a cross-validation dataset. This way we ensure an optimal subset of features for the trading challenge, where the objective is to maximize the fraction of profitable trades.
Cross-validation allows avoiding overfitting by checking how the model performance generalizes to unseen data. Secondly, we optimize the parameters of the model in a grid-search fashion. For the parameter optimization we use a cross-validation dataset as well. We perform training and cross-validation within a single contract and the backtesting of the strategy on the subsequent one to ensure relevance of the optimized model. For the test data we report precision (1) as the metric of interest for optimization. Considering the take-profit and stop-loss sizes of the trading strategy, based on precision one can already see whether the approach is potentially profitable.

We provide the line of reasoning of the model analysis on the example of the model trained on the ESM2019 and tested on the ESU2019 contract for the best-performing rebound definition - it can be applied to the rest of the models in the same fashion using the provided code base and data. The analysis is done on a test dataset using the SHAP approach, where for each entry we obtain the contributions of the feature values to the final output. We use this approach to investigate how the classification decisions were made, spot interesting cases and see whether the decisions agree with our expectations.
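The two optimization steps described above - RFECV feature selection followed by grid-search parameter tuning, both scored by precision - can be sketched with scikit-learn. To keep the sketch dependency-light we use `GradientBoostingClassifier` as a stand-in for CatBoost (the study's choice), with synthetic data; the class weighting used in the paper is omitted here, and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced dataset standing in for the extrema feature table.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)

# Step 1: recursive feature elimination, scored by precision on CV folds --
# false positives are losing trades, false negatives only missed opportunities.
base = GradientBoostingClassifier(random_state=0)
selector = RFECV(base, step=1, cv=3, scoring="precision").fit(X, y)
X_sel = selector.transform(X)

# Step 2: grid search over commonly tuned parameters, also scored by precision.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [100, 500], "max_depth": [3, 6]},
    cv=3, scoring="precision",
).fit(X_sel, y)
print(selector.n_features_, grid.best_params_)
```

In the study the cross-validation data comes from the training contract, and the tuned model is then backtested on the subsequent contract.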
The trading strategy is defined based on our definition of the crossed and rebounded price levels, and is schematically illustrated in Fig. 3. It is a flat market strategy, where we expect a price reversal from the price level. The Backtrader Python package is used for backtesting the strategy. Backtrader does not allow taking bid-ask spreads into account, which is why we minimize their effects by excluding HFT trading opportunities (by limiting peak widths) and limiting ourselves to the actively traded contracts only.

Figure 3: The block diagram illustrates the steps of the trading strategy.

Anonymous. (2020, September 18). Machine Learning Classification of Price Extrema Based on Market Microstructure Features. A Case Study of S&P500 E-mini Futures. (Version 1.0.0). Zenodo. http://doi.org/10.5281/zenodo.4036851.

Results
The results of each component of the methodology are presented separately in different subsections. An evaluation of these results is presented in Sec. 7.
The considered sample of the data is described in Tab. 2. We provide the numbers of ticks per contract without any filtering. For the classification we limit ourselves to the actively-traded times. Additionally, we illustrate the numbers of entries in the classification tasks in Fig. 5.

Table 2: Numbers of reconstructed ticks per contract used in the study.
Contract Number of ticks
ESH2017 1271810
ESM2017 1407792
ESU2017 1243120
ESZ2017 1137427
ESH2018 2946336
ESM2018 2919757
ESU2018 1825417
ESZ2018 3633969
ESH2019 3066530
ESM2019 2591000
ESU2019 2537197
As an example of the peak detection we show a sample of data from March 14, 2007, ESH2007, in Fig. 4. We provide some basic relationships between the numbers of ticks, total numbers of price levels and numbers of rebounded price levels per contract in Fig. 5 to give an idea of the class imbalances and the density of peaks per tick.

Feature selection and model optimization are usually done sequentially, and sometimes a number of iterations can be made. In our preliminary tests on year 2007 data we observed that shrinking the feature space often improves the model performance (not reported). Since CatBoost is quite stable with respect to parameter tuning, we choose to perform the feature selection as the first step and the model optimization as the second. For the feature selection step only the class weights are set to 'balanced' in order to avoid a large bias in the base probability; the rest of the parameters of the model are left at their defaults. Even though CatBoost has a very wide range of parameters which can be optimized, we choose the most commonly tweaked parameters for the sake of feasibility of the optimization. The following parameters are optimized: 1) number of iterations, 2) maximum depth of trees, 3) has time parameter set to True or False, and 4) l2 regularisation. A full description of the parameter tuning is available in Tab. 3. Since we balance the class weights, when labeling the entries we use a default confidence (or class probability) of 0.5 for the output. Additionally, for potential future model improvements, we illustrate the impact of the confidence threshold in Fig. 6. The results obtained per contract are provided in Fig. 7.

Figure 4: A sample of the ESH2007 contract with peaks and peak widths annotated.

Table 3: Experiment Configuration A. Parameter optimization for the fifteen-tick rebounds. Contract - the training data used; Depth - maximum depth of the tree; Has time - indicates whether the temporal scale is used for training (always optimized to True - not presented in the table); Iterations - number of training iterations; l2 leaf reg - L2 regularization factor of the cost function; Learning rate - learning rate of the estimator.
Contract Depth Iterations l2 leaf reg Learning rate
ESH2007 6 100 4 0.30
ESM2007 10 1000 7 0.03
ESH2017 10 1000 1 0.03
ESM2017 6 100 1 0.03
ESU2017 5 1000 1 0.30
ESZ2017 6 1000 1 0.03
ESH2018 10 500 7 0.03
ESM2018 5 1000 1 0.30
ESU2018 10 500 1 0.03
ESZ2018 10 500 1 0.03
ESH2019 6 1000 1 0.03
ESM2019 5 1000 1 0.30
Figure 5: The plot illustrates a basic relation between the numbers of ticks and numbers of rebounds per contract, as well as the overall number of price levels used in the study.

Figure 6: Test precision of the model trained on ESM2019, 15-tick rebound, with a varying confidence threshold.
In Tab. 1 we explain all the features chosen for the model during the RFECV feature selection step. There are two points at which the features are collected - right after the price level is
Figure 7: Per-contract classification precision measured on the test data (the contract following the training one). One can see that the rebound size does not change the performance much. At the same time there is a trend towards performance decrease with time, which is expected.

formed, and right before it is approached for crossing or rebounding. Firstly, we obtained SHAP values for the model and plotted them in Fig. 8 to compare feature contributions. One can see that all 7 features have comparable contributions. The closest-to-linear relation between the feature values and the contribution is observed for MS1, where large feature values often contribute to a positive label output and vice versa. We can observe that there is no single feature having a large contribution towards a positive label, meaning that either detecting rebounds is a more complex task requiring smaller impacts from multiple features, or it is a result of penalizing only false positives in the feature selection and parameter optimization phases. The positive contribution towards the misclassified positively labeled entry is an outlier point on the right in Fig. 8, feature PL8.

We also wanted to investigate the prediction paths with the largest feature contributions. For that we took the top 25 entries with the largest impact among all the features - the results are provided in Fig. 9 (a). Most of the paths end up at a confident negative output (corresponding to 'crossing'). In most of the cases the largest contribution comes from PL8 and PL11. In the cases where the model is uncertain, i.e. outliers crossing the base probability line (grey) after PL11, one can see that the MS features contribute significantly to the output. This is an indication of a changed market situation, where the model comes up with a positive label based on the price-level-formation features and then changes the prediction right before the price level is approached.
This supports our hypothesis that MS features indicate the most recent changes of the market, which can significantly impact the output of the model. Surprisingly, there is only one positively labeled entry and it is misclassified. It can be concluded that the model is more often confident about the negative labels than the positive ones. To further gain understanding of the prediction paths, we took the same top 25 entries
Figure 8: The figure illustrates the feature contributions to the output on a per-entry basis. The X axis shows the strength of the contribution, either towards the positive label (when > 0) or towards the negative one. Colors indicate the feature value: blue corresponds to small values and red to large values.

from a random sample of 200 entries - the result is shown in Fig. 9 (b). Here we saw more entries where the model was not confident. Also, there are more positively labeled misclassified entries. One can see a common prediction path for feature PL8, where it pushes the probability above 0.5, but then there is often a huge impact in the opposite direction from PL11. Also, one can see that different offsets (t's) of PL11 often contribute in the same direction. There are a couple of misclassified entries where these two features contributed differently - as future work, it might be worth investigating the detection of such potentially alarming feature behaviour.

When backtesting the strategy we enter the market at the price level with a limit order, with a 3-tick stop-loss and a 5-15-tick take-profit depending on the experiment configuration (Fig. 3). The resulting profit curves and Sharpe ratios are shown in Fig. 10. When computing the net outcomes of the trades, we add $4.2 per-contract trading costs, based on our assessment of the current lowest broker fees. We do not set any slippage in the backtesting engine, since ES liquidity is large. However, we execute the stop-losses and take-profits on the tick following the close-position signal, to account for order execution delays and slippage at the same time. This allows taking into account the large volatilities and gaps happening on market events, which might work in both directions if the position is closed with a market order. The backtesting is done on tick data; therefore there are no bar-backtesting assumptions in the results. We provide the trade statistics for the model trained on ESM2019 and tested on ESU2019, 15-tick rebound, in Tab. 4.
(a) Full Dataset

(b) 200 Random Samples

Figure 9: Top 25 entries by feature contribution strength from the full dataset (a) and a 200-entry random sample of the test dataset (b). The figure should be viewed from the bottom to the top, where each dashed horizontal line accounts for the feature on the left, and curved lines of colors between blue and red correspond to classification cases. They root from the base probability at the bottom and approach the output probability at the top. Misclassified entries are drawn with dashed lines.

(a) Sharpe ratios
(b) Profit curves

Figure 10: Profit curves for all the rebound configurations for years 2017-2019 with the corresponding annual rolling Sharpe values (computed for zero risk-free income).

Table 4: Trades Statistics. Results for Experiment Configuration A (Tab. 3). The results are provided for a maximum position size of 1 contract.
Winners Losers All
Total Trades (Rounds): 1570 | 4808 | 6378
Total Commission [$]: 6594 | 20193.6 | 26787.6
Max NET Profit (Loss) [$]: 683.3 | -4.2 | 683.3
Min NET Profit (Loss) [$]: 8.3 | -741.7 | -741.7
Total NET Profit (Loss) [$]: 256568 | -164569 | 91999
Avg. NET Profit per Trade [$]: 163.419 | -34.2281 | 14.4246
Longest Trade: 3 days 01:00:01 | 2 days 01:28:45 | 3 days 01:00:01
Total Time in the Market: 51 days 19:36:38 | 16 days 23:59:30 | 68 days 19:36:08
Avg. Time in a Trade: 0 days 00:47:31 | 0 days 00:05:05 | 0 days 00:15:32
Trades Longer than 1h: 233 | 26 | 259
% of Time in the Market: 0.068
% of Profitable Trades: 0.25
STD of Daily Return [$]: 320
Avg. Daily Return [$]: 153
Max Drawdown [$]: 2209
Avg. Consecutive No. of Winners: 1.41
Max Consecutive No. of Winners: 5
Avg. Consecutive No. of Losers: 4.31
Max Consecutive No. of Losers: 25
This section breaks down and analyses the results presented in Sec. 6. Firstly, the results are discussed in relation to performance and profitability, as well as the effectiveness of the machine learning approach. Secondly, we discuss several limitations with regard to our results and how they can be addressed. Thirdly, the trading strategy is discussed; although it was a rather simple strategy to showcase the approach, we discuss its profitability and potential extensions. Finally, we present our intuition on potential advancements and future work for this area.
One can see that the number of price levels closely follows the number of ticks per contract (Fig. 5). Since the number of peaks is proportionally related to the number of ticks, the density of the peaks considered in the study does not change with time. Consequently, the peak pattern we are interested in can be considered relatively stationary, not disappearing in various market conditions, but rather adjusting its time horizon to the market.

Interestingly, we observed that the rebound size does not change the performance of the model significantly (Fig. 7). After the feature selection and parameter tuning steps, the compared models have different parameters and operate on different feature subsets; consequently, performance differences within a single contract might have different sources. At the same time, the relative performance across contracts clearly follows the same pattern for at least 5 contracts - from ESZ2017 to ESZ2018. It might be the case that the used features tend to work better for a particular market state, and the prevalence of this market state defines the success of the classification. Moreover, larger rebounds do not necessarily lead to lower precision.

When studying the model confidence vs precision (Fig. 6) we find that the number of entries decreases rapidly, and a confidence threshold above 0.55 becomes impractical to use due to the low number of entries classified as rebounds. At the same time, based on this plot we can conclude that optimization of the threshold does not lead to a significant performance change. Taking into account the non-stationarity of financial markets, this observation is expected. Our attempts to find a relatively stationary pattern in the data worked to some extent but, of course, do not make the data consistently and confidently predictable.
Firstly, in the backtesting we use tick data; however, when modeling order execution we do not consider the per-tick volumes coming from aggressive buyers and sellers (bid and ask). It might be the case that for some ticks only aggressive buyers were present, and our algorithm executed a long limit order. This potentially leads to uncertainty in opening positions - in reality some of the orders may not have been filled exactly at the moment they are filled in the backtesting engine. Secondly, we do not model queues when placing limit orders and, consequently, cannot guarantee that our orders would have been filled every time had we submitted them live, even if both bid and ask volumes were present in the tick. This is crucial for high-frequency trading (HFT), where thousands of trades can be performed per day with tiny take-profits and stop-losses, but it has much less influence for the trade intervals considered in the study. Finally, there is an assumption that our entering the market does not change its state significantly. We believe this is a valid assumption for the considered financial instrument when one trades one contract at a time.
We publicly share the models and the code for analysing them. The exact code for feature extraction and labeling is not provided, to avoid a rapid devaluation of the trading approach. Also, we release only samples of the training data to comply with the requirements of the data licence.

Strategy
The objective of the study was not to provide a ready-to-trade strategy, but rather to demonstrate a proof of concept, which we have successfully accomplished. We believe that the demonstrated approach can be used for any strategy trading the considered scenario. In terms of improving the strategy, there are a couple of things that can be done. For instance, the take-profit and stop-loss offsets might be linked to the volatility instead of being kept constant. Also, flat strategies usually work better at certain times of the day - it would be wise to interrupt trading before US and EU session starts and ends, as well as around scheduled news and political events. Finally, all the mentioned parameters we have chosen can be looked into and optimized to the needs of the market participant.
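As an illustration of the volatility-linked offsets suggested above, one could scale them by an ATR-style volatility estimate. This is our own sketch, not part of the study: the ATR choice, the multipliers and all names are illustrative assumptions.

```python
import numpy as np

def atr(highs, lows, closes, period=14):
    """Average True Range: a common bar-based volatility measure."""
    prev_close = np.roll(closes, 1)
    tr = np.maximum.reduce([
        highs - lows,
        np.abs(highs - prev_close),
        np.abs(lows - prev_close),
    ])
    # Drop the first element (its prev_close wraps around) and average the tail.
    return tr[1:][-period:].mean()

def offsets_from_volatility(a, tick=0.25, sl_mult=1.0, tp_mult=3.0):
    """Scale stop-loss / take-profit to volatility, rounded to whole ticks."""
    sl = max(1, round(sl_mult * a / tick))
    tp = max(1, round(tp_mult * a / tick))
    return sl, tp

# Synthetic bar data standing in for recent ES prices.
rng = np.random.default_rng(1)
closes = 2900 + np.cumsum(rng.normal(0, 0.5, 100))
highs, lows = closes + 0.5, closes - 0.5
a = atr(highs, lows, closes)
print(offsets_from_volatility(a))
```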
In the current study we have proposed an approach for extracting certain scenarios from the market and classifying them. The next step would be to propose a more holistic scenario extraction. For instance, one can notice that there are no price-related features among those selected for the classifier. As the next step we would aim to define the scenarios using other market properties instead of the price. Also, the approach can be validated for trading trends - in this case one would aim to classify price level crossings with high precision. Furthermore, it would be interesting to compare the CatBoost classifier to the DA-RNN [26] model, a recurrent neural network making use of the attention-based architecture designed on the basis of the recent breakthrough in the area of natural language processing [27]. Finally, less conservative scoring metrics can be investigated when fitting the classifier. For instance, the F-beta score, a more general version of the F1 score, allows tweaking the contribution of recall to the metric.
Our work showcased an end-to-end approach to performing automated trading using price extrema. Whilst extrema have been discussed as a potentially high-performance basis for trading decisions, there has been no work proposing means to automatically extract them from data and create a successful strategy. Our work demonstrated an automated pipeline using this approach, and our evaluation showed some very promising results. Whilst we acknowledge that the results may be skewed by some assumptions in the backtesting strategy, we still show high precision and profitability. Furthermore, this paper has presented every single aspect of data processing, feature extraction, feature evaluation and selection, machine learning estimator optimization and training, as well as details of the trading strategy. We hope that by providing every single step of the ATP, it will enable further research in this area and be useful to a varied audience. We conclude by providing samples of our code online at [2].
Acknowledgements
The authors would like to thank Roberto Metere for his drawing contributions.
References

[1] Aitken, M., Frino, A.: The determinants of market bid ask spreads on the Australian Stock Exchange: Cross-sectional analysis. Accounting & Finance (1), 51–63 (1996)
[2] Anonymous: Machine learning classification of price extrema based on market microstructure features. A case study of S&P 500 E-mini futures. (2020). https://doi.org/10.5281/ZENODO.4036850, https://zenodo.org/record/4036850
[3] Bauwens, L., Giot, P., Grammig, J., Veredas, D.: A comparison of financial duration models via density forecasts. International Journal of Forecasting (4), 589–609 (2004)
[4] Blessie, E.C., Karthikeyan, E.: Sigmis: A feature selection algorithm using correlation based method. Journal of Algorithms & Computational Technology (3), 385–394 (2012)
[5] Booth, A., Gerding, E., Mcgroarty, F.: Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications (8), 3651–3661 (2014)
[6] Caginalp, C., Caginalp, G.: Asset price volatility and price extrema. arXiv preprint arXiv:1802.04774 (2018)
[7] Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794 (2016)
[8] De Prado, M.L.: Advances in financial machine learning. John Wiley & Sons (2018)
[9] Dempster, M.A., Leemans, V.: An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications (3), 543–552 (2006)
[10] Du, P., Kibbe, W.A., Lin, S.M.: Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics (17), 2059–2065 (2006)
[11] Dufour, A., Engle, R.F.: Time and the price impact of a trade. The Journal of Finance (6), 2467–2498 (2000)
[12] Easley, D., de Prado, M.L., O'Hara, M., Zhang, Z.: Microstructure in the machine age. Available at SSRN 3345183 (2019)
[13] Fan, F., Xiong, J., Wang, G.: On interpretability of artificial neural networks. arXiv preprint arXiv:2001.02522 (2020)
[14] Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences (1), 119–139 (1997)
[15] Grammig, J., Wellner, M.: Modeling the interdependence of volatility and inter-transaction duration processes. Journal of Econometrics (2), 369–400 (2002)
[16] Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research (Mar), 1157–1182 (2003)
[17] Huang, W., Nakamori, Y., Wang, S.Y.: Forecasting stock market movement direction with support vector machine. Computers & Operations Research (10), 2513–2522 (2005)
[18] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y.: LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems. pp. 3146–3154 (2017)
[19] Kissell, R.L.: The science of algorithmic trading and portfolio management. Academic Press (2013)
[20] Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems. pp. 4765–4774 (2017)
[21] Manganelli, S.: Duration, volume and volatility impact of trades. Journal of Financial Markets (4), 377–399 (2005)
[22] Miller, N., Yang, Y., Sun, B., Zhang, G.: Identification of technical analysis patterns with smoothing splines for bitcoin prices. Journal of Applied Statistics (12), 2289–2297 (2019)
[23] Münnix, M.C., Shimada, T., Schäfer, R., Leyvraz, F., Seligman, T.H., Guhr, T., Stanley, H.E.: Identifying states of a financial market. Scientific Reports, 1–9 (2012). https://doi.org/10.1038/srep00644, http://arxiv.org/abs/1202.1623
[24] Münnix, M.C., Shimada, T., Schäfer, R., Leyvraz, F., Seligman, T.H., Guhr, T., Stanley, H.E.: Identifying states of a financial market. Scientific Reports, 644 (2012)
[25] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems. pp. 6638–6648 (2018)
[26] Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., Cottrell, G.W.: A dual-stage attention-based recurrent neural network for time series prediction. IJCAI International Joint Conference on Artificial Intelligence, 2627–2633 (2017). https://doi.org/10.24963/ijcai.2017/366
[27] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)
[28] Tabak, B.M., Feitosa, M.A.: An analysis of the yield spread as a predictor of inflation in Brazil: Evidence from a wavelets approach. Expert Systems with Applications (3), 7129–7134 (2009)
[29] Zhou, B., Bau, D., Oliva, A., Torralba, A.: Interpreting deep visual representations via network dissection. IEEE Transactions on Pattern Analysis and Machine Intelligence 41