A Machine Learning-based Recommendation System for Swaptions Strategies
Adriano Soares Koshiyama, Nick Firoozye and Philip Treleaven
Department of Computer Science, University College London
London, United Kingdom
October 5, 2018
Abstract
Derivative traders are usually required to scan through hundreds, even thousands, of possible trades on a daily basis. Up to now, no solution has been available to aid them in this job. Hence, this work aims to develop a trading recommendation system and to apply it to the so-called Mid-Curve Calendar Spread (MCCS), an exotic swaption-based derivatives package. In summary, our trading recommendation system follows this pipeline: (i) on a certain trade date, we compute metrics and sensitivities related to an MCCS; (ii) these metrics are fed into a model that can predict its expected return for a given holding period; and, after repeating (i) and (ii) for all trades, we (iii) rank the trades using some dominance criteria. To show that such an approach is feasible, we used a list of 35 different types of MCCS, a total of 11 predictive models and 4 benchmark models. Our results suggest that, in general, linear regression with lasso regularisation compared favourably to other approaches from both a predictive and an interpretability perspective.
Derivative traders are usually required to scan through hundreds, even thousands, of possible trades on a daily basis. A concrete case is the so-called Mid-Curve Calendar Spread (MCCS), a derivatives package that involves selling an option on a forward-starting swap and buying an option on a spot-starting swap with a longer expiration [8, 26]. In such a package, traders look at the historical carry and breakeven width levels, metrics that can be easily inferred from the terminal or aged payoff profile of the MCCS, shown in several heatmaps produced by the research team. After that, they rank the most prominent ones to offer to a client or to proceed with some proprietary trading. In general, the straightforwardness and swiftness with which decisions are made is the main upside of this framework.

However, one might notice that the main downsides of such an approach are: (i) substantial information on the underlying, like sensitivities, implied volatility, etc., is usually not taken into account; (ii) using the previous example, high historical values for carry and breakeven width are necessary rather than sufficient conditions for a profitable MCCS trade, an argument extensible to other trades as well; (iii) a trader can quickly judge whether an individual trade is worth investing in, but may take some time to find it; and (iv) after a given period, traders tend to look only at a small subset of the possible trades (a small area of the heatmap), rather than the whole available selection. Hence, a systematic approach, in which more of the information at hand is crossed and aggregated to find good trading picks, would undoubtedly increase the trader's productivity. Therefore, the objective of this work is to develop a trading recommendation system that can aid derivatives traders in their day-to-day routine.
Being more specific, our solution is based on the following pipeline: (i) on a certain trade date, we compute metrics and sensitivities related to an MCCS; (ii) these metrics are fed into a model that can predict its expected return for a given holding period; and, after repeating (i) and (ii) for all trades, we (iii) rank the trades using some dominance criteria. Our final solution is a model-based heatmap with attractiveness scores for each trade, which can be offered to traders and salespeople on a daily basis.

In this sense, we organised this work as follows: the next section presents a literature review on existing approaches to return/price prediction/estimation across different areas and instruments, as well as a brief description of MCCS trades. The third section presents the dataset containing the MCCS trades, showing how the information is computed and gathered, which variables are the inputs and outputs, and the main assumptions embedded in it. Then, we move to the modelling strategy, highlighting the main models used as candidates for the recommendation system and how they are tested and assessed. Finally, we exhibit the results and discussions, closing this work with some concluding remarks and future directions for research.

The literature provides a growing body of evidence that price changes can be predicted, that is, that in particular circumstances and periods securities violate the Efficient Market Hypothesis [5, 24]. In this sense, researchers have employed different modelling approaches and information sets to predict price changes across a range of assets. When we scan the literature for cash instruments (equities, bonds, foreign exchange, etc.)
focused only on using past returns as the main source for prediction, we find works that tap into Bayesian forecasting [33], Nonparametric Predictive Inference [2], Forecast Combination [12], Generalized Exponential Weighted Moving Average [25], Support Vector Machines (SVM) [22], shallow and deep Neural Network architectures [7, 9, 18, 32], Random Forest and Gradient Boosting Trees [23], and so forth. The list of proposed methodologies keeps growing, with equities and indices appearing as the dominant asset class to which these algorithms are applied. Collectively, they provide evidence that some forecastability of returns can be achieved by putting in place complex models with a suitable training scheme.

Contrasting with the emphasis that researchers in cash instruments put on return predictability, when we devote our attention to research in derivatives instruments (options, swaps, swaptions, etc.) it is clear that most of the effort is concentrated on pricing these contracts. In parallel to the traditional framework, alternative ways of pricing and trading have started to emerge, relying on fewer assumptions and more on data. We can pinpoint approaches that use Neural Networks for option pricing and hedging with daily S&P 500 index call options [17], as well as for real-time pricing and hedging of options on EUR/USD currency futures at tick level [31]. It is worth mentioning other approaches in the derivatives realm, such as the prediction of pricing and hedging errors for equity-linked warrants with Gaussian Process models [19], the construction of machine learning models for predicting option prices on KOSPI 200 Index options [27], and a general study on forecasting option price distributions using Bayesian kernel methods [28].

When we devote our attention to the asset type to which this work is dedicated, interest rate swaptions, a similar pattern persists: most of the research is related to pricing and not to return prediction.
Regarding pricing, the same tradition of relying on stochastic calculus techniques is followed [4, 29]. Regarding potential alternatives using more data-driven approaches, as we saw with currency, index and equity options, we can only mention the work of Souza et al. [30], which calibrates the Vasicek interest rate model under the risk-neutral measure by learning the model parameters using Gaussian processes for regression. Considering trading strategies and return prediction, we find even less academic research; perhaps most of it resides inside the counterparts that trade such products (banks, hedge funds, etc.). This shortage of published research might be linked to the absence of ready-to-use and publicly available datasets, similar to those found for cash products, since these instruments are traded off-exchange.

Based on this review of existing approaches to return/price prediction/estimation in different areas and instruments, to the best of our knowledge our work is the first attempt to build a trading recommendation system in the context of derivatives. Our approach is not only novel from a modelling perspective; in addition, instead of trading the vanilla product (receiver/payer interest rate swaption), we prefer to focus on options strategies (calendar spreads, straddles, etc.), which in many cases are the packages that are traded in practice. By thinking in terms of the package, in this case a Mid-Curve Calendar Spread, rather than its individual constituents, we unlock some features that can only be computed in this situation, like the carry at expiry, breakeven width and so on. Therefore, we can train our models not only on past returns but also on sensitivities, as well as on information derived from the package payoff function. By portraying our investment strategy in this manner, we have a large information set that can substantially aid in forecasting returns.
But as a counter-effect, this poses a new challenge in separating relevant features in a dynamic context. In this respect, the combination of temporal cross-validation, a diverse set of models and regularisation/feature selection can provide a robust framework for trading strategy backtesting and assessment. But before presenting such a framework, the next section gives a brief view of MCCS trades.
A Mid-Curve Calendar Spread (MCCS) is a package involving short selling an option on a forward-starting swap and going long a longer-expiry swaption on the same underlying swap [8]. There is a counterpart with many similarities for equities; see [26] for more information. Investors typically use an MCCS to take a view on forward volatility. This comes from the fact that, conceptually, spot volatility can be decomposed into forward volatility and mid-curve volatility. Taking 10y10y as an example (this notation is used extensively in this work: the first 10y means a spot swaption with 10 years to expiration, while the second 10y refers to the swap tenure), Figure 1 illustrates the time periods covered by different interest rate volatilities and their instruments. The red lines indicate the time over which interest rate volatility exposure is taken, and the grey line indicates the underlying forward swap rate.

Figure 1: Mid-curve Swaption: 5y mid-curve on 5y10y swap rate, that is, the volatility of a forward-starting swaption, called mid-curve, whose strike is set at inception but whose underlying swap starts several years after the option expiry date.

Figure 2 presents the payoff profile for an EUR 1m1y2y (short selling a 1m1y2y mid-curve swaption and going long a longer-expiry 13m2y spot swaption).

Figure 2: Payoff profile for an EUR 1m1y2y.

We plot the payoff profiles for current volatility and for up and down volatility scenarios, noting that the long vega position means that the payoff profile shifts up in a rising volatility environment and correspondingly shifts down in a falling one. We also calculate the (volatility-adjusted) breakevens from this profile.
In summary, our solution follows the roadmap below (also schematically described in Figure 3):

Figure 3: Flowchart describing the input-output scheme of the proposed trading recommendation system for MCCS trades.

1.
Data: On a certain trade date, we calculate metrics and sensitivities related to an MCCS package;
2.
Modelling: These metrics are fed into a predictive model that outputs its expected return for a given holding period (e.g., one year);
3.
Recommendation: After repeating steps 1 and 2 for all MCCS, we rank them based on the expected returns using some criteria.

Following this outlined structure, the next three subsections describe in more detail when and which MCCS trades were recorded (Dataset), which predictive models were trained and how they were assessed (Modelling), and how the long/short trading signal is computed for each MCCS (Recommendation). Finally, the last subsection presents the metrics used to evaluate the performance of the recommendation system when a certain predictive model candidate underpins it.
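The three steps above can be sketched in a few lines of Python. The trade names, feature lookup and DummyModel below are illustrative placeholders, not the paper's actual instruments or fitted models:

```python
import numpy as np

def recommend(trades, model, features_fn):
    """Rank trades by the model's expected holding-period return (steps 1-3)."""
    # Steps 1-2: compute features on the trade date and predict the return
    expected = {t: float(model.predict(features_fn(t).reshape(1, -1))[0])
                for t in trades}
    # Step 3: rank all trades, highest expected return first
    return sorted(expected, key=expected.get, reverse=True)

class DummyModel:
    """Stand-in for any fitted regressor exposing a sklearn-style predict()."""
    def predict(self, X):
        return X.sum(axis=1)

features = {"EUR 1y1y1y": np.array([0.5]), "EUR 2y1y4y": np.array([1.5])}
ranking = recommend(list(features), DummyModel(), features.get)
```

In practice the daily heatmap offered to traders is just this ranking rendered with one cell per trade configuration.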
During our experiments, we opted to use the trades displayed in Table 1. Although many other configurations are available in practice, these are the ones with the longest historical data available, which is important when fitting a predictive model.

Table 1: Configuration of the MCCS trades used.

Currency Expiry Forward Swap | Currency Expiry Forward Swap
EUR 1y 1y 1y | EUR 3y 3y 2y
EUR 1y 1y 4y | EUR 3y 4y 1y
EUR 1y 2y 3y | EUR 3y 5y 5y
EUR 1y 2y 8y | EUR 4y 1y 1y
EUR 1y 3y 2y | EUR 4y 1y 4y
EUR 1y 4y 1y | EUR 4y 2y 3y
EUR 1y 5y 5y | EUR 4y 2y 8y
EUR 2y 1y 1y | EUR 4y 3y 2y
EUR 2y 1y 4y | EUR 4y 4y 1y
EUR 2y 2y 3y | EUR 4y 5y 5y
EUR 2y 2y 8y | EUR 5y 1y 1y
EUR 2y 3y 2y | EUR 5y 1y 4y
EUR 2y 4y 1y | EUR 5y 2y 3y
EUR 2y 5y 5y | EUR 5y 2y 8y
EUR 3y 1y 1y | EUR 5y 3y 2y
EUR 3y 1y 4y | EUR 5y 4y 1y
EUR 3y 2y 3y | EUR 5y 5y 5y
EUR 3y 2y 8y |

As can be seen, all trades are in euros, with different expiries (1y-5y), forwards (1y-5y) and swap tenures (1y-5y and 8y). For each configuration, at time t we agree with a counterpart to trade the package using the At-the-Money-Forward (ATMF) rate as the strike, paying or receiving the present value PV_t. The PV_t is computed via the SABR model [29], using information and parameters (e.g., spot and forward rates and rate-rate correlation) calibrated with market data on a daily basis. From the same model that computed PV_t, we can also obtain other metrics and sensitivities, such as those displayed in Table 2.

Table 2: Metrics and sensitivities computed for each available package at time t.
PV | Strike
Carry at Expiry (Carry) | Breakeven Width (BE Width)
Aged 1y Carry | Theta
ATMF Implied Volatility (Implied Vol) | Gamma
Vega | Curve Carry (Aged 1y)
Time Carry (Aged 1y) | Volatility Carry (Aged 1y) (Vol Carry)
Carry and BE Width are those obtained by looking at the payoff profile at expiry. The Aged 1y Carry is produced by ageing the trade by one year (moving it closer to expiration), estimating the payoff profile and computing the carry. Theta, Vega and Gamma are the sensitivities of the instruments to a change in time, volatility and a wider range of underlying rate movements, respectively. These and the ATMF Implied Vol are also backed by the SABR model. Curve, Time and Volatility Carry are the amounts of Aged 1y Carry that can be attributed to changes in certain sensitivities from spot to forward, namely Delta (Curve), Theta (Time) and Vega (Volatility). These can also be used as tools to understand which factors most influence the instrument's value over time.

After computing all these metrics at time t, we hold the trade until t + h, where h can be two weeks, one month, one year, and so on, as long as t + h is before or at expiration. At time t + h we compute the PV_{t+h} of the same trade again, using the new economic scenario available (e.g., rates and changed model parameters). By agreeing to buy back or sell the current trade for PV_{t+h}, we can compute the holding h-period return of the trade started at time t by:

R^(h)_t = (PV_{t+h} - PV_t) / PV_t    (1)

In summary, Table 3 presents an example of the information, in wide format, that is available when we combine the data from times t and t + h.

Table 3: Example of information available at time t and t + h for the MCCS.

Instant (t) | PV_t | R^(h)_t | R^(h)_{t-h-1} | Strike | Features
1 | 300 | 0.3 | - | 3.4 | ...
2 | 320 | 0.2 | - | 3.2 | ...
... | ... | ... | ... | ... | ...
t + h | 260 | -0.05 | - | 1.9 | ...
t + h + 1 | 270 | -0.2 | 0.2 | 2.0 | ...
... | ... | ... | ... | ... | ...
T | 250 | - | 0.1 | 2.2 | ...
Note that in the most contemporaneous period (close to T) we do not have PV_{T+h}, and so we cannot compute R^(h)_T. Conversely, if we want to use lagged returns R^(h)_{t-h-1} as explanatory forces for R^(h)_t, then at the beginning this information is also not available. Therefore, our dataset is trimmed at the beginning and the end, mainly by the value of h. If h is small, such as two weeks or one month, the trimming is imperceptible and, therefore, may not affect model fitting and validation. However, if h is large, such as two or three years, the available sample might shrink substantially, reducing the range of models and cross-validation schemes that can be employed for this task. Based on these procedures, metrics and observations, Table 4 presents other details that we used during our experiments to generate the dataset.

Table 4: Details used to generate the MCCS trade dataset.

Detail | Value
Period | September 2006 to September 2016
Holding Period (h) | 1 year
Trade Frequency | Weekly (usually on Wednesday)
Strike | At the Money Forward (ATMF)
Lagged data (h - p) | p = 1, ...
Transaction Costs | 0.75 x Vega_t
Funding Rate | Libor 3-month rate
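As a minimal sketch of Eq. (1) and the end-of-sample trimming just described, the function below computes holding h-period returns from a PV series; entries with no observed PV_{t+h} come out as NaN (the PV values are made up for illustration):

```python
import numpy as np

def holding_returns(pv, h):
    """R(h)_t = (PV_{t+h} - PV_t) / PV_t; NaN where PV_{t+h} is not yet observed."""
    pv = np.asarray(pv, dtype=float)
    out = np.full(pv.shape, np.nan)
    out[:-h] = pv[h:] / pv[:-h] - 1.0
    return out

pv = [100.0, 110.0, 121.0, 133.1]
r = holding_returns(pv, h=1)  # last entry stays NaN: the trade is still open
```

With weekly data and h of one year, the same NaN mechanism trims roughly the last 52 observations of each trade's history.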
Therefore, we gathered data from trades entered on a weekly basis from September 2006 to September 2016. These trades are struck ATMF, using the PV_t computed from the middle rate (in practice, some bid-ask spread proportional to the Vega would be imbued). After holding the trade for one year (h = 1y), we compute the arithmetical returns, which are therefore, by definition, automatically annualised. These returns are gross, so we need to take into account the transaction costs (hedging costs and fixed fees charged by the derivatives desk) as well as some future funding rate. These values are also outlined in Table 4, where the transaction costs of 0.75 as a fraction of Vega were chosen to account not only for the transaction cost itself but also for some potential bid-ask spread on the start/unwind of the trade. The 3-month London Interbank Offered Rate (LIBOR) was chosen as the funding cost/benchmark rate to compute excess returns.

Using these assumptions, the next subsection presents the modelling strategy that taps into this dataset to create the recommendation system for the MCCS trades.

In relation to modelling, our general model is a system of uncoupled equations:

R^(1y)_{t,1} = f_1(features_{t,1}) + ε_{t,1} = R̂^(1y)_{t,1} + ε_{t,1}    (2)
R^(1y)_{t,2} = f_2(features_{t,2}) + ε_{t,2} = R̂^(1y)_{t,2} + ε_{t,2}    (3)
...
R^(1y)_{t,n} = f_n(features_{t,n}) + ε_{t,n} = R̂^(1y)_{t,n} + ε_{t,n}    (4)

where for each MCCS trade (i = 1, ..., n) there is an i-th predictive model f_i that is fed with a set of pre-calculated features (BE Width, Carry, etc.) and returns an estimate of the holding 1y-period return R̂^(1y)_{t,i}. As the model is an approximation, some noise/error is expected; in modelling terms, this is expressed as the ε_{t,i} component.
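A sketch of the uncoupled system (2)-(4): one regressor f_i is fitted per trade on that trade's own history, with no coupling between equations. The data below is synthetic, and the choice of Lasso with this alpha is illustrative rather than the paper's tuned configuration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
models = {}
for trade in ["EUR 1y1y1y", "EUR 1y1y4y"]:
    # features_{t,i}: e.g. carry, BE width, implied vol, ... (synthetic here)
    X = rng.normal(size=(200, 5))
    # R(1y)_{t,i}: holding 1y-period return, driven here by the first feature
    y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)
    models[trade] = Lasso(alpha=0.01).fit(X, y)  # the i-th model f_i
```

Because the equations are uncoupled, each f_i can be refitted, validated or even replaced independently of the others.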
After defining which variable is to be predicted, the remaining points are: which models are available to embody f_i, and how the fitting, validation and selection of these models are carried out. On the first point, the first rows of Table 5 display the models we used during our experiments; their mathematical descriptions and usage can be found in the following references [3, 11, 16, 20].

Table 5: Parameters used to model the MCCS trade dataset.

Abbreviation | Model | Fixed Hyperparameters | Cross-Validated Hyperparameters
Classic Regression | Classical Linear Regression | None | None
BackSel Regression | Stepwise Regression | Backward Selection | None
Ridge Regression | Ridge Regression | None | λ ∈ {...}
Lasso Regression | Lasso Regression | None | λ ∈ {...}
KRR-RBF | Kernel Ridge Regression | Radial-Basis Function kernel | λ ∈ {...} and γ ∈ {...}
kNN | k-Nearest Neighbours | Euclidean distance | k ∈ {...}
CART | Classification and Regression Tree | MSE function | Max depth ∈ {...}
Random Forest | Random Forest | Number of trees and max depth | None
GradBoost Reg | Gradient Boosting Tree | Number of trees and max depth | Learning rate ∈ {...}
MLP | Multi-Layer Perceptron | Single hidden layer with hyperbolic tangent transfer function | λ ∈ {...} and number of neurons ∈ {...}
SVR-RBF | Support Vector Regression | Radial-Basis Function kernel | C ∈ {...} and γ ∈ {...}

Abbreviation | Baseline Model | Parameters
Mean Pred | Average Prediction | None
Naive | Naive Model | None
Z-Score: BE Width | BE Width feature | Rolling window of size 1 year
Z-Score: Carry at Expiry | Carry at Expiry feature | Rolling window of size 1 year

Other Parameters | Values
Warm-up Period (L) | L_outer and L_inner (in years)
k-rolling-cv | k_outer = 1 week for outer; k_inner ≈ (T_train - L_inner)/... for inner
Outlier Treatment | Winsorizing at extreme quantiles
Missing Data Treatment | Remove
In Table 5, the Model column presents the plethora of models that this work fitted for this prediction purpose: we start from simple predictive models such as Classical Linear Regression, k-Nearest Neighbours and Classification and Regression Trees, and move towards those that can seamlessly exhibit nonlinear behaviours, like Random Forest, Kernel Ridge Regression, Multi-Layer Perceptron and Support Vector Regression. Some of these methods had their hyperparameters held constant across all experiments (Fixed Hyperparameters column), either because we wanted to apply a particular form of a method (RBF kernel, single hidden layer, etc.) or because during a warm-up phase we noticed that they did not substantially affect the results (hyperbolic tangent, increasing the number of trees, etc.).

For certain models, the Cross-Validated Hyperparameters column shows which hyperparameters were optimised before the prediction step. For instance, consider Ridge Regression and the need to define the regularisation value (λ) appropriately. Suppose we have a set of training pairs (features_t, R^(1y)_t), t = 1, ..., L, of size L, and we split this sample into k-rolling-cross-validation (k-rolling-cv) folds (explained later in this subsection). We then train and test, using this scheme, the Ridge Regression model with one of the predefined λ values, say λ = 10. We compute a performance function on the test set (Mean Squared Error, MSE) and repeat this process for all available λ values. In the final model we use the λ that on average had the lowest MSE.

We also fitted the usual benchmarks found in the literature for regression and forecasting modelling: the Average and Naive models [16, 21]. In addition, we implemented the benchmarks that traders use to assess whether a particular MCCS is worth pitching or trading: BE Width and Carry at Expiry. We replicated the way traders look at these features by computing z-scores based on the average and standard deviation over a rolling window of one year.
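The λ-selection procedure described above can be sketched as follows. The fold sizes, the two-value grid and the closed-form ridge solver are illustrative, not the paper's exact configuration:

```python
import numpy as np

def rolling_cv_splits(n, train_size, step):
    """Expanding-window folds: train on [0, t), test on the next `step` points."""
    t = train_size
    while t + step <= n:
        yield np.arange(t), np.arange(t, t + step)
        t += step

def ridge_predict(Xtr, ytr, Xte, lam):
    """Closed-form ridge fit on the training window, prediction on the test block."""
    d = Xtr.shape[1]
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
    return Xte @ w

def select_lambda(X, y, lambdas, train_size=100, step=20):
    """Pick the penalty with the lowest average test MSE across the folds."""
    avg_mse = {}
    for lam in lambdas:
        mses = [np.mean((y[te] - ridge_predict(X[tr], y[tr], X[te], lam)) ** 2)
                for tr, te in rolling_cv_splits(len(y), train_size, step)]
        avg_mse[lam] = np.mean(mses)
    return min(avg_mse, key=avg_mse.get)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
best_lam = select_lambda(X, y, [1e-4, 1e4])
```

Because the test block always lies after the training window, this scheme never trains on information from the future, which is what distinguishes it from plain shuffled k-fold cross-validation.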
The signal for going long/short is given by a rule of thumb with a simple rationale: if a certain metric has a z-score with absolute value above or equal to 3, the trader goes fully long (+)/short (-) the trade, since it is a very extreme event. Otherwise, the leverage is reduced, down to zero once the metric is within one standard deviation of the rolling average. We removed any missing data and clipped extreme values, mainly returns above the 95th percentile (in our case these can be due to numerical problems, or to extreme scenarios related to the 2008-2009 financial crisis period). The next subsection presents the final component of our roadmap: the recommendation system.
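Before moving to the recommendation step, the traders' z-score benchmark can be sketched as below. The exact leverage mapping between |z| = 1 and |z| = 3 is garbled in Table 5, so the linear ramp here is our assumption; only the endpoints (full position at |z| >= 3, none within one standard deviation) come from the text:

```python
import numpy as np

def zscore_signal(x, window=52):
    """Rolling z-score of a feature mapped to a long/short leverage in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    z = np.full(x.shape, np.nan)
    for t in range(window, len(x)):
        w = x[t - window:t]                  # 1-year rolling window (weekly data)
        z[t] = (x[t] - w.mean()) / w.std()
    # full +/-1 position at |z| >= 3, nothing below |z| = 1, linear in between
    lever = np.clip((np.abs(z) - 1.0) / 2.0, 0.0, 1.0)
    return np.sign(z) * lever

x = np.concatenate([np.linspace(0.0, 1.0, 55), [100.0]])  # extreme last value
sig = zscore_signal(x)
```

The same function serves both benchmarks, applied once to the BE Width series and once to the Carry at Expiry series.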
The recommendation of a certain trade could be made solely on some normalised version of the expected return from holding the i-th trade for the 1y period (R̂^(1y)_{t,i}). However, given that each model provides individual forecasts for each MCCS, and that their performance is afterwards assessed locally and globally, a more suitable manner to proceed is to assign a credit based on the track record of a model in predicting a particular MCCS trade. Hence, we weight a signal up or down based not only on the magnitude of a model's prediction but also on its quality. Then, with R̂^(1y)_{t,i} denoting the expected return from holding the i-th trade for the 1y period, the signal S_{t,i} is defined by:

S_{t,i} = (R̂^(1y)_{t,i} · Rho_{t,i}) / max(|R̂^(1y)_{t,i} · Rho_{t,i}|, ..., |R̂^(1y)_{t-h,i} · Rho_{t-h,i}|)

where Rho_{t,i} is the historical Pearson correlation between the predicted values R̂^(1y)_{·,i} and the actual values R^(1y)_{·,i}. The strength of the i-th long/short signal is thus given by its expected return, scaled by the maximum weighted return that a long/short position on the same trade (which is why the returns are taken in absolute terms) was expected to yield over the previous h-period (in this case, 1 year). Therefore, the trade with the maximum weighted return in absolute terms will have |S_{t,i}| = 1, while those close to zero will yield S_{t,i} ≈ 0. The weight/credit of a certain prediction is based on the historical Pearson correlation coefficient, that is, the adherence between the actual and predicted values.

Footnote: A z-score is defined by Z = (X - µ)/σ, where X represents the actual value of a certain variable, and µ and σ the average and standard deviation of X over a period.
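A sketch of the normalisation above: the correlation-weighted prediction is scaled by the largest weighted prediction (in absolute value) seen over the lookback window, so |S| never exceeds 1. The toy numbers are illustrative:

```python
import numpy as np

def signal(pred, rho, lookback):
    """S_t = (pred_t * rho_t) / max over [t-lookback, t] of |pred_s * rho_s|."""
    w = np.asarray(pred, dtype=float) * np.asarray(rho, dtype=float)
    s = np.full(w.shape, np.nan)
    for t in range(lookback, len(w)):
        denom = np.max(np.abs(w[t - lookback:t + 1]))
        s[t] = w[t] / denom if denom > 0 else 0.0
    return s

s = signal(pred=[1.0, 2.0, 4.0, -2.0], rho=[1.0, 1.0, 1.0, 1.0], lookback=2)
```

A model with a poor track record (Rho near zero) on a given trade therefore has its signal shrunk towards zero, however large its raw prediction.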
Below we outline two types of metrics: one that focuses on the predictive performance of the model, and three that are based on the profit/loss that its application harvested during the backtest. Defining the strategy return as R^(S)_t = R^(1y)_t × S_t(R̂^(1y)_t), the combination of the realised/observed excess returns and the signal (a function of a model's prediction), we can compute the following metrics:

• Pearson Correlation Coefficient (Rho): a dimensionless measure of the linear dependence between the actual and predicted values:
Rho = Cov[R^(1y)_t, R̂^(1y)_t] / sqrt(Var[R^(1y)_t] · Var[R̂^(1y)_t])    (5)

where Cov and Var are the covariance and variance operators. Rho ranges from -1 to +1, with values close to +1 indicating strong agreement between actual and predicted values. In the context of linear models, higher predictive power is a necessary condition for profitable trades (see [1]); hence, by minimising the predictive error we are somewhat trailing a path towards profit maximisation, albeit the causation is not very clear, since this is not a sufficient condition.

• Average Return (Avg Return): the arithmetic average of the strategy returns (for the sake of brevity, we drop the subscript i that refers to a particular trade):

R̄^(S) = (1/T) Σ_{t=1}^{T} R^(S)_t    (6)

• Standard Deviation: the estimator of the dispersion around the strategy's average return (a risk measure in a certain sense):

σ_{R^(S)} = sqrt( (1/T) Σ_{t=1}^{T} (R^(S)_t - R̄^(S))² )    (7)

• Information Ratio: the average annualised return of a strategy earned in excess of a particular benchmark, per unit of risk (measured in terms of standard deviation):

IR = (R̄^(S) - B̄) / σ_{R^(S)}    (8)

where B̄ is the average return of the benchmark (e.g., a treasury bond or equity index). In our case, it was already set to the 3-month LIBOR rate (Table 4). It should be mentioned that the Information Ratio makes the performance of each strategy comparable: since we adjust average returns by the risk assumed for each strategy, it removes the leverage component that magnifies/shrinks the returns provided by a certain strategy.

Figure 4 displays a heatmap with the results of all models for each trade regarding Average Return (%).
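The four metrics just defined can be computed in a few lines; the toy arrays below are illustrative, whereas in the paper the benchmark is the 3-month LIBOR series:

```python
import numpy as np

def performance(r_actual, r_pred, sig, benchmark):
    """Rho, average return, standard deviation and Information Ratio (Eqs. 5-8)."""
    r_actual, r_pred = np.asarray(r_actual), np.asarray(r_pred)
    rho = np.corrcoef(r_actual, r_pred)[0, 1]   # Eq. (5)
    r_s = r_actual * np.asarray(sig)            # strategy return R(S)_t
    avg = r_s.mean()                            # Eq. (6)
    sd = r_s.std()                              # Eq. (7), population form
    ir = (avg - np.mean(benchmark)) / sd        # Eq. (8)
    return rho, avg, sd, ir

rho, avg, sd, ir = performance([0.10, -0.10, 0.20], [0.05, -0.05, 0.10],
                               [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

Note that with a zero benchmark the Information Ratio collapses to the strategy's mean-over-volatility ratio, which is why the choice of funding rate matters when comparing models.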
Several remarks can be made about the global picture: (i) the Naive and Mean Pred models underperformed, but the traders' benchmarks performed reasonably well, surpassing the predictive models on many occasions; (ii) from the linear regression family, Lasso Regression, followed by Ridge Regression, performed best; (iii) most nonlinear models failed to provide a decent average return; and (iv) MLP fared well for the trades in the EUR Xy1y1y range but did not repeat this performance stably across other trades.

When we take into account the variability in the stream of returns generated by the recommendation system, we encounter a somewhat different picture. Figure 5 shows a heatmap with the Information Ratio for all available combinations of models and trades. In general, the models kept their positions unaltered in comparison to the Average Return (%) results (linear models still fared better than nonlinear ones), but now all stand on a similar scale. Based on these Information Ratio results, Table 6 presents a statistical analysis using the average ranks, the Friedman test and the Holm posthoc procedure [10].

When we look at the average rank, Lasso Regression was the top positioned (3.23), while Mean Pred remained most of the time the worst choice (12.86). The traders' benchmarks performed quite well, being placed third and fourth. When we test whether the result fared by Lasso Regression was substantially different from Ridge Regression's (4.86), we arrive at a Z-score of 1.63 and a p-value of 0.0517. If we set our initial significance level at 0.05 and correct using the Holm procedure (last column), we can assert that Lasso did not perform significantly differently from Ridge Regression, but performed far better than the other models. Therefore, Lasso Regression is capturing some information beyond what is spanned by the traders' benchmarks, as well as beating almost all other predictive models for this particular task.
Footnote: When we rank the models for a single MCCS, we sort them in such a way that the best performer is placed first (receiving a value of 1), the second best is placed second (receiving a value of 2), and so on. We can repeat this process for all trades and compute metrics like the average rank (e.g., 1.35 means that a particular model was mostly placed near first).
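The average-rank computation from this footnote, together with the Friedman test used in Table 6, can be sketched as follows. The Information Ratio matrix here is synthetic (35 trades by 4 models) and the Holm correction step is omitted:

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(2)
# Synthetic Information Ratios: rows are the 35 trades, columns are 4 models;
# model 0 is constructed to be best on average and model 3 worst.
ir = rng.normal(size=(35, 4)) + np.array([1.0, 0.0, 0.0, -1.0])

ranks = rankdata(-ir, axis=1)   # rank 1 = best performer within each trade
avg_rank = ranks.mean(axis=0)   # average rank per model across all trades

stat, pvalue = friedmanchisquare(*ir.T)
```

The Friedman test operates on these within-trade ranks, so it compares models across trades without assuming the Information Ratios themselves are normally distributed.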
Table 6: Statistical analysis of the Information Ratio results.

Model | Avg Rank | Z-score | p-value | Holm Correction
Lasso Regression | 3.23 | - | - | -
Ridge Regression | 4.86 | 1.63 | 0.0517 | ...
... | ... | ... | ... | ...
Mean Pred | 12.86 | 9.63 | ... | ...
Friedman Chi-Square | 117.22 | | p-value < ... |

A trading recommendation is counted as successful when the realised return has the recommended sign, positive/negative regardless of its magnitude. In respect to the top image, we can see that Lasso Regression started well during the first year but suffered a drawdown in the second to third year. This period was marked by higher volatility, mainly due to the final developments of the Euro Crisis (2010-2012). From the third year onwards, however, the average returns always scored positive values, usually ranging from 10% to 20% on average. This performance is stamped on the middle-left histogram, where the bulk of returns lies above zero and, moreover, is concentrated close to 15%. This performance was largely generated by Lasso Regression suggesting short positions (middle right), while the long positions were not so successful. The pattern can be better seen in the histogram at the bottom, where a bimodal distribution for the trading recommendation success rate is depicted. Probably the verified outperformance coming from taking short positions in the MCCS is linked to betting against the volatility/variance risk premium [6]. Roughly, this strategy harvests the premium paid by a counterpart for insurance against large swings in the market (almost the same as selling a put in equity options). Since the market in general tends to remain range-bound, the investor shorting the trade can repurchase it later for a smaller premium, profiting from the differential. Lasso Regression dynamically did the opposite and profited from it, largely because the last 5-6 years were populated with higher-volatility periods and tail events.

Figure 8 helps us analyse which features were deemed most significant by Lasso Regression for each particular trade. Each cell corresponds to normalised t-stats from the model coefficients built in the last step of the k-rolling-cv.
Implied Vol was the most significant feature pointed out by Lasso Regression, and it is negatively related to the MCCS returns. Other important features were the BE Width, slightly positively correlated with returns, and the Carry at Expiry, negatively related, probably due to the depressed levels of carry seen in the last batch of data. Lasso Regression in general promoted very sparse models.

Footnote: By normalised t-stats we mean dividing each coefficient's t-stat by the sum of the absolute values of all t-stats in the model. The result is a number between -1 and +1, indicating the magnitude of significance in comparison to the other variables, as well as the direction in which the variable affects the model predictions. We multiplied it by one hundred simply to work on a more convenient scale of -100% to +100%.
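The normalisation described in this footnote amounts to a one-line transformation; the example t-stats below are made up, with the first variable dominating and acting negatively:

```python
import numpy as np

def normalised_tstats(tstats):
    """Each t-stat divided by the sum of absolute t-stats, scaled to percent."""
    tstats = np.asarray(tstats, dtype=float)
    return 100.0 * tstats / np.abs(tstats).sum()

# e.g. one dominant feature with a negative effect, two weaker positive ones
scores = normalised_tstats([-2.0, 1.0, 1.0])
```

By construction the absolute scores sum to 100%, which is what makes the heatmap cells comparable across trades with different numbers of selected features.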
Conclusions
This work proposed a trading recommendation system that can analyse and rank a set of fixed-income derivatives trades; our first experiment designed and applied this method to Mid-Curve Calendar Spread (MCCS) trades. We started the methodology by describing the dataset: it comprised 35 MCCS trades, ranging from September 2006 to September 2016, with different expirations, forward and swap tenors. For each particular trade, we described how the sampling of inputs (metrics, sensitivities and lagged returns) and outputs (returns from unwinding the trade one year after its start) was computed on a weekly basis. Then, we presented the modelling strategy, highlighting the models that were trained as well as the hyperparameters investigated during the nested resampling step. Before the results section, we presented the backtesting setting and the performance measures used to compare the different methodologies.

Most models provided results better than the modelling benchmarks (Mean and Naive), yet very few were able to outperform the traders' benchmarks. Our results suggested that linear models with shrinkage procedures (e.g., Ridge and Lasso) tended to perform better than their nonlinear counterparts (such as Kernel Ridge Regression, SVR and MLP). Regarding interpretability, they also tend to be easier to convey to traders, since most are versed in linear models.
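The overall pipeline — fit a shrinkage model under time-ordered resampling, predict the one-year return of each candidate trade, and rank the candidates — can be sketched with scikit-learn. The feature set, window sizes and penalty grid below are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic weekly samples; columns stand in for inputs such as
# implied vol, BE width, carry at expiry and a lagged return.
X = rng.normal(size=(300, 4))
y = -0.6 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Nested resampling: time-ordered folds tune the Lasso penalty,
# so no fold ever trains on data from its own future.
model = GridSearchCV(
    make_pipeline(StandardScaler(), Lasso(max_iter=10_000)),
    param_grid={"lasso__alpha": [0.001, 0.01, 0.1, 1.0]},
    cv=TimeSeriesSplit(n_splits=5),
)
model.fit(X, y)

# Rank hypothetical candidate trades by predicted return; the most
# negative predictions would be the short candidates.
candidates = rng.normal(size=(5, 4))
preds = model.predict(candidates)
ranking = np.argsort(preds)[::-1]  # best expected return first
print(ranking, preds[ranking])
```

The same loop would simply be repeated per trade date over all 35 MCCS structures, with the ranked predictions handed to the dominance-based selection step.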
When we delved into the Lasso Regression results, we found that this model exhibited some interesting features: (i) it learned a type of volatility buying/selling strategy without being programmed to do so; (ii) its returns distribution across all MCCS tended to be right-skewed, meaning that we are better hedged against dangerous scenarios while keeping greater chances of upside; and (iii) it matched the traders' view on selecting good trades, while adding a dynamic element, since Carry at Expiry is now negatively linked with returns, rather than positively as in the traders' original view. We believe that Lasso Regression will be our choice for a first version of the trading recommendation system, with future developments giving space to different models and mixed approaches.
Acknowledgment
Adriano Soares Koshiyama wants to acknowledge the funding for his PhD studies provided by the Brazilian Research Council (CNPq) through the Science Without Borders programme. The authors would also like to thank Guillaume Andrieux, Tomoya Horiuchi, Gerald Rushton, Tam Rajendran, and Anthony Morris for all the comments and support during this research.
References

[1] Emmanuel Acar and Stephen Satchell. Advanced Trading Rules. Butterworth-Heinemann, 2002.
[2] Rebecca M. Baker, Tahani Coolen-Maturi, and Frank P. A. Coolen. Nonparametric predictive inference for stock returns. Journal of Applied Statistics, 44(8):1333–1349, 2017.
[3] C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2007.
[4] Damiano Brigo and Fabio Mercurio. Interest Rate Models – Theory and Practice: With Smile, Inflation and Credit. Springer Science & Business Media, 2007.
[5] John Y. Campbell, Andrew Wen-Chuan Lo, and Archie Craig MacKinlay. The Econometrics of Financial Markets. Princeton University Press, 1997.
[6] Hoyong Choi, Philippe Mueller, and Andrea Vedolin. Bond variance risk premiums. Review of Finance, 21(3):987–1022, 2017.
[7] Eunsuk Chong, Chulwoo Han, and Frank C. Park. Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83:187–205, 2017.
[8] Howard Corb. Interest Rate Swaps and Other Derivatives. Columbia University Press, 2012.
[9] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3):653–664, 2017.
[10] Joaquín Derrac, Salvador García, Daniel Molina, and Francisco Herrera. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, 1(1):3–18, 2011.
[11] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification (2nd Edition). Wiley-Interscience, 2000.
[12] Graham Elliott, Antonio Gargano, and Allan Timmermann. Complete subset regressions. Journal of Econometrics, 177(2):357–373, 2013.
[13] Nick Firoozye and Qilong Zhang. Turbo carry zooms ahead: A performance update of the turbo-carry trades. Nomura International plc, Nomura Research, 2014.
[14] Nick Firoozye and Qilong Zhang. USD short-term front-end turbo carry USD 1m1y2y trades: Short horizon for sizeable carry. Nomura International plc, Nomura Research, 2014.
[15] Nick Firoozye and Xiaowei Zheng. Market update: Forward vol and mid-curve calendar spreads in USD and EUR – recent levels and carry and trades of note. Nomura International plc, Nomura Research, 2016.
[16] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning, volume 1. Springer Series in Statistics, Springer, Berlin, 2001.
[17] R. Gencay and Min Qi. Pricing and hedging derivative securities with neural networks: Bayesian regularization, early stopping, and bagging. IEEE Transactions on Neural Networks, 12(4):726–734, 2001.
[18] Eduardo A. Gerlein, Martin McGinnity, Ammar Belatreche, and Sonya Coleman. Evaluating machine learning classification for financial trading: An empirical approach. Expert Systems with Applications, 54:193–207, 2016.
[19] Gyu-Sik Han and Jaewook Lee. Prediction of pricing and hedging errors for equity linked warrants with Gaussian process models. Expert Systems with Applications, 35(12):515–523, 2008.
[20] Simon S. Haykin. Neural Networks and Learning Machines, volume 3. Pearson, 2009.
[21] Rob Hyndman, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder. Forecasting with Exponential Smoothing: The State Space Approach. Springer Science & Business Media, 2008.
[22] Andreas Karathanasopoulos, Konstantinos Athanasios Theofilatos, Georgios Sermpinis, Christian Dunis, Sovan Mitra, and Charalampos Stasinakis. Stock market prediction using evolutionary support vector machines: an application to the ASE20 index. The European Journal of Finance, 22(12):1145–1163, 2016.
[23] Christopher Krauss, Xuan Anh Do, and Nicolas Huck. Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Research, 259(2):689–702, 2017.
[24] Burton G. Malkiel. The efficient market hypothesis and its critics. The Journal of Economic Perspectives, 17(1):59–82, 2003.
[25] Masafumi Nakano, Akihiko Takahashi, and Soichiro Takahashi. Generalized exponential moving average (EMA) model with particle filtering and anomaly detection. Expert Systems with Applications, 73:187–200, 2017.
[26] Sheldon Natenberg. Option Volatility and Pricing: Advanced Trading Strategies and Techniques. McGraw Hill Professional, 2014.
[27] Hyejin Park, Namhyoung Kim, and Jaewook Lee. Parametric models and non-parametric machine learning models for predicting option prices: Empirical comparison study over KOSPI 200 index options. Expert Systems with Applications, 41(11):5227–5237, 2014.
[28] Hyejin Park and Jaewook Lee. Forecasting nonnegative option price distributions using Bayesian kernel methods. Expert Systems with Applications, 39(18):13243–13252, 2012.
[29] Riccardo Rebonato, Kenneth McKay, and Richard White. The SABR/LIBOR Market Model: Pricing, Calibration and Hedging for Complex Interest-Rate Derivatives. John Wiley & Sons, 2011.
[30] J. Beleza Sousa, M. L. Esquível, and R. M. Gaspar. Machine learning Vasicek model calibration with Gaussian processes. Communications in Statistics – Simulation and Computation, 41(6):776–786, 2012.
[31] Christian von Spreckelsen, Hans-Jörg von Mettenheim, and Michael H. Breitner. Real-time pricing and hedging of options on currency futures with artificial neural networks. Journal of Forecasting, 33(6):419–432, 2014.
[32] Tianle Zhou, Shangce Gao, Jiahai Wang, Chaoyi Chu, Yuki Todo, and Zheng Tang. Financial time series prediction using a dendritic neuron model. Knowledge-Based Systems, 105:214–224, 2016.
[33] Xiaocong Zhou, Jouchi Nakajima, and Mike West. Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models.