Data-Driven Option Pricing using Single and Multi-Asset Supervised Learning
DDATA-DRIVEN OPTION PRICING USING SINGLE AND MULTI-ASSETSUPERVISED LEARNING
ANINDYA GOSWAMI*, SHARAN RAJANI, AND ATHARVA TANKSALE
Abstract.
We propose three different data-driven approaches for pricing European-style call options usingsupervised machine-learning algorithms. The proposed approaches are tested on two stock market indices:NIFTY50 and BANKNIFTY from the Indian equity market. Although neither historical nor implied volatil-ity is used as an input, the results show that the trained models have been able to capture the option pricingmechanism better than or similar to the BlackScholes formula for all the experiments. Our choice of scalefree I/O allows us to train models using combined data of multiple different assets from a financial mar-ket. This not only allows the models to achieve far better generalization and predictive capability, but alsosolves the problem of paucity of data, the primary limitation of using machine learning techniques. We alsoillustrate the performance of the trained models in the period leading up to the 2020 Stock Market Crash(Jan 2019 to April 2020). Introduction
Fair pricing of financial instruments is at the heart of market stability. Mispricing securities may causetraders to incur massive losses and can also indirectly affect the financial health of a market. It is thus vitalto be able to derive the fair price of tradable financial instruments. The seminal paper [3] laid the foundationof the theory of no arbitrage option pricing, following which the scope of the theory has been extended byseveral authors. However, the fair price of an option contract depends on the current anticipation of thefuture dynamics of the underlying asset. This is why the authors of [15] argued that the success or failure oftheoretical option pricing and hedging is closely tied to the success in capturing the dynamics of the under-lying assets price movements. Since this is a hard problem, adoption of data-driven approaches in pricingoption contracts is gaining attention with the advent of superior computational power and advancementsin statistical learning techniques. In this manuscript, we propose data-driven approaches for prescribingthe fair price of an option contract without assuming any particular theoretical law of the underlying assetdynamics. We also propose and illustrate the use of data drawn from multiple assets/sources to train thesedata-driven option pricing models. This allows us a way to mitigate the possible paucity of data available totrain models. We would like to emphasize that the work presented in this study does not attempt to emulatethe Black-Scholes formula or any other theoretical option pricing model.In the past, several authors have investigated the possibility of building a data-driven option pricing model;We give a brief overview of the literature that exists. In [20], the authors conveyed their belief that thetrading process of option contracts itself may reveal analytical models. The data-driven investigations in[15] and [20] were based on option contracts on the S&P 500. 
While the former used only the moneynessparameter (ratio of spot and strike values) and time-to-maturity as inputs to their learning model, the latteralso used historical volatility, interest rate, and lagged prices of the underlying asset and option contract.The authors of [18] obtained a better prediction performance than [15] by including the open interest inaddition to all the non-lagged inputs of [20]. On the other hand, in [19], S&P 100 data was used to predictthe implied volatility instead of the option price, using past volatilities and option-contract parameters. In[17], a variant of implied volatility was used as an input to predict the deviation of the actual market pricefrom the Black-Scholes price of the option contract. The model performance was illustrated on AO SPI Indexoptions. If the log returns of the underlying asset is independent of the stock price level, the formula for fairprice of an option is homogeneous of degree one in both spot and strike. The authors of [11] implementedthis relation in the structure of the neural network and built a model using option contract data of the S&P * Corresponding author.This research was supported in part by the SERB MATRICS (MTR/2017/000543), DST FIST (SR/FST/MSI-105). a r X i v : . [ q -f i n . S T ] A ug ANINDYA GOSWAMI*, SHARAN RAJANI, AND ATHARVA TANKSALE
500 Index. The authors of [5] discuss how a technique named profiling could be used to select the optimalneural network structure to predict the implied volatility. This technique was illustrated on USD/NEMexchange rate options and the model took various contract parameters as inputs. The authors of [22] arguedthat option contract data should be partitioned according to moneyness in order to improve the accuracy inpricing options and they illustrated this performance improvement using Nikkei 225 Index option contracts.In [13] the authors exhibited the effectiveness of cross validation, Bayesian regularization, early stoppingand bagging in preventing overfitting and improving generalization, in the process of pricing S&P 500 calloptions using an artificial neural network (ANN). The author of [1] attempted to predict the bid-ask spreadof options on the OMX Stockholm 30 Index, using multiple lagged asset prices and their sample standarddeviations. In [2], the authors used the dividend rate in addition to Black-Scholes-based features to price op-tions contracts on the FTSE 100 Index; the model performance was compared with the Black-Scholes-Mertonprice that incorporates dividends. The authors of [12] used S&P 500 option contract data and developed a“modular” ANN model for option price prediction. In particular, they divided the data set into 9 disjointparts or modules, according to the moneyness and the time to maturity parameters of the contracts. Asimilar modularity is adopted in [7] where the authors build a hybrid model using BANKNIFTY optioncontracts. Some of the previously mentioned papers have prescribed data-driven option hedging strategies,while some others have also demonstrated success in predicting the price of exotic options using their modeloutputs. The above survey is not meant to be exhaustive but conveys the broadly accepted methodologies fordeveloping supervised learning models to price options. 
This manuscript borrows aspects like homogeneityhint and modularity from the existing literature.In this manuscript, we propose three different approaches to generate feature sets from the market data,each of which yields 17 −
22 features. Each feature set is then used to train two modelsusing an ANN and theXGBoost algorithm respectively. None of the approaches include measures of volatility as features. However,we assume that the statistical distribution of the underlying assets’ returns is independent of the level of thestock price ( s ). This implies that the option price function is homogeneous of degree one in both, the spotprice ( S ) and the strike price ( K ). In view of this, we construct feature sets using the underlying asset’s logreturns, moneyness ( SK ), and time to maturity. Furthermore, the output variable has been constructed usingthe ratio ( CK × C ) to the strike price ( K ). The fair price of an option contract mustdepend on the anticipated statistical distribution of the future price of the underlying asset. We try to in-corporate this principle using a non-parametric approach, wherein we consider a fixed number of consecutiveOrder Statistics of log returns of the daily underlying close prices as features. We compare the performanceof this approach with another approach, wherein the feature set consists of only the first two moments of thelog returns of the underlying asset’s daily Open-High-Low-Close prices. Both the approaches appear to beequally effective. Finally we compare these two approaches with a third approach, that augments featuresfrom the second approach by including a few additional features derived from the historical option pricedata. This particular approach outperforms the previous two as the option price data contains significantadditional information relevant to the present day option price. To the best of our knowledge, option pricingmodels using these feature sets have not been reported in the literature so far.In the proposed data-driven approaches, disjoint consecutive intervals of the option contract price is set asthe output instead of a single predicted option price, as we believe that no real market is complete. 
In otherwords, a random payoff such as an option contract may have multiple fair prices, and a single predictedprice is more confusing than convincing. Hence we define the output variable in a manner that conveys therange of fair prices. We measure and compare the performance of the models described in the manuscriptusing two different error metrics. The first proposed error metric attempts to mimic the mean absolute error(MAE) while the second metric gives the inaccuracy in predicting the option price to lie within a certainneighborhood of the actual option price. We also compare the performance of the proposed models with thetheoretical Black-Scholes option pricing model. It is observed that the models constructed using the thirdapproach outperform the Black-Scholes pricing formula in terms of the above mentioned metrics whereasother proposed models perform equivalently, if not better. Again, we would like to emphasize that neitherhistorical nor implied volatility is used as an input in any of the proposed models. ATA-DRIVEN OPTION PRICING USING SINGLE AND MULTI-ASSET SUPERVISED LEARNING 3
We would also like to emphasize on the fact that none of the features were selected based on importanceanalysis, as the process of determining feature importance essentially depends on the particular choice of thetraining data used. Despite maintaining such indifference, the success in predicting option prices indicatesthat perhaps these data-driven models are capable of learning certain universal rules of option pricing. Wealso ensure that the inputs and outputs of the models are scale-free, which allows us to investigate if modelscould be trained on option contract data from two different assets/sources. This, in principle, would allow usto construct models that can capture the option pricing mechanism for a broader range of underlying assetdynamics. Our experiments show that the models trained using data from multiple assets/sources possesssuperior option pricing capabilities than the models trained on individual assets/sources. These experimentshave been performed using NIFTY50 and BANKNIFTY option price data. However, since we have not ex-perimented with a sufficiently broad class of assets, the complete scope and the limitations of this technique(referred to as combined training) is still unclear. Nevertheless, we propose a methodology to gain a deeperunderstanding of the combined training effect than what the error metrics offer. In this method, for a trainedmodel, we perform a family of tests using simulated Black-Scholes option price data with varying volatility.Results show that the simple idea of combined training produces models that predict the option price for awide range of underlying asset price dynamics fairly well. In other words we observe domain adaptabilityfor a wide variety of simulation data, clearly indicating the effectiveness of the combined training technique.Drawing from the modularity approach proposed by [22], [12] and [7], we choose to train our models on aparticular subset of the contract data. 
To elaborate, we perform our experiments on a “filtered” datasetcomprising of only near-ATM (at-the-money) contracts. The “filtered” dataset also excludes option contractsthat have either too short or too long time-to-maturity values. We believe that including a full range ofmodularity, as in [22], [12], and [7], would complicate the exposition of this paper with too many experiments,as we study six different models constructed using three approaches and two algorithms, on two differentassets/sources.This paper is organized in eight sections. The second section briefly presents the basics of supervised learning,and explains the two supervised learning algorithms used to construct the models. Section 3 contains detailsabout the data under consideration. The input and output of the learning models are explained in Section 4.In Section 5 we report the performance of the trained models. An analysis of the combined-trained models’performance is presented in Section 6. Performance of the models on 2019-2020 data is given in Section 7.Finally we comment on future research directions in the last section.2.
Supervised Learning Algorithms
Attempts to develop algorithms that are capable of performing a task without explicitly specifying the ex-pected outcome have led to the development of the field of Machine Learning. This manuscripts leverages aspecific subset of machine learning algorithms, known as supervised learning algorithms. These algorithmstake in labelled data as input and “learn” the task at hand. The term “learn” implies that the algorithmsconstruct abstract representations of the data with the aim of capturing patterns that are fundamental to thetask at hand. In the following subsections, we describe briefly two supervised learning algorithms, namelyExtreme Gradient Boosting (XGBoost) and Artificial Neural Network (ANN). These algorithms are usedin the later sections of this manuscript. Before studying the specifics of the algorithms, it is instructive tounderstand the general premise of supervised learning algorithms.Consider a finite labelled dataset represented as { ( X , Y ) , ( X , Y ) , ( X , Y ) , . . . , ( X J , Y J ) } , where the vector X j is associated with a label Y j . The algorithms attempt to find a mapping f : X j (cid:55)→ Y j such that themapping obtained is the “best” out of all the possible mappings. A qualitative assessment of the mapping(also referred to as a model) is made possible by an “objective” function (also known as a “loss function”).The specifics of the objective function and the strategy used to create the mappings vary with the choice ofthe algorithm.2.1. Extreme Gradient Boosting.
Developed by Tianqi Chen in 2016 (refer [6]), Extreme GradientBoosting combines two powerful techniques, namely “boosting” and “gradient descent”. It builds upon thegradient boosting decision tree algorithms developed by Friedman in 2001 (refer [9]) and 2002 (refer [10]).
ANINDYA GOSWAMI*, SHARAN RAJANI, AND ATHARVA TANKSALE
Gradient boosting involves constructing an ensemble of “weak” learners, which in the case of XGBoost, aredecision trees. These “weak” learners are combined in an iterative fashion to obtain a “strong” learner.A “weak” learner is a model whose accuracy of predictions is slightly better than a model making randompredictions. Refer to [8] for more details on how “weak” learners can be combined to create “strong” learners.A typical classification task involves categorizing an input to its label (or class). Successfully performingclassification requires the model to determine a close approximate of the true conditional probabilities of theclasses, given an input. The XGBoost algorithm, for a set of N output classes, assigns a score F i ( x ) to the i th class for the input x . We define F ( x ) as ( F ( x ) , F ( x ) , F ( x ) , . . . F N ( x )). The scores obtained are thenused to calculate the probability of each class to be the predicted class by using the softmax function, P ( x )defined as P i ( x ) := e F i ( x ) (cid:80) Nk =1 e F k ( x ) , i = 1 , , . . . , N. (1)The XGBoost algorithm then computes the “objective” (or loss) function value for each input x by deter-mining how far away from the true distribution is the distribution of the predicted values. This is done byusing Categorical Cross Entropy (CE), a loss function, which is defined as L ( z, F ( x )) := − N (cid:88) i =1 z i log( P i ( x )) (2)where z := ( z , z , . . . z N ) is a given p.m.f of the true outputs. The XGBoost algorithm seeks to minimizethe value of this loss function over all possible F ( x ) based on the training set of J input-output pairs, { ( x j , y j ) | j = 1 , , , . . . J } . These pairs are used to compute the value of z ( j ) , for each j , such that z ( j ) i = δ ( y j , i ), where δ is the Kronecker Delta function. XGBoost then uses the gradient boosting algorithmto obtain an approximate of the minimizer (cid:98) F ( · ) := argmin F ( . 
) 1 J (cid:80) Jj =1 L ( z ( j ) , F ( x j )) . In order to find theminimizer, the method of steepest descent is applied. The algorithm first computes F (0) ( · ) := argmin γ ∈ R N J (cid:88) j =1 L ( z ( j ) , γ ) . This iterative scheme is then carried out over m = 1 , , , . . . M iterations. At m th iteration the algorithmcomputes for each j , the residual r ( m ) j := − ∂L ( z ( j ) , γ ) ∂γ (cid:12)(cid:12)(cid:12)(cid:12) γ = F ( m − ( x j ) ∈ R N . The weak learner h ( m ) ( x ), is then fit to the training dataset { ( x j , r ( m ) j ) } Jj =1 . The algorithm then computesthe multiplier α ( m ) using the equation α ( m ) = argmin α ∈ R J (cid:88) j =1 L ( z ( j ) , F ( m − ( x j ) + αh ( m ) ( x j )) . This multiplier, α ( m ) , is then used to update the model/score as given by the scheme F ( m ) ( · ) = F ( m − ( · ) + α ( m ) h ( m ) ( · ) . The XGBoost algorithm thus results in a strong learner by combining M weak learners in order to obtaina close approximate to the true probability distribution. The reader is encouraged to consult the referencescited as this exposition is not meant to be comprehensive.2.2. Artificial Neural Network.
Developments in the field of machine learning led to the advent of al-gorithms that sought to mimic biological neural networks. These algorithms (referred to as ANN) attemptto harness the ability of biological networks to learn patterns within data. This manuscript presents a briefoverview of a special type of ANN known as Feed Forward neural network . We use Feed Forward neuralnetworks for the experiments proposed in the later sections to classify structured data inputs. The reader The manuscript uses the term ANN to refer to Feed Forward neural networks in the later sections
ATA-DRIVEN OPTION PRICING USING SINGLE AND MULTI-ASSET SUPERVISED LEARNING 5
Figure 1.
A representative Feed Forward Neural Netmay refer to [14] for a comprehensive study of ANNs. Figure 1 depicts a general Feed Forward neural network (this figure is representative. Refer Table 1 fordetails). A neural network is a set of “neurons” that interact with each other to “learn” the representationspace of the input data. Figure 2 shows the structure of a neuron.
Figure 2.
Components of a single neuronAs can be seen in Figure 2, the output η of a neuron can be given by- η = f (cid:18) n (cid:88) i =1 w i ψ i + b (cid:19) where ψ = ( ψ , ψ , . . . , ψ n ) are the inputs to the neuron, w i is the weight associated with each input ψ i and b is the overall bias associated with the neuron; the function f is called the activation function and is used toimpart non-linearity to the neural network. As evident from Figure 1, a feed forward neural network consistsof a number of “layers” of stacked neurons. Each neuron in a layer is connected to every neuron in the nextlayer. Thus the outputs of the neurons in the preceding layer act as the inputs to the neurons in the nextlayer. As stated earlier, each “connection” between any pair of neurons, has a weight w associated with it.The optimal number of layers in a neural network and the number of neurons in each layer is to be uniquelydetermined for a given problem, and is referred to as the architecture of the neural network. Along withthis, it is also necessary to determine the appropriate activation functions for each of the neurons as well asthe optimization scheme to be used. The architecture of the ANN used in the present study is given in Table 1. Composition of the ANN used
Number of Neurons Activation FunctionLayer 1
128 ReLU
Layer 2
64 ReLU
Layer 3
50 softmax
Table 1. “Architecture” of the Neural Net used Figure taken from http://neuralnetworksanddeeplearning.com/chap1.html
ANINDYA GOSWAMI*, SHARAN RAJANI, AND ATHARVA TANKSALE
The activation function used for each layer has been indicated in Table 1. The ReLU activation function isdefined as - ReLU :: f ( x ) = max(0 , x )The softmax function, as explained previously (refer Equation (1)), gives the class probabilities. We usethe loss function, categorical crossentropy (refer Equation (2)) to determine how far the true probabilitydistribution is from the distribution of the predicted values. In order to “learn” a given task, the sequenceof weights that serve as a minimizer to the loss function are to be found, as this corresponds to a higherprediction accuracy by the neural network. This is achieved by optimizing the weights using an optimizationscheme (commonly known as training the network). In the present study, we use the Adam optimiser, anadvancement of the stochastic gradient descent optimizer (refer [16]).3. Data
We aim to model the pricing mechanism of option contracts that are traded in a financial market. NSE,an Indian stock exchange, facilitates the trading of option derivatives on stocks and stock indices in highvolumes. Markets with high trading volumes generally imply a high level of trader participation, whichfurther implies a lower chance of the market being imperfect (i.e, the market is efficient). This also allowsus to consider the traded price of the derivative as the “fair” price. Persistent high trading volumes fora particular range of option contracts give us a better chance to “learn” the pricing mechanism of thoseoption contracts. Some of the NSE based stock indices that have a high option contract trade volumesare the NIFTY50 and BANKNIFTY. For our experimentation, we extract the daily contract price data ofcall options for both, NIFTY50 and BANKNIFTY. Data is extracted for the years 2015 − . It is then ensured that the data set obtained Figure 3.
Snapshot of the Unfiltered option Datasetis purged of contracts that are not traded. For reasons related to the construction of the models, we add anew column to the filtered dataset that records the close price of the same option on the previous day. If theoption contract did not exist on the previous day, we report the value 0 in this new column. We subsequentlyscreen the data to remove all rows that have a zero in the new column. We then add more columns to thedata array to include the “Open”, “High”, “Low” and “Close” prices of the underlying asset for the past20 days corresponding to each row. Further more, we add an additional column that represents the threemonths’ government bond yield (see Section 4.2).We then select the option contracts that are in the vicinity of at-the-money(ATM) contracts. To be moreprecise, we only select those contracts for which the quantity | − SK | is not more than the pre-decided value of0 .
04, where K and S are the strike and the spot prices respectively. We refer to such contracts as near-ATMoption contracts. It has been observed that numerous near-ATM option contracts are traded everyday withidentical or different time to maturities. However, significantly low trading volume is observed for contractswith very large or very small time to maturities. Hence we choose to study, only those contracts whose The data can be accessed using the link -
ATA-DRIVEN OPTION PRICING USING SINGLE AND MULTI-ASSET SUPERVISED LEARNING 7 time-to-maturity values are not more than 45 days and not less than 3 days.
Dataset
NIFTY50 BANKNIFTY
Raw
Filtered
Train
Test
Table 2.
Train/Test Split: Dataset Sizes
Train/Test Split
Figure 3 is an indicative sample of the NIFTY50 option contract dataset that we obtainfrom the NSE. In order to build a predictive model using the algorithms described in Section 2, the datasetneeds to be split into separate datasets that would be used to train and evaluate the trained models. Mostsupervised learning algorithms when trained with time series data, necessitate splitting the dataset linearlyas the individual observations are not independent. In the same vein, we split the dataset in two partsaccording to the timestamp. The first 33 months, i.e., data from Jan 2015 to Sept 2017 forms the trainingdataset and the succeeding data i.e. from Oct 2017 to Dec 2018 forms the test dataset for evaluating theproposed models. Table 2 shows the number of datapoints we deal with at every step of the model buildingand evaluation process. 4.
Model I/O
As mentioned previously, this study aims to develop supervised learning models that can “learn” the marketperceived pricing of option contracts, and give us the fair price of an option contract in accordance withpast market behaviour. In order to develop supervised machine learning models (refer Section 2), we needto train the models with a set of ‘inputs” and “outputs”. Sections 4.2, 4.3 and 4.4 describe the differentfeature sets, each of which we intend to use as inputs to the supervised learning algorithms. These featuresets are derived from the information available to market participants. Before describing each of the featuresets, we explain the desired format of the output variable which is kept uniform across all the approaches.4.1.
Categorical Output Variable.
As for the output of the proposed data-driven option pricing models,using the option contract prices obtained directly from the market would not be prudent. This is because,for contracts with a fixed value of moneyness, the magnitude of contract parameters like “Strike” and “Spot”prices may vary over the years. It makes much more sense to create an output variable that is scale free. Wetherefore define the “output” as the ratio—expressed in percentage—of the Close price ( C ) and the Strikeprice ( K ) of the contract, i.e. we designate 100 × CK as the output variable. This ratio serves as a scale freeproxy to contract price for the model.Since the “output” variable is continuous, it is natural to formulate the problem using a regression model.However, since no real market is complete, a single predicted price of an option contract is more confusingthan convincing. Indeed the fair price could be anything in a certain interval. Determining this interval isa hard problem from both, the theoretical and the empirical aspects. Instead of finding such an intervalof the fair price, selecting the most likely interval from a pre-determined set of non-overlapping consecutiveintervals is fairly straight forward. One can divide the range of outputs into non-overlapping “bins” andselect the “embracing” bin as the output variable. However, a major hurdle in this approach is determiningthe width of the bin. Larger the width of each bin, lesser the usefulness of the model due to lack of precision.On the other hand a finer binning confuses the model due to the presence of a certain degree of in-docileuncertainties in the option trading price, which can be attributed to the lack of completeness in the market.The most straightforward way to tackle this quandary is to formulate an optimization of an appropriate lossfunction. 
Instead of adopting such an objective approach which essentially depends on the type of data andthe model used, we first introduce a binning insensitive performance measure for the models. We refer tothis measure as the EM. Subsection 5.1 (refer Equation (5)) gives a description of the proposed metric. Wethen study the values of EM obtained for different bin widths, for a fixed dataset and a fixed model type. ANINDYA GOSWAMI*, SHARAN RAJANI, AND ATHARVA TANKSALE . . . . . . . . . . . . . . . Bin width E M EM vs. Bin Width
Figure 4.
Binning insensitivity of the performance measureDepending on the persistent stability of EM and the gain of precision, we decide the bin width to be used.Figure 4 shows the results of procedure used to determine the interval width. We observe that for bin widthintervals larger than 0.1, the supposedly bin-insensitive measure drastically decreases. This is expected aslarger bin intervals imply lesser number of classes, which makes classification easier for the models due toincreased imprecision. For bin intervals lesser than 0.075, a certain monotonicity appears. But for bininterval width roughly between 0.1 and 0.075, the EM value behaves insensitive to binning. This manuscriptuses the value of bin width as 0 . n − w, nw ] is set as the n th bin where n is a natural number and w (here w = 0 .
1) is the bininterval width. This creates a set of equispaced bins allowing us to map option contracts to their respectivebins by computing the value of 100 × CK for the particular contract and assigning the corresponding integervalued bin number to it as its label. These labels are then considered as the ordinal output variables andare used to train and test the constructed models. We illustrate this binning in Figure 5. The figure is ahistogram (plotted using 0 . × CK values, for the filtered NIFTY50 contract dataset.It is evident from the plot that there are just enough data points per bin and yet we have enough numberof categories, ie. bins, to make the model robust.The above procedure of binning is rather subjective and is not meant to be precise for a vital reason. Thereason being, any precise data-driven optimization depends upon the choice of the data and the model. Onthe other hand, we wish to fix binning, regardless of the choice of the model or the dataset. The reason beingthat, binning defines the output variable in the training and test datasets for each model and we wish tokeep all the models comparable. Without identical binning, it would not be possible to combine or comparemodels trained on different datasets.4.1.1. Remark.
The following subsections (4.2, 4.3 and 4.4) describe 3 separate, independent “approaches”used to generate feature sets that serve as inputs to the supervised learning algorithms described in Section2. Here the term “approach” is used to convey the motivation/idea behind generating the feature sets. The
ATA-DRIVEN OPTION PRICING USING SINGLE AND MULTI-ASSET SUPERVISED LEARNING 9 (Close/Strike * 100) values N u m b e r o f O b s e r v a t i o n s Figure 5. × CK values for NIFTY50 contracts plotted as a histogramfollowing subsections represent a crucial part of the present study. The components that make up each ofthe feature sets are summarised at the end of this section in Table 3.4.2. Approach I.
From the very definition of an option contract, it is known that the fair price of a call option must depend on the values of the option contract parameters (such as the strike price (K) and the time to maturity (τ)), the risk-free interest rate (r), the spot price (S), and the anticipated statistical behavior of the future dynamics of the underlying asset. The closest real-world approximation of the value of r is the government bond yield. Amongst the parameters that are available to the practitioner, it is natural to hypothesize that the most important determinants of the option contract's value are the present value of the underlying security and the price dynamics followed by it over the past few days. Directly using the past asset price data as features would make the values scale dependent, especially when data over many years is to be considered for model training. As a means to resolve the scale dependency, log returns of the time series (henceforth referred to as LR) are considered, the values of which are given by

LR(S)_i = log(S_i) − log(S_{i−1}) = log(S_i / S_{i−1})     (3)

where S_i is the i-th term of a time series S. In order to obtain a non-parametric inference of the recent distribution of log returns, we calculate the order statistics of the log returns. This is done by computing the log returns of the daily close prices of the underlying asset for a window of the past 20 trading days, as it corresponds to approximately a calendar month excluding all holidays. Following this, the order statistics are computed by simply arranging the log returns in ascending order for each sample. To be more precise, if x_{(i)} denotes the i-th order statistic of a sample of different real values (x_1, x_2, ..., x_n), then x_{(i)} = x_j for some j = 1, ..., n, and x_{(1)} < x_{(2)} < ··· < x_{(n)} hold.

In view of the preceding discussions, we calculate the order statistics of historical log returns for each of the near-ATM option contracts, resulting in a row of 22 features as given below:
(1) The 19 log return order statistics.
(2) The time to maturity (τ) of the option contract.
(3) The interest rate r: we use the 3-month sovereign bond yield rates as an approximation for the risk-free interest rates.
(4) Moneyness: this quantity is computed as S/K (the ratio of spot to strike prices).
A collection of such rows is what constitutes the train/test dataset.

4.3. Approach II.
This subsection proposes a feature set that takes into account the market participant's access to other facets of the asset price data. Intuitively, a lot more information on asset dynamics can be gleaned by taking into account the values of "Open", "High", and "Low" along with the values of "Close" (refer to Figure 6). However, this intuitive anticipation deserves a quantitative backing.

Figure 6. Cross section of the underlying asset price dataset

Let us first understand the need for a completely new "approach". The previous subsection attempted to generate a feature set that captures the empirical distribution of the "Close" price data of the underlying asset. The present subsection seeks to remedy the fact that the asset price data obtained from the market consists of multiple facets that have not been accounted for in Approach I. The joint distribution of these four time series ("Open", "High", "Low" and "Close") cannot be inferred from the order statistics of each individual time series, as they are not independent. This renders a direct mimicking of Approach I ineffective. Moreover, a direct extension of Approach I would lead to an unwieldy feature set of 19 × 4 order-statistic features. We instead compute the means of the log returns of Open (O), High (H), Low (L), and Close (C), denoted by µ_O, µ_H, µ_L, and µ_C respectively, and construct the covariance matrix Σ of the four log return series as

Σ = [ var(O)    cov(O,H)  cov(O,L)  cov(O,C)
      cov(H,O)  var(H)    cov(H,L)  cov(H,C)
      cov(L,O)  cov(L,H)  var(L)    cov(L,C)
      cov(C,O)  cov(C,H)  cov(C,L)  var(C)  ].

As Σ is symmetric, six entries in the upper triangular part are repeated in the lower part. We include the square roots of the entries of Σ in the feature set after discarding the repetitions. Thus we build the second feature set using the following 17 features:
(1) Means of the log return series: µ_O, µ_H, µ_L, and µ_C.
(2) Ten statistics from Σ, namely { Σ_ij / √|Σ_ij| : 1 ≤ j ≤ i ≤ 4 }, where Σ_ij is the (i, j)-th element of Σ, using the convention x/√|x| = 0 iff x = 0.
(3) Features (2)–(4) from Approach I.
4.4. Approach III.
Approaches I and II primarily utilize the underlying asset price data to derive the set of features. However, a market participant also has access to the historical option contract trade prices. It would be imprudent not to develop an approach that factors in this key aspect. In fact, including the historical option contract trade prices in an appropriate form would help the supervised learning algorithms to develop abstract representations of market factors like implied volatility, allowing them to predict the option contract price more accurately. We would like to stress that the intent of Approach III is to build upon the progress made in Approaches I and II. We cannot use an extension of Approach I for reasons mentioned previously. We instead seek to augment the feature set developed in Approach II by adding the features listed below:
(1) Previous Option Price (scaled): this is computed as C_{t−1}/K, where C_{t−1} is the previously reported close price of the option contract under study and K is the strike price of the contract. Including this feature helps account for any auto-regressive characteristics that might be present in the option price data.
(2) Mean Moneyness: computed as S̄/K, where S̄ is the mean of the underlying asset prices (for a window of the past 20 trading days) and K is the strike price of the contract.
Table 3 summarizes the features used by the three approaches described in this section. Figure 7 presents an overview of the steps that constitute the process of model building.

Composition of Feature Sets: An overview

                           Approach 1                 Approach 2                 Approach 3
Non-Parametric Features    Order statistics of the    —                          —
                           LR of the underlying
                           asset
Parametric Features        —                          Mean LR of OHLC            Mean LR of OHLC
                                                      Cov LR of OHLC             Cov LR of OHLC
Contract Features          Moneyness                  Moneyness                  Moneyness
                           Time to Maturity           Time to Maturity           Time to Maturity
                                                                                 Prev. Option Price (scaled)
                                                                                 Mean Moneyness
Other                      Interest Rate              Interest Rate              Interest Rate
Total                      19 + 3 = 22 features       14 + 3 = 17 features       14 + 5 = 19 features

Table 3. An overview of feature sets for all the Approaches

5. Model Performance
5.1. Performance Measures.
Figure 7. Process Flowchart

Once a model is trained, it is imperative to test the performance of the model on data that has not been used for training (i.e., the test dataset) and to study the quality of the predictions. The most common way to evaluate the predictions of nominal variables is to find the value of the accuracy metric A, defined as

A = C / T     (4)

where C is the number of correct predictions and T is the total number of predictions. It is, however, not ideal to use the accuracy metric for an ordinal output variable having a wide range. In such cases, one can examine the quality of the incorrect predictions by measuring the distance between the actual and the predicted classes. Doing so is meaningful because it is desirable for a good model to predict a class identical to or very close to the actual class. In contrast, the accuracy metric treats all incorrect predictions in the same manner, regardless of whether the predicted class is close to or far from the actual class. It is therefore important to come up with a metric that does a better job of informing us about the quality of the predictions made. We do so by proposing an Error Metric (EM) given by

EM = (w / T) Σ_{i=1}^{T} |C_i − P_i|     (5)

where w denotes the bin width, T is the number of contracts in the test dataset, and the ordinal variables C_i and P_i denote the actual and the model-predicted bin numbers respectively. As mentioned in Subsection 4.1, we set the value of w as 0.1. Multiplying the bin-number difference by the bin width makes EM asymptotically insensitive to binning. We illustrate the implication of EM in Figure 8, which gives an example of the case where the distance between the actual and the predicted classes is 2.

Figure 8. Visualizing the EM for a single prediction

It can easily be proved that the EM converges to the Mean Absolute Error (MAE) as the bin width tends to 0. However, the MAE metric is known to be sensitive to outliers. Hence, in order to get a better insight into the performance of the models, we also consider an additional metric, the "inaccuracy metric", that is robust to outliers. The "inaccuracy metric" (ρ) gives the probability of the predicted and actual bins lying more than 2 bins apart. In other words, the metric ρ gives the probability that the model will fail to include the actual price bin (labelled as C_i) in a band of five consecutive bins with the predicted bin (labelled as P_i) in the middle. Henceforth we refer to the above-mentioned band as the predicted band (see Figure 9). The ρ metric is defined as

ρ := #{ i ∈ {1, 2, ..., T} : |C_i − P_i| > 2 } / T.     (6)

While EM is a measure of prediction imprecision, the empirical quantiles of the error C_i − P_i give a confidence interval for C_i using the prediction P_i. In particular, 1 − ρ denotes the confidence of C_i being in [P_i − 2, P_i + 2].

Figure 9. Visualizing the predicted bin, the predicted band and the relationship between them

5.2. Models Trained with Single Sources.
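Before turning to the results, the two evaluation metrics defined above can be sketched as follows; the toy bin values are illustrative assumptions.

```python
import numpy as np

def error_metric(actual_bins, predicted_bins, w=0.1):
    """EM = (w / T) * sum_i |C_i - P_i|  (Equation (5))."""
    a, p = np.asarray(actual_bins), np.asarray(predicted_bins)
    return w * np.abs(a - p).mean()

def inaccuracy_metric(actual_bins, predicted_bins):
    """rho = fraction of contracts with |C_i - P_i| > 2  (Equation (6))."""
    a, p = np.asarray(actual_bins), np.asarray(predicted_bins)
    return np.mean(np.abs(a - p) > 2)

# Toy example with T = 4 contracts:
C = [10, 12, 15, 20]   # actual bins
P = [10, 14, 18, 20]   # predicted bins
# EM  = 0.1 * (0 + 2 + 3 + 0) / 4 = 0.125
# rho = 1/4 (only |15 - 18| exceeds 2)
```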
The following paragraphs present the details of the performance of each model (in terms of EM and ρ) built using the aforementioned approaches for the NIFTY50 and BANKNIFTY datasets.

NIFTY Index option data. Table 4 lists the EM and ρ values for all models that are trained and tested using NIFTY50 data. The results reported in Table 4 convey that all trained models perform at par with or better than the pricing formula of the Black-Scholes model (we use the historical volatility values observed over a window of the past 20 trading days to compute the Black-Scholes price).

NIFTY50 Contracts        EM                ρ
B-S Pricing              0.19              0.29
Trained Models        ANN    XGB        ANN    XGB
Approach I            0.18   0.18       0.25   0.27
Approach II           0.17   0.19       0.26   0.29
Approach III          0.14   0.16       0.20   0.22

Table 4. Model evaluation metrics for models trained and tested on NIFTY50 option contract price data

We also note that, in comparison to XGBoost, the use of ANN results in lower values of EM and ρ. Table 4 also shows that the values of the metrics do not differ significantly between Approach I and Approach II. This indicates that the two supervised learning algorithms were unable to extract more information on the asset dynamics from the first two moments of the Open-High-Low-Close (OHLC) data than solely from the Close price data. From the results, it is also clear that the performance of Approach III is far superior to that of Approaches I and II, which indicates that the historical option price data contains valuable information relevant to the current option price.

It is evident from Table 4 that for all cases the EM value is less than 0.19. Loosely speaking, this implies that on average the predicted value of 100 × C/K is not further than 0.19 from the actual (refer to Equation (5)). In other words, the difference between the actual and predicted option prices is, on average, less than 0.0019 × K. A more precise statement in terms of confidence intervals can be made using the empirical quantiles (refer to Figure 10). The 2% and 98% quantiles of C_i − P_i obtained using the Approach I ANN model
for NIFTY50 data can be read from the empirical CDF plotted in Figure 10.

Figure 10. Empirical CDF of C_i − P_i obtained using the Approach I based ANN model on NIFTY50 option contracts

Indeed, the ρ metric is also useful in this regard. To be more precise, the actual and predicted bins are at most 2 apart with probability 1 − ρ (refer to Equation (6)). Thus an interval of length 0.005 × K (the predicted band) can be obtained from a model prediction, which succeeds in containing the close price of the option (having strike price K) with probability 1 − ρ (refer to Figure 9). We recall from Figure 5 that this predicted band width is less than one tenth of the full range of option prices for the NIFTY50 data under consideration.

We consider the Approach III ANN models to further illustrate the implication of the predicted bands. For this, we first identify the upper and lower limit option prices of the band and compute the corresponding daily implied volatility values for each contract. From these values, we obtain the daily averaged predicted implied volatility band. We then compute the average market-realized implied volatility for each day using near-ATM options data and compare it with the predicted implied volatility band. A time series plot of that comparison is presented in Figure 11. The figure shows that for 90% of the time, the market-realized implied volatility lies within the predicted band. It is not surprising that this band prediction error is only 0.10, a value that is much smaller than the ρ value for Approach III in Table 4. The main reason behind the observed error reduction is the presence of averaging in the computation. This indicates the possibility of building a superior hybrid model by exploiting such an averaging effect. However, we do not attempt to build such models in the present study.

Figure 11. Average empirical IV and the predicted IV band, plotted for the NIFTY50 test dataset

BANKNIFTY Index option data. Table 5 lists the performance of the models that were trained and tested on BANKNIFTY Index data. The data processing, feature-set generation and train-test splitting for the BANKNIFTY options dataset are done in exactly the same way as for the NIFTY50 Index option data, in accordance with the methodologies laid down in Sections 3 and 4. It can clearly be seen that the values of EM and ρ are the lowest for the Approach III models. The evaluation metrics for Approach III are also lower than those for the Black-Scholes formula. Using the trained models and the results shown in Table 5, an analysis similar to the one performed for the NIFTY50-trained models can be carried out, but we avoid repetitive explanation.
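For reference, the Black-Scholes benchmark used in the tables (a European call priced with 20-day historical volatility) can be sketched as follows. This is an illustrative implementation; the annualisation factor of 252 trading days is our assumption.

```python
from math import log, sqrt, exp, erf
import numpy as np

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, tau, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2)

def hist_vol(close, window=20, periods_per_year=252):
    """Annualised historical volatility from the last `window` close prices."""
    lr = np.diff(np.log(np.asarray(close[-window:], dtype=float)))
    return lr.std(ddof=1) * sqrt(periods_per_year)
```

The benchmark price of each contract is then `bs_call(S, K, tau, r, hist_vol(closes))`, binned on the 100 × C/K scale exactly as the model outputs are.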
BANKNIFTY Contracts      EM                ρ
B-S Pricing              0.19              0.29
Trained Models        ANN    XGB        ANN    XGB
Approach I            0.19   0.21       0.28   0.32
Approach II           0.20   0.21       0.33   0.33
Approach III          0.17   0.19       0.24   0.29

Table 5. Model evaluation metrics for models trained and tested on BANKNIFTY option contract price data

From the results shown in Tables 4 and 5, it is evident that the Approach III ANN models perform significantly better than all other proposed models. Furthermore, they are far more accurate than what the Black-Scholes formula can prescribe. Having said so, it is also important to recall that no measure of volatility has been fed into any of the proposed models. We also present a set of experiments that shows the promise of ensemble modeling.

5.3. Ensemble Models.
The predictions of the two pricing models obtained using ANN and XGBoost for each approach can be averaged to obtain a new prediction. We refer to this as the prediction of a simple ensemble model. The rationale behind this approach is straightforward. It is plausible that, for a particular approach, the XGBoost model learns a subset of the representation space very well, but does not learn some other subsets well enough. The ANN model could hypothetically learn those missed subsets of the representation space better than what the XGBoost model is capable of learning. By averaging the predictions of the models, we seek to minimize the number of subsets over which the individual models perform poorly. Averaging the model predictions allows us to leverage the well-learnt portions of the representation space of both models at the same time.

We evaluate the performance of the ensemble models by computing the EM and ρ values for the test sets. Tables 6 and 7 present the model evaluation metric values for the ensemble models trained and tested on NIFTY50 and BANKNIFTY contracts respectively. The results in Tables 6 and 7 show a marked improvement in the EM values for all the approaches when compared to the results in Tables 4 and 5 respectively.

Averaged Models :: NIFTY50      EM      ρ
Approach I
Approach II
Approach III

Table 6. Model evaluation metrics for ensemble averaged models trained and tested on NIFTY50 option contracts

Averaged Models :: BANKNIFTY    EM      ρ
Approach I
Approach II
Approach III

Table 7. Model evaluation metrics for ensemble averaged models trained and tested on BANKNIFTY option contracts
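The simple ensemble described above amounts to averaging the bin predictions of the two models; a minimal sketch:

```python
import numpy as np

def ensemble_prediction(ann_bins, xgb_bins):
    """Average the bin predictions of the ANN and XGBoost models.
    The result may be a half-integer rather than an integer class label."""
    a = np.asarray(ann_bins, dtype=float)
    x = np.asarray(xgb_bins, dtype=float)
    return (a + x) / 2.0
```

EM and ρ are then computed on these averaged predictions with no change to their definitions.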
Remarks: It is important to note that the "predictions" (P) of the ensemble model need not be integer class labels but could instead be integer multiples of 1/2. However, no change is needed in the computation scheme of the model evaluation metrics.

5.4. Models Trained with Multiple Sources.
Since the features and the output variable used are scale free, models trained on one asset should be able to give reasonable option price predictions for another asset, provided their log return distributions are not too different from each other. This anticipation hinges on our assumption that, for a given financial market, two assets having the same return distribution should have the same option pricing mechanism. Conversely, there is a possibility that the prediction quality may be inferior even though the training and test datasets belong to the same asset, as the return dynamics of the underlying asset may have changed drastically. This subsection presents some experiments in this direction. We first carry out an empirical investigation of the asset portability of the models. In order to do this, we consider all six models trained on NIFTY50 option contracts and test them with data from BANKNIFTY based contracts on non-overlapping time intervals. The results of this experiment are given in Table 8. It is crucial to note that these two indices are sufficiently independent and have contract parameters with vastly different magnitudes. We present the Q-Q plot (Figure 12) of the "Close" price log returns of the two underlying assets in order to compare their log return distributions. Figure 12 shows a moderate mismatch between the return distributions of these two assets. Thus, although we do not expect the predictive performance to be equivalent to that on NIFTY50 test sets, we expect the error metric to be decently small in magnitude. Our experiment supports this anticipation. However, a quick comparison of our results (Table 8) with Table 5 shows that the NIFTY50-trained models outperform the BANKNIFTY-trained models for the BANKNIFTY test set.
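The Q-Q comparison underlying Figure 12 can be sketched by computing matching empirical quantiles of the two log-return samples and plotting one set against the other; the samples below are synthetic placeholders.

```python
import numpy as np

def qq_points(sample_a, sample_b, n_quantiles=99):
    """Matching empirical quantiles of two return samples; plotting
    the second array against the first gives the Q-Q plot, and points
    near the line y = x indicate similar distributions."""
    q = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(sample_a, q), np.quantile(sample_b, q)

# Synthetic example: two log-return samples of different lengths
rng = np.random.default_rng(1)
nifty_lr = rng.normal(0.0, 0.010, 500)
bank_lr = rng.normal(0.0, 0.013, 750)
aq, bq = qq_points(nifty_lr, bank_lr)
```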
This gives evidence of the fact that a model trained on a different asset/source can outperform a model trained on the target asset/source. The results of the above experiment encourage us to train the models using contract data from two or more assets/sources. In principle, this should broaden the range of features and allow the models
to achieve far better generalization and predictive capability. We investigate this by training all six models (one XGBoost and one ANN for each of the three approaches) using the combined data of NIFTY50 and BANKNIFTY contracts, and then performing out-of-sample tests for each asset. The EM and ρ values of the respective experiments are given in Table 9.

Figure 12. Q-Q plot for the log return distributions of the "Close" prices of the BANKNIFTY and NIFTY50 indices

NIFTY50 models tested on BANKNIFTY      EM                ρ
Trained Models        ANN    XGB        ANN    XGB
Approach I            0.19   0.19       0.28   0.28
Approach II           0.17   0.18       0.24   0.26
Approach III          0.15   0.17       0.21   0.24

Table 8. Model evaluation metrics for models trained on NIFTY50 contract data and tested on BANKNIFTY contracts

Experiments using models trained on combined datasets        EM                  ρ
Test Dataset ::
                   NIFTY50          BANKNIFTY        NIFTY50          BANKNIFTY
B-S Pricing        0.19             0.19             0.29             0.29
Experiment Type    ANN    XGB       ANN    XGB       ANN    XGB       ANN    XGB
Approach I         0.17   0.17      0.18   0.19      0.24   0.24      0.25   0.25
Approach II        0.17   0.18      0.19   0.19      0.25   0.28      0.28   0.28
Approach III       0.14   0.16      0.16   0.17      0.17   0.22      0.23   0.23

Table 9. Model evaluation metrics for models trained on both NIFTY50 and BANKNIFTY contract data
A comparison of the metrics given in Table 9 with those in Tables 4 and 5 clearly shows that the combined-trained models have better option pricing capabilities than the models trained on the respective assets individually. Each of the combined-trained models also outperforms the price prescription of the Black-Scholes formula. The performance of the option price prediction can be better perceived using a scatter plot of the actual and predicted option prices, which we present in Figure 13. Since the proposed models predict a bin, in order to plot the graph we take the midpoint of the predicted bin to get a single predicted price. The prices obtained using the midpoints of the bins are plotted along the horizontal axis, and the actual prices along the vertical axis. The scatter plot shown in Figure 13 is constructed using the predictions given by the Approach III ANN model (trained using the combined data). To the plot, we add the line y = x (dashed red) and the orthogonal regression line (dashed green). The proximity of these two lines validates the absence of bias in the model. In principle, such scatter plots can be constructed for all the proposed models.

Figure 13. Actual vs 'Predicted' Price (obtained using the Approach III ANN model trained on both NIFTY50 and BANKNIFTY contract data)

The success of the above experiment warrants an in-depth explanation. In the next section, we use the concept of domain adaptation to design a methodology that provides a deeper understanding of the combined-training effect.

6. Introspection of Combined-trained Models
This section brings to the fore an interesting application of the models constructed using Approach I in Sections 5.2 and 5.4 respectively. We test the pre-trained models (obtained using Approach I) with simulated Black-Scholes option price data. A family of such tests is conducted by varying the volatility parameter in the Geometric Brownian motion that is used to generate the simulated asset price time series; these time series datasets are then augmented with the option prices prescribed by the Black-Scholes formula. We recall from Section 4.2 that the Approach I based models use the order statistics of the log returns of the underlying asset's daily close prices as their primary inputs. Thus Approach I can be directly used to generate the simulated test datasets by considering the simulated time series data as "Close" prices. But Approach II or Approach III cannot be used directly, as they use the "Open", "High" and "Low" time series along with the "Close" time series to generate the features, and simulating the corresponding "Open", "High" and "Low" time series is not straightforward. Hence we only use Approach I based models for the experiments described in this section.

We simulate Geometric Brownian motion with the drift parameter held fixed and the volatility parameter σ varied over a grid of values, generating one test dataset per value of σ.

Figure 14. EM vs σ curve for ANN single- and combined-trained models in Approach I (NF50, BNF, COMBINED)

Figure 15. EM vs σ curve for XGBoost single- and combined-trained models in Approach I (NF50, BNF, COMBINED)

We recall from Section 5.4 that the success of a model on a test dataset depends on the proximity between the return distributions of the test set and the model's training set. Hence, it is natural to expect that the quality of the trained model's predictions will vary when the model is tested with the range of simulated datasets with varying volatilities. We anticipate that the plot of the error metric (obtained for each of the test sets) against the value of the test set's volatility parameter would result in a V-shaped curve, as the error metric would be large for a test set having a higher mismatch with the training set in terms of the return distributions. This is clearly observed in Figures 14 and 15.

The volatility value minimizing the EM provides a class of theoretical asset dynamics whose option prices are best predicted by the trained model. We call it the "Error Minimizing Volatility", or EMV, of a given option price dataset corresponding to the learning model. For example, from Figures 14 and 15 it can be observed that the EMV of NIFTY50 data for the ANN as well as the XGBoost model is 0.13. On the other hand, the EMV of BANKNIFTY data for the ANN and XGBoost models are 0.19 and 0.20 respectively. Examining Table 10 shows that the values of EMV obtained are not mere coincidences, but are in fact related to the training dataset used. Indeed, the EMV is significantly close to the volatility parameter values of the training dataset.
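The simulated test-set generation described at the start of this section can be sketched as below; the drift value, the σ grid and the path length are placeholders of our choosing, since the exact values used in the study are not reproduced here.

```python
import numpy as np

def simulate_gbm_close(s0, mu, sigma, n_days, dt=1 / 252, seed=0):
    """Simulate a daily 'Close' price path under Geometric Brownian motion,
    using the exact log-normal solution of the GBM SDE."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_days)
    log_incr = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_incr))

# One simulated "Close" path per volatility level on an assumed grid;
# each path is then fed through the Approach I feature pipeline.
paths = {
    round(sig, 2): simulate_gbm_close(100.0, mu=0.1, sigma=sig, n_days=250, seed=42)
    for sig in np.arange(0.05, 0.55, 0.05)
}
```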
                 Implied Volatility        Historical Volatility
                 Mean     Median    Mode
NIFTY50
BANKNIFTY

Table 10. IV and Historical Volatility values for the NIFTY50 and BANKNIFTY indices

From Figures 14 and 15 it is evident that the EM plots obtained for the combined-trained models give a lower and flatter V-shaped curve. This implies that models trained on the combined dataset result in lower EM values for a wide range of test sets having varying σ values. This hints at the possibility of domain adaptability of predictive models trained on datasets derived from multiple assets/sources. It also hints at the existence of a common representation space for datasets with similar log return distributions. Such an application of domain adaptability can be a very powerful method, as it could potentially aid research in areas where data is scarce.

7. Model performance on 2019-2020 data
During the period from January 2020 to April 2020 of the COVID-19 pandemic, the dynamics of the NIFTY50 Index were radically different from its usual dynamics. A Q-Q plot comparison of the log return distributions of the NIFTY50 Index during the periods Oct'19-Dec'19 and Jan'20-Mar'20 is shown in Figure 16.

Figure 16. Q-Q plot of the (Oct 2019 - Dec 2019) and (Jan 2020 - Mar 2020) datasets

It is evident from the Q-Q plot that there seems to be almost no match between the price dynamics of these two time intervals. Therefore, for option contracts based on the NIFTY50 index, we cannot expect the models trained on 2015-2017 contract data to perform equally well on the recent data.

Testing Recent Data (Approach III)      EM                ρ
B-S Pricing
Train Dataset         ANN    XGB        ANN    XGB
NIFTY50
Combined Dataset

Table 11. Model evaluation metrics for models trained on 2015-2017 NIFTY50 contract data but tested on 2019 NIFTY50 contract data

Table 11 makes it evident that the error in predicting option prices for the 2019-2020 contracts is higher; this coincides with a large gap between the historical and implied volatilities. This is typically observed when drastic changes occur in a financial market. We also observe a significant improvement in the case of the combined-trained models as compared to the individually-trained NIFTY50 models. This reaffirms the power of combined training.
Figure 17.
Average IV and the IV band for 2019 NIFTY50 index data

In addition to the above experiments, we plot the empirical IV and the predicted IV band in Figure 17, in a manner similar to the plot reported in Figure 11. The band prediction error for the 2019-2020 data is again lower than the ρ observed in Table 11. Figure 17 helps us identify regions in the test dataset where the model does not perform well. It is observed that when the implied volatility of the underlying asset changes sharply, the prediction bands deviate from the actual values. These abrupt changes are usually caused by rapid changes in the market sentiment (in this case due to the COVID-19 pandemic), an aspect that is not represented in the data used to train the models.

8. Conclusion
In this paper, we present three data-driven approaches to build option pricing models using supervised learning algorithms. These approaches are illustrated for two different assets/sources (NIFTY50 and BANKNIFTY), and we use two different learning algorithms to build a range of models. Upon evaluating the performance of the models on out-of-sample data, it was seen that the Approach I and II based models performed better than the Black-Scholes option pricing formula in most cases, while the Approach III based models performed significantly better than all comparative models. Since Approach III uses features derived from the historical option price data that are not present in the Approach I and II based feature sets, the performance improvement clearly indicates the importance of including such information. The results also highlight the superior performance of the ANN-based models in comparison to the XGBoost-based models. In this paper, we have also built averaging ensemble models for each data source, the results of which clearly show an unprecedented level of accuracy in pricing option contracts. Lastly, we have investigated the effect of multi-asset combined training for each of the proposed approaches. It was observed that the multi-asset trained models gave a significant improvement in prediction quality when compared to the single-asset trained models. We have further examined this performance enhancement using the concept of domain adaptation.

The success of the multi-asset trained models makes us optimistic about the viability of building a non-asset-specific data-driven option pricing model. Such a model, once trained on data from multiple assets belonging to a particular financial market, would be capable of predicting the fair price of any European-style call option on any asset belonging to the same financial market with a high degree of precision. However, in our paper, we have examined the combined-training effect using only two assets/sources. Extensive experimentation is required to determine the limitations and the scope of such non-asset-specific models. Readers may refer to [21], which reports a similarly extensive experiment studying other universal non-asset-specific relations captured by a deep learning model. Further research to develop and validate the existence of such models has been planned by the authors. The codes used in this study can be made available on request.
Acknowledgement
We are grateful to Arkaprava Sinha and Prof. Amit Mitra (IIT Kanpur) for some useful discussions.
References

[1] Amilon, H. A Neural Network versus Black-Scholes: a comparison of Pricing and Hedging Performances. Journal of Forecasting (2003); Vol. 22, no. 4, pp. 317-335.
[2] Bennell, J.; Sutcliffe, C. Black-Scholes versus artificial neural networks in pricing FTSE 100 options. Intelligent Systems in Accounting, Finance & Management (2004); Vol. 12, no. 4, pp. 243-260.
[3] Black, F.; Scholes, M. The Pricing of Options and Corporate Liabilities. Journal of Political Economy (1973); Vol. 81, pp. 637-654.
[4] Breiman, L.; Friedman, J. H.; Stone, C. J.; Olshen, R. A. Classification and Regression Trees. CRC Press (1984).
[5] Carelli, A.; Silani, S.; Stella, F. Profiling Neural Networks for Option Pricing. International Journal of Theoretical and Applied Finance (2000); Vol. 03, no. 02, pp. 183-204.
[6] Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016); pp. 785-794.
[7] Das, S.; Padhy, S. A new hybrid parametric and machine learning model with homogeneity hint for European-style index option pricing. Neural Computing and Applications (2016); Vol. 28, pp. 4061-4077.
[8] Freund, Y.; Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. (EuroCOLT) Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) (1995); Vol. 904.
[9] Friedman, J. H. Greedy function approximation: a gradient boosting machine. Annals of Statistics (2001); pp. 1189-1232.
[10] Friedman, J. H. Stochastic gradient boosting. Computational Statistics & Data Analysis (2002); Vol. 38, no. 4, pp. 367-378.
[11] Garcia, R.; Gençay, R. Pricing and Hedging Derivative Securities with Neural Networks and a Homogeneity Hint. Journal of Econometrics (2000); Vol. 94, issues 1-2, pp. 93-115.
[12] Gradojevic, N.; Gençay, R.; Kukolj, D. Option Pricing with Modular Neural Networks. IEEE Transactions on Neural Networks (2009); Vol. 20, no. 4, pp. 626-637.
[13] Gençay, R.; Qi, M. Pricing and Hedging Derivative Securities with Neural Networks: Bayesian Regularization, Early Stopping and Bagging. IEEE Transactions on Neural Networks (2001); Vol. 12, no. 4, pp. 726-734.
[14] Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning. MIT Press (2017).
[15] Hutchinson, J. M.; Lo, A. W.; Poggio, T. A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks. Journal of Finance (1994); Vol. 49, no. 3, pp. 851-889.
[16] Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR) (2015). Available at https://arxiv.org/abs/1412.6980.
[17] Lajbcygier, P.; Boek, C.; Palaniswami, M.; Flitman, A. A hybrid neural network approach to the pricing of options. Proceedings of ICNN'95 - International Conference on Neural Networks, Perth, WA, Australia (1995); Vol. 2, pp. 813-817.
[18] Maddala, G. S.; Qi, M. Option pricing using Artificial Neural Networks: the case of S&P 500 Index Call Options. Neural Networks in Financial Engineering; Proceedings of the 3rd International Conference on Neural Networks in the Capital Markets, World Scientific (1996); pp. 78-91.
[19] Malliaris, M.; Salchenberger, L. Using neural networks to forecast the S&P 100 implied volatility. Neurocomputing (1996); Vol. 10, no. 2, pp. 183-195.
[20] Malliaris, M.; Salchenberger, L. A Neural Network Model for Estimating Options Prices. Applied Intelligence (1993); Vol. 3, pp. 193-206.
[21] Sirignano, J.; Cont, R. Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance (2019); Vol. 19, pp. 1449-1459.
[22] Yao, J.; Li, Y.; Tan, C. L. Option Price Forecasting using Neural Networks. Omega (2000); Vol. 28, no. 4, pp. 455-466.
IISER Pune, India
E-mail address: [email protected]
Orange Quant Research LLP, Pune, India
E-mail address: [email protected]
IISER Pune, India