A Data-driven Market Simulator for Small Data Environments
Hans Bühler, Blanka Horvath, Terry Lyons, Imanol Perez Arribas, Ben Wood
Abstract.
Neural network based data-driven market simulation unveils a new and flexible way of modelling financial time series, without imposing assumptions on the underlying stochastic dynamics. Though in this sense generative market simulation is model-free, the concrete modelling choices are nevertheless decisive for the features of the simulated paths. We give a brief overview of currently used generative modelling approaches and performance evaluation metrics for financial time series, and address some of the challenges to achieve good results in the latter. We also contrast some classical approaches of market simulation with simulation based on generative modelling and highlight some advantages and pitfalls of the new approach. While most generative models tend to rely on large amounts of training data, we present here a generative model that works reliably even in environments where the amount of available training data is notoriously small. Furthermore, we show how a rough paths perspective combined with a parsimonious Variational Autoencoder framework provides a powerful way for encoding and evaluating financial time series in such environments where available training data is scarce. Finally, we also propose a suitable performance evaluation metric for financial time series and discuss some connections of our Market Generator to deep hedging.
Contents
1. Introduction
   Generative Modelling and Market Generators
   Practical use and applications of Market Generators
   Some approaches to numerical data simulation in finance: Classical and new
2. Challenges of financial time-series simulation: Classical and new
2.1. A reminder of specific stylised facts and evaluation metrics
2.2. Challenges for similarity metrics for financial time-series by generative models
2.3. On signatures, their advantages in encoding path valued data, and their meaning
3. Main results: Our methodology and its background and motivation
3.1. Methodology: An overview of the main steps
3.2. Background and explanations to our generative modelling methodology
4. Numerical Results
4.1. Numerical experiments with historical data of S&P
4.2. Numerical experiments with synthetic paths from rough volatility models
5. Conclusions
Appendix A. Signatures
A.1. Signatures and their properties
A.2. Lead-lag transformation
Appendix B. Variational Autoencoders
References
Date: June 26, 2020.

1. Introduction
Recent advances of deep learning applied to quantitative finance [5, 10, 12, 17, 33, 35, 57, 67] have demonstrated the potential prowess of deep learning algorithms in the context of pricing and hedging derivatives. While the neural network based hedging engine presented in [10] rendered a convincing hedging performance when the market scenarios provided for training came from classical stochastic models (such as the Black-Scholes and Heston models including various forms of market frictions), the approach is in fact inherently model agnostic. The collection of sample paths provided to the network in the training phase enables the latter to generate optimal response strategies with respect to market distributions resembling the ones presented during the training phase. This and similar model-agnostic neural network based financial applications drove interest towards market scenario generators that are flexible and highly realistic, ideally even model-free and directly data driven. Indeed, the accurate modelling and efficient numerical simulation of stochastic market movements and financial time series has been a central theme throughout the past decades of financial modelling. In practice however, even an overly complex model family does not necessarily include the target function or the true data-generating process (or a close approximation thereof) when the latter evolves over time. In this work we take a leap beyond classical stochastic models and present a statistically driven market simulator based on generative modelling: Lately, tremendous progress has been made in training neural networks as powerful function approximators through backpropagation. This progress has brought about frameworks which can use backpropagation-based function approximators to build generative models. Such models are based on the idea of transforming samples of latent variables to data samples via differentiable functions, which are approximated by a neural network. Indeed, the emergence of generative modelling techniques in various machine learning applications opened up new horizons for even more flexible and directly data-driven simulations of market paths. However, these possibilities come along with a new set of challenges and bring unexplored questions to light, which we endeavour to identify and address in this paper. The contributions presented in this work are the following: We develop a powerful, flexible, non-parametric generative model for financial time series based on signatures of paths, which is capable of efficiently operating in an environment where a very low amount of training data is available. We demonstrate the prowess of this algorithm and its ability to produce realistic synthetic data to a high precision (data that can be conditioned on various market indicators) in various numerical examples on historical as well as on simulated data.

The paper is organised as follows: We motivate our work by outlining in Section 1 how potential applications of generative modelling based market simulation (market generators) can go beyond the currently widespread standard applications of numerical simulations of market paths. These applications include data anonymisation techniques, the identification of relevant properties (anomalies and outliers) of available data, as well as sophisticated backtesting and risk management strategies. Section 1 contains a brief summary of such applications; for a more detailed study see [45].
In Section 2 we contrast the main differences of traditional path generation of stochastic processes with the generative approach and discuss new challenges that arise if numerical time series are simulated by generative models rather than traditional means. We also point out how some of the traditionally common indicators describing whether a financial time series is a realistic reflection of market paths may, for generative models, not be fully sufficient as performance evaluation metrics. To address the aforementioned new set of challenges we develop an approach that uses the signatures of historical path segments as input data. The concept of signatures was coined in the context of Rough Paths (see [23, 47, 49]) and provides an efficient and parsimonious toolkit to encode the most essential information contained in (continuously observed or discretely sampled) market paths.

In Section 3 we present our main results and modelling setup. Section 3.1 gives a brief overview of our general methodology summarised in five main steps, and Section 3.2 provides more details on the modelling choices in each of the steps. To demonstrate the efficiency of our proposed approach numerically in Section 4, we generate synthetic market scenarios using variational autoencoders both in the signature-based setting and in the standard returns-based setting. The superiority of the signature based framework is displayed both from a computational and from a theoretical point of view. In order to address the non-stationary nature of financial markets, in our framework the generated new market scenarios can be conditioned on various market indicators. The latter conditioning can then also be used to build sample paths of arbitrary length (see Section 3.1 (Step 3) and Section 4). It is apparent in the majority of (deep) neural network learning based pricing, hedging, forecasting or even generative algorithms that these applications often heavily rely on the availability of large training datasets, which are not always readily available. The particular achievement presented in this work is to remove this limitation: We demonstrate the ability of our approach to generate new samples from a particularly small observable dataset to a high precision, which can in turn be conditioned on specific market indicators and conditions and used to feed other applications with the necessary amount of training samples. The algorithm developed in this paper is available in the Github repository provided with this paper: Github:Marketsimulator.
Generative Modelling and Market Generators.
The emergence of DNN-based financial applications is one of the driving factors that directed the interest towards highly realistic market simulators: A key factor for training these deep networks to a sufficient accuracy is the availability of sufficiently large, representative training datasets. In the example of deep hedging it is easy to see that the quality and quantity of available training data impacts the hedging performance, since unrealistic training data can lead to large losses (resulting from wrongly hedged positions) when the algorithms are subsequently applied to real life data. Though the surge of machine learning came hand in hand with the explosion of available data, in more situations than not, the amount of available training data is insufficient rather than large, or the data available for training is not representative of the market, and numerical generation of additional data is necessary. Similarly, failing to train on unseen rare events may leave the application without an adequate response strategy in case the market moves to unexplored territories. This gave rise to an increase in interest for realistic numerical simulation of financial markets.

Deep neural networks provide a powerful tool to approximate complex distributions, and this capacity, together with the increases in available computational power and speed, has opened new horizons in all areas of modelling, including market simulation. The foundation of this prowess is laid by the universal approximation properties of neural networks [36, 37], which establish that any function or distribution with sufficient regularity can be approximated by a sufficiently large neural network.
Generative models capture probability distributions by approximating these via neural networks from learned samples, from which new synthetic data samples can be drawn. The generative model is trained such that it is representative of the underlying distribution of the given dataset with respect to some loss function referred to as a performance evaluation metric. Generative modelling originates from more traditional applications of machine learning, and the adaptation of these techniques to a financial setting has its bespoke challenges. Identifying and addressing these challenges is one of the contributions of this work.
Definition 1.1 (Market Generator). We refer to generative models in a financial time series context as Market Generators. That is, to neural networks that are designed to approximate the underlying distribution of a market from a data sample given in form of a time series, so as to generate new data variations of the learned distribution.

Possible applications that call for generative simulation of financial markets include:
Practical use and applications of Market Generators.
There are several situations in which it is beneficial to rely on simulated data samples that are statistically indistinguishable from a given original dataset. This section outlines a very brief summary of some potential applications of generative modelling based market simulation (market generators) that can go beyond the currently widespread standard applications of numerical simulations of market paths. For a more detailed study see [45], and [33] for an application to anomaly detection.

(i) For data anonymisation: When the available data is confidential, it is desirable to generate anonymised datasets that are representative of the true underlying distribution of the data but cannot be traced back to their origin. Financial data and medical data are often
proprietary or confidential. When testing investment strategies or the effectiveness of a treatment it is imperative not to be able to trace back the datasets to the individual client or patient.

Scenario (i) already showcases some of the essential challenges in this context: Evaluating whether the produced data is representative of the distribution that the observed data stems from depends on the distributional properties (evaluation metrics) that we control for. Also, the level of anonymity achieved by this procedure is a highly interesting question on its own. The study presented in [45] is devoted to understanding these questions in more detail. An even more challenging situation arises if the size of the available data sample to train the generative model is very small to begin with:

(ii) Small original training datasets: When there are natural restrictions on the number of available original samples (constraints on the number of experiments, legal restrictions on the access to data), the available training data may not be sufficient to train the neural network application at hand, e.g. the hedging engine. Clearly, the more complex the application, the more data samples are needed to train it. In such cases the training of generative models is challenging as well, due to the low number of available original samples: the generative neural network faces the same challenge, as it has to be trainable on a very low number of data samples. Generative models for sparse data environments therefore need to be as parsimonious and easily trainable as possible.

Once such a generative network is available, the more complex neural network applications can also be trained, using the intermediate step of the market generator that produces the necessary amount of training samples for the latter, which are statistically indistinguishable from the original data. Further practical applications of market generators include (but are not limited to) the following use cases:

(iii) Backtesting: When developing a trading strategy, carrying out a backtest to measure how the strategy would perform in a realistic environment is of crucial importance. However, using historical data may result in overfitting of the trading strategy. Having a market simulator capable of generating realistic, independent samples of market paths would allow a more robust backtest less prone to overfitting.

(iv) Risk management of portfolios, be it of financial derivatives or trading strategies, is of utmost importance. A realistic market simulator can be used to generate synthetic paths to estimate various risk metrics, such as Value at Risk (VaR).

In this paper we present a powerful generative modelling technique that is specifically designed for the type of sparse data environments described in (ii) above: financial time series generation when the number of available original samples is notoriously small.
Some approaches to numerical data simulation in finance: Classical and new.

• Classical modelling: Numerical simulation of financial time series has a long history in related literature, far preceding the recent surge in financial machine learning research:

(i) Classical approaches include for example classical stochastic market models and autoregressive models and variations of these. Among their advantages are their tractability and the several decades worth of experience in understanding their mathematical properties. A clear advantage thereof is a more straightforward suitability to currently prevalent risk-management frameworks. Disadvantages may include a relative inflexibility, which can result in modelling inconsistencies.

(ii) Given the increase in computational power today, more modelling flexibility can be gained within the realm of “models with a classical flavour” by adding complexity and further parameters to the models, taking their weighted averages with weights calibrated to the current market conditions, in the spirit of [20]. This line of thought opens new avenues that interpolate between model-based and model-free approaches [3], with their own set of challenges, which we will not follow further in this paper. Such an experiment is presented in [35]; similar ideas are further developed in [63].
With several decades worth of understanding of the asymptotic and stochastic properties of models, and with the steady increase of available computational power, the evolution of (classical) models moved toward more and more complex and realistic models. Nevertheless, in recent years we have witnessed several situations where classical models have been challenged by market reality. As one of the consequences, the failure of classical models to fully explain the behaviour of asset prices has led to situations where these models have failed to prescribe the right response strategies, a reality that quantitative investors are painfully aware of. While classical modelling techniques nevertheless undoubtedly continue to have their merits, ML-based technologies offer a possible alternative to more closely mimic the behaviour of markets in more flexible, data-driven ways.

• Data-driven modern generative modelling:
Approaches to generative modelling are based on the common principle of generating new synthetic data samples whose distribution resembles the distribution of some reference dataset. One of the most striking differences of modern generative modelling to classical generation of synthetic data is that explicit knowledge of the underlying data generating distribution is no longer required. Therefore, instead of implementing (an approximation of) some known distribution or transition density, generative models often approximate the underlying distribution implicitly, by drawing samples from the latter and comparing their similarity to the original dataset with respect to certain similarity metrics. This is in particular true for so-called differential generator networks, where a transformation map is learned through backpropagation from an initial source of randomness to a target distribution. The two most commonly used generative differential network-based approaches are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN) and variations thereof.

A leap away from strictly classical modelling, but directly generalising it, is the work [51], which in a special case simplifies to the Heston model and in another case to a GARCH(1,1) model. Kondratyev and Schwarz propose in [44] a restricted Boltzmann Machine (RBM) for time-series generation, controlling for the autocorrelation function and quantiles of the generated time series. Boltzmann Machines are among the first generative models introduced to learn arbitrary distributions over binary vectors. Recent contributions to this stream of literature using GANs as generative models include the following: [8, 12, 32, 64, 65, 66]. To date we are not aware of approaches using VAEs for this purpose.

GANs are unquestionably the most popular differential generator networks, though they are typically data-hungry, and it is often difficult to guarantee their convergence and stability. Variational Autoencoders maximise the likelihood of observing the given (original) data samples under the generated samples (2.3) and are particularly well-adapted to the presented scarce data environment. Recent theoretical connections between autoencoders and latent variable models have indeed brought autoencoders to the forefront of generative modelling (see [26]). Therefore, in this paper we focus on generative modelling based on Variational Autoencoders (and their conditional refinement) and highlight their prowess in the context of financial time-series generation. (The variational autoencoder approach is elegant, theoretically pleasing and simple to implement [26, Chapter 20].) More details can be found in Section 3.2 and Appendix B.

2. Challenges of financial time-series simulation: Classical and new
Currently, many of the available neural network-based generative models originate from static applications (such as image processing) and therefore, several available performance evaluation metrics for generative models have been developed to measure some form of marginal distributions. The incorporation of a time-series aspect of the data poses additional challenges, one of which is that these static performance evaluation metrics may not always be straightforward to generalise to time series data (see more about this in Section 2.1.1). Simulated financial time series data is commonly assessed by capturing specific universal features of the time series, commonly referred to as stylized facts. Below we recall a number of stylised facts that traditional stochastic models are typically aimed to reflect. Though for classical stochastic models these stylized facts are often formulated in terms of the distribution of returns, this returns-based viewpoint (though still interesting) may not be the ideal choice to convey a sufficiently full picture for distributions of synthetic market paths that come from market simulators using generative modelling. Some further issues arising from dealing with specific properties of sequential data in general are presented in [54] (recalled below for convenience). These can be addressed in a unified manner by using signatures [23, 47, 49], which is the method of choice for generative modelling that we advocate in this paper. In Section 2.3 below we collect some compelling reasons to do so and also argue why a signature-based approach is tailor-made for financial applications such as pricing and hedging. We also indicate how a signatures-based objective function can be translated into hedging performance of portfolios (Section 2.2.1).
Problem formulation.
Let us first fix some notation that will be used throughout the paper. In the following, let $S(t)$ denote the price of a financial asset (stock, exchange rate or index) and let $X(t) = \ln S(t)$ denote the corresponding log price. Then the log return (at scale $\Delta t$) is denoted as
$$r(t, \Delta t) := X(t + \Delta t) - X(t), \qquad (2.1)$$
where the time scale $\Delta t$ (which we specify in Section 3.1 for our methodology and experiments) ranges from a day to a month. (In fact, $\Delta t$ can range from a few seconds to a month, generally including certain high frequency applications as well. From a mathematical perspective there is no reason for us to exclude shorter scales, but in this analysis we focus on scarce data situations, hence we restrict one day to be the smallest unit.) The autocorrelation for a time lag $\tau > 0$ is
$$\mathrm{corr}\big(r(t + \tau, \Delta t),\, r(t, \Delta t)\big). \qquad (2.2)$$
A numerical generation with an increased emphasis on the accuracy of the data generating process then aims at generating synthetically $M \in \mathbb{N}$ returns-sequences of length $k$ for a suitable $k \in \mathbb{N}$,
$$(\widetilde{r}^{\,1}(t_1, \Delta t), \ldots, \widetilde{r}^{\,1}(t_k, \Delta t)),\ \ldots,\ (\widetilde{r}^{\,M}(t_1, \Delta t), \ldots, \widetilde{r}^{\,M}(t_k, \Delta t)), \qquad (2.3)$$
such that the generated set of $k$-sequences $(\widetilde{r}^{\,i}(t_1, \Delta t), \ldots, \widetilde{r}^{\,i}(t_k, \Delta t))$ of returns reflects the properties of the observed $k$-sequences of returns
$$(r^1(t_1, \Delta t), \ldots, r^1(t_k, \Delta t)),\ \ldots,\ (r^N(t_1, \Delta t), \ldots, r^N(t_k, \Delta t)), \qquad (2.4)$$
as accurately as possible, where $N \in \mathbb{N}$ denotes the number of observations in the original dataset.
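To make (2.1) and (2.2) concrete, the following minimal sketch computes log returns at a given scale and their empirical autocorrelation; the function names and the toy data are ours, not taken from the paper's repository.

```python
# Minimal sketch of (2.1) and (2.2): log returns at scale dt and their
# empirical autocorrelation at lag tau; names and toy data are illustrative.
import numpy as np

def log_returns(prices, dt=1):
    """r(t, dt) = X(t + dt) - X(t) for X = log-prices on a daily grid."""
    x = np.log(prices)
    return x[dt:] - x[:-dt]

def autocorr(r, tau=1):
    """Empirical correlation between r(t + tau, dt) and r(t, dt)."""
    return np.corrcoef(r[tau:], r[:-tau])[0, 1]

prices = np.exp(np.cumsum(np.random.normal(0.0, 0.01, size=2000)))  # toy path
r = log_returns(prices, dt=1)
print(autocorr(r))          # close to zero: returns are nearly uncorrelated
print(autocorr(np.abs(r)))  # markedly positive on real market data
```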
Small data environments: Note that the number of generated samples $M$ need not be identical to the number of original samples $N$. A small data environment (see scenario (ii) in Section 1) would correspond to the situation where $M \gg N$, with $N$ relatively small from a statistical or from a standard deep learning perspective. (Application (i) in Section 1, by contrast, could correspond to environments where $M \approx N$; this kind of situation would also arise when simulating tick data for high frequency trading. Scenarios (iii) and (iv) of Section 1 can occur with $M \approx N$ or $M \gg N$.) Such an example is the daily stock data of leading indices (S&P 500, DAX, FTSE), where the number of data samples available for training (a dataset covering roughly 5000 days worth of daily data) is orders of magnitude smaller than the amount of data normally needed in most neural network applications. In such situations the challenge is to efficiently extract the most relevant information from a small amount of available samples in a very simple generative network.

Clearly, the term above referring to the accuracy of the modelling is yet to be specified. In Section 2.1 below, we recall a collection of properties that are widely accepted to be universal features (stylized facts) of time series of the form (2.1) and (2.4); and in Section 2.1.1 we briefly recall corresponding metrics that are commonly used to test for the presence of these stylised facts.
2.1. A reminder of specific stylised facts and evaluation metrics.
Time series data of financial markets exhibits a set of stylised facts that a realistic financial market model is commonly expected to reflect. Below, we include a brief reminder of the most common ones; for more details see [16].

- Non-stationarity: Financial time series are typically non-stationary, that is, past returns do not necessarily behave like future returns. The stationarity assumption states that for any set of time instants $t_1, \ldots, t_k$ and any time lag $\tau > 0$,
$$(r(t_1, \Delta t), \ldots, r(t_k, \Delta t)) \sim (r(t_1 + \tau, \Delta t), \ldots, r(t_k + \tau, \Delta t)). \qquad (2.5)$$
This property is not guaranteed to hold for the returns process in calendar time, see [16].

- Heavy tails and aggregational Gaussianity:
Asset returns have (power-law-like) heavier tails than the normal distribution, and have a distribution that is more peaked than the normal distribution. However, as the time scale $\Delta t$ increases, the distribution looks more and more Gaussian.

- Absence of autocorrelations of asset returns, but slow decay of autocorrelation in absolute returns:
Asset returns are uncorrelated (except for very short intraday timescales) but not independent. The autocorrelation function (2.2) of absolute returns $|r(t, \Delta t)|$ decays slowly as a function of the time lag $\tau$, following a power-law.

- Volatility clustering and multifractal structure:
Phases of high/low activity tend to be followed by phases of high/low activity, see also [24, 50].

- Leverage effect:
Asset returns exhibit a leverage effect, i.e. a negative correlation between the volatility of asset returns and the returns process.

As mentioned above, the sequential nature of the data is usually modelled by an ad-hoc selection of stylised facts and measured by corresponding evaluation metrics (see Section 2.1.1) in the case of classical models. Due to the lack of an established consensus on similarity metrics for sample paths, to date this is also the case for generative models: The most straightforward optimisation routine is to target a number of essential stylized facts of the data in the training and in the evaluation of the generative procedure. Handcrafted combinations of stylised facts and corresponding performance evaluation metrics can be included in the optimisation routine used in a specific structure, alongside further objectives with regard to a specific application. Therefore, in such cases the chosen combination of optimisation objectives is often very specific to the application at hand and does not transfer easily to other applications. This is referred to in [54] as the non-universality of the approach for financial time series.
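As an illustration, the sketch below checks two of these stylised facts (heavy tails via excess kurtosis, and volatility clustering via the autocorrelation of absolute returns) on a return series; it assumes scipy is available, and the names and toy data are ours.

```python
# Sketch: checking two stylised facts on a 1-d array of returns r.
# Positive excess kurtosis indicates heavier-than-Gaussian tails; a slowly
# decaying autocorrelation of |r| indicates volatility clustering.
import numpy as np
from scipy.stats import kurtosis

def acf(x, max_lag=20):
    """Empirical autocorrelation function for lags 1..max_lag."""
    x = x - x.mean()
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[k:], x[:-k]) / (len(x) * c0)
                     for k in range(1, max_lag + 1)])

r = 0.01 * np.random.standard_t(df=4, size=5000)  # toy heavy-tailed returns
print("excess kurtosis:", kurtosis(r))            # > 0 for heavy tails
print("ACF of returns:", acf(r, 5))               # near zero at all lags
print("ACF of |returns|:", acf(np.abs(r), 5))     # decays slowly on real data
```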
2.1.1. Performance evaluation metrics.
To date, the most commonly used evaluation scores include (but are not limited to) the following:
(1) Distributional metrics: Such metrics target the cumulative distribution functions, in order to ensure that the distributions of the generated samples closely match the historical ones, often visualised by QQ-plots. To assess the goodness of fit, the difference between the historical sample distribution and the distribution of the generated samples is measured with respect to a suitable metric $D$,
$$D := \sum_{b \in B} \big| F_{1,n}(b) - F_{2,m}(b) \big|, \qquad (2.6)$$
where $F_{1,n}$ and $F_{2,m}$ denote the empirical distribution functions of the original and generated samples, $n, m$ denote the number of original and generated samples respectively, and $B$ is (a suitable discretisation of) the sample space.

(2) Tail behaviour scores:
Targeting properties of the underlying distribution that control the tail behaviour, via higher order moments such as skewness and kurtosis:
$$\frac{1}{N_x} \sum_{j=1}^{N_x} \Big| \mathrm{skew}(x_j) - \mathrm{skew}\big([\hat{x}_j^{(1)}, \ldots, \hat{x}_j^{(m)}]\big) \Big|; \qquad \frac{1}{N_x} \sum_{j=1}^{N_x} \Big| \mathrm{kurt}(x_j) - \mathrm{kurt}\big([\hat{x}_j^{(1)}, \ldots, \hat{x}_j^{(m)}]\big) \Big|. \qquad (2.7)$$

(3) Correlation and cross-dependence scores:
To detect serial autocorrelation in the time-series, and analogously for multidimensional time-series, cross-correlation scores.
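A hedged sketch of how the scores (2.6) and (2.7) can be computed on two samples of returns follows; the discretisation $B$ of the sample space is taken here as a uniform grid over the pooled range, which is one possible choice among many, and the names are ours.

```python
# Sketch of the evaluation scores (2.6) and (2.7) for two return samples
# x (historical) and y (generated); assumes scipy for skewness and kurtosis.
import numpy as np
from scipy.stats import skew, kurtosis

def cdf_distance(x, y, n_bins=100):
    """Sum over a grid B of absolute differences of empirical CDFs, cf. (2.6)."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_bins)
    F_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    F_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.abs(F_x - F_y).sum()

def tail_scores(x, y):
    """Absolute differences of skewness and excess kurtosis, cf. (2.7)."""
    return abs(skew(x) - skew(y)), abs(kurtosis(x) - kurtosis(y))
```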
The above examples already suggest a wealth of possible metrics and evaluation scores to measure the quality of generated market paths, and the list does not end here, depending on the application one has in mind: While the above evaluation scores check effectively for some of the most relevant stylized facts, we might be interested in other properties of the generated data as well: We may want to optimise, for example, with respect to the expected payoffs of vanilla options, or of a portfolio of options. Or we may want to optimise the generative process with respect to the P&L of the hedged portfolio under an appropriate risk measure. Having deep hedging in mind as an application for the generative model, we may aim to include hedging objectives in the optimisation either directly (which may be computationally expensive) or indirectly. For the latter we refer to [1] for some possibilities, and we refer to Section 2.2.1 for a brief explanation of how our suggested performance evaluation metric (see (Step 5) of Section 3) links back to hedging strategies.
2.2. Challenges for similarity metrics for financial time-series by generative models.
Traditional distributional metrics and divergences provide ample means to measure distances between distributions. But determining appropriate similarity metrics on the level of a stochastic process, or its data representation through a time series, is a challenge of a different nature. This section is devoted to the questions: What are potential challenges in determining appropriate metrics to measure the similarity of distributions on path space, or to determine whether two sets of sample paths originate from the same underlying distribution? Generalisations of the static case (similarity of marginal distributions) to the dynamic one (similarity metrics on path space) are not straightforward. In particular, evaluating the “goodness” of a generative model for sample paths of financial time-series is a challenge in itself. In view of our goal of proposing effective performance evaluation metrics for market generators, we list a number of these challenges that are addressed in this paper:

(1) The potential non-universality (as described in Section 2.1.1 above) of features to control for in the generated time-series. If the chosen set of optimisation objectives is bespoke to one application, the generated time series may not carry over easily to other applications.

(2) The underlying distribution of the data generating model is often not known explicitly in generative models (see Section 1 above) and only controlled implicitly through the generated data samples and their similarity to the original data. Therefore, two-sample tests may be better suited as performance evaluation metrics than distributional metrics and divergences (KL-divergence, Fisher information metric, Wasserstein metric). The latter may only be applicable after inferring the underlying distribution from the available data sample.

(3) Established distributional metrics (and two-sample tests) for marginal distributions can be generalised to (finite) multivariate marginals. However, generalising these metrics to path space is not straightforward, one of the difficulties being that the path space $C([0,T], \mathbb{R}^d)$ is infinite dimensional and not locally compact, see [14].

(4) Non-continuous observation of the original data. Usually only discrete time observations of sample paths are available. While for classical models this is less problematic, for generative models an appropriate feature set needs to be specified. Modelling sample paths as a learning problem of vector-valued data is problematic for several reasons: the number $n$ and the position of observation time points might change from sample to sample. Also, in some applications (for example high-frequency data) $n$ can get very large. Finally, from a hedging perspective it is delicate (and often not sufficient) to match marginal distributions on a finite number of observation dates only. As shown in [9], one can construct examples that are indistinguishable from one another on a finite set of marginals in the physical measure, but lead to arbitrarily different hedging strategies and option prices.

(5) Non-stationarity of the target distribution. Financial time-series are typically non-stationary and underlying distributions change with market conditions. For generative models this has two implications in particular: It is beneficial to design a conditional version of the market generator, which allows one to produce samples that are conditioned on specific market states, see Section 3 for more details. Furthermore, training a conditional generative model may amplify the data scarcity
issue, as the availability of a sufficient number of representative training samples becomes coupled with market conditions.
2.2.1. Two-sample tests and Maximum Mean Discrepancy metrics as performance evaluation metrics.
If the distributions of the original data generating process $(r^i(t_1, \Delta t), \ldots, r^i(t_k, \Delta t))$ and the generated series $(\widetilde{r}^{\,i}(t_1, \Delta t), \ldots, \widetilde{r}^{\,i}(t_k, \Delta t))$ are known explicitly, a number of established similarity metrics can be computed efficiently. In practice however, the underlying distribution of the original data is (often) only available through the given samples and not known explicitly. For generative models, the same holds for the synthetic samples as well, as summarised in (2) above. If at least one of these data samples is small, metrics on the level of samples rather than on the level of distributions are preferable.

Two-sample tests provide a flexible framework to compare the empirical distributions of two given data samples according to the following principle: If $X$ and $Y$ are random variables with respective probability measures $p$ and $q$ defined on some common state space, given i.i.d. observations $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$ from $p$ and $q$ respectively, can we decide whether $p \neq q$? Typically these tests are designed to determine whether two given data samples were generated by a common underlying distribution or not, but do not specify what that common distribution is. (One of the best known such nonparametric two-sample tests is the popular two-sample Kolmogorov-Smirnov test, but it has some shortcomings: It may need a large number of samples and it is difficult to generalise to higher dimensions.) A number of related tests addressing matter (2) have recently become more popular in generative modelling, see [25]: In [25], the authors determine whether two samples are drawn from different distributions $p$ and $q$ based on the largest difference in expectations over a predefined set of functions in the unit ball of a Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$:
$$\mathrm{MMD}[\mathcal{H}, p, q] := \sup_{f \in \mathcal{H}} \big( \mathbb{E}_{X \sim p}[f(X)] - \mathbb{E}_{Y \sim q}[f(Y)] \big),$$
where the space $\mathcal{H}$ is rich enough to distinguish between the measures $p$ and $q$. The metric is therefore called the maximum mean discrepancy (MMD). Though this does not yet address the matter (1) of non-universality, using these metrics in the training phase has rendered more stable convergence results in several studies, and they can be computed efficiently. Therefore, MMD quickly entered the neural network arena in generative modelling. (In [58] MMD has been established in the context of Wasserstein Auto-Encoders, and [59] introduce MMD-based GANs. By making fundamental connections to optimal transport distances in the data space, MMD measures establish the theory proving the correctness of this generative procedure.) This is the performance evaluation metric we use for our generative algorithm, see (3.1) in Section 3.2.5.

Generalising Maximum Mean Discrepancy metrics to path space faces the same challenges as the ones summarised in (3). In a recent paper [14], Chevyrev and Oberhauser develop a Maximum Mean Discrepancy based two-sample test that relies on a feature map from stochastic analysis called the “signature” of a path, which tackles point (3) simultaneously with the issue (1) of non-universality. In fact, this test is based on a notion that is reminiscent of the role of moment generating functions on path space and hence characterises the distribution of the stochastic process uniquely, see Appendix A for details. Furthermore, it follows from [2, 46] that, in addition to the advantages above, signatures also provide the right framework to match hedging objectives and hence to bypass matter (4).

In fact, the insight that similarity on the level of signatures (hence passing the signature based MMD test described in Section 3.2.5) can be linked back to similarity in hedging performance is a consequence of the continuity property of the Itô-Lyons map [46]. This states that the mapping from the driving path of a controlled differential equation to its solution is continuous (in fact Lipschitz) in a suitable rough path norm. As a consequence, if the signatures of two driving paths are close, then so are the solutions of the corresponding controlled differential equations: Since the performance of a hedging strategy can be written in form of a controlled differential equation (the performance is an Itô integral of the hedging strategy against the price process), if two price paths have similar signatures (i.e. pass the signature based MMD test described in Section 3.2.5), it follows that they will have similar performance under the same hedging strategy.
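The following sketch shows one way such a signature-based MMD statistic can be computed: with the linear kernel on truncated signature features, the unbiased MMD² estimate reduces to a squared distance between empirical expected signatures, in line with the characterisation above. The iisignature package is one possible choice of library; all names and the toy data are illustrative assumptions.

```python
# Sketch of an unbiased MMD^2 estimate between two sets of paths, using
# truncated signatures as features and the linear kernel k(x, y) = <x, y>.
# Assumes the iisignature package; names and toy data are illustrative.
import numpy as np
import iisignature

def sig_features(paths, depth=3):
    """Truncated signatures of paths, each given as an array (length, dim)."""
    return np.stack([iisignature.sig(p, depth) for p in paths])

def mmd2_linear(X, Y):
    """Unbiased MMD^2 with the linear kernel; X, Y are feature matrices."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = X @ X.T, Y @ Y.T, X @ Y.T
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # off-diagonal mean
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
make_path = lambda: np.column_stack([np.linspace(0.0, 1.0, 20),
                                     np.cumsum(rng.normal(size=20))])
historical = [make_path() for _ in range(50)]  # toy "original" paths
generated = [make_path() for _ in range(50)]   # toy "synthetic" paths
print(mmd2_linear(sig_features(historical), sig_features(generated)))
```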
2.3. On signatures, their advantages in encoding path valued data, and their meaning.
The challenges that generative modelling of financial data streams faces (raised in the beginning of Section 2) are not limited to the choice of appropriate performance evaluation metrics only. They also extend to training: An efficient, parsimonious encoding (feature map) of financial data streams also enhances the training of such generative models. Furthermore, an encoding that anticipates typical properties of the data and irregularities of sampling makes training more robust with respect to data quality. The framework of rough paths and signatures [23, 29, 47, 49, 54] lends itself well to these aims as well: Signatures provide a means to encode financial data streams parsimoniously and efficiently, and they provide a powerful framework to address further challenges of path valued data as well. Therefore, we do not only use the signature framework for performance evaluation but also resort to signatures as a feature map in the generative model itself.
Definition 2.1 (Signature of a path). Let $X : [0,T] \to \mathbb{R}^d$ be a continuous path of bounded variation. Then the signature of $X$ is defined by the sequence of iterated integrals
$$\mathbb{X}^{<\infty}_T := (1, \mathbb{X}^1_T, \ldots, \mathbb{X}^n_T, \ldots), \qquad (2.8)$$
where
$$\mathbb{X}^n_T := \int_{0 < t_1 < \cdots < t_n < T} \mathrm{d}X_{t_1} \otimes \cdots \otimes \mathrm{d}X_{t_n}. \qquad (2.9)$$
If the path $X$ has bounded variation, which is the case for discrete data, the integrals above can be defined using Riemann-Stieltjes integrals.

As mentioned in the beginning of Section 2, a particular challenge in the context of synthetic generation of market paths is that the distribution in question is defined on the infinite-dimensional space of paths $C([0,T], \mathbb{R}^d)$, while the available generative modelling tools are finite-dimensional. Thus the infinite-dimensionality of path space is not only a challenge for the choice of suitable similarity metrics or performance evaluation metrics, but also for feature extraction of the data and training. A solution to this issue is to project this infinite-dimensional space to a suitable finite-dimensional space where standard methods for generative models may be used. However, mapping this inherently infinite dimensional space in an optimal way to a finite dimensional one presents a challenge, and the choice of projection is not trivial:

The most straightforward choice would be to sample the path on a fixed, discrete time grid, return by return as in (2.1) for example (while accounting for the joint distribution of these), and learn the projected probability measure using standard generative models. This approach would not fully capture the sequential nature of financial data and would fail to effectively capture the probability measure on the original path space. Moreover, if we project this infinite-dimensional object down to a finite-dimensional space by sampling on a discrete time grid, the projection is not a “natural” one, as the original distribution on an intrinsically infinite-dimensional space captures much richer information about the process. The latter can have significant consequences on hedging, as outlined in [9] and at the end of Section 2.2.1 above.

A more effective projection is to use the signature or log-signature to project the infinite dimensional encoding (2.8) of the path space to a finite $N$-dimensional one as in (2.10). The signature of a path is a transformation of the original continuous path into a sequence of statistics, an infinite dimensional vector (2.8) of signature entries (2.9). These statistics fully characterise the original path up to time parametrisation (see [30, 6] and Appendix A for details), and furthermore they offer a faithful and parsimonious description of it already in the first few entries (dimensions) of the signature vector (2.10). The error made by the truncation at level $N$ decays with factorial speed as $O(1/N!)$, see [49]. See Definition A.3 in Appendix A for the latter.
Moreover, the first several signature entries (i.e. the first terms in the vectors (2.9) resp. (2.10)) have clear financial interpretations: The first term captures the drift, i.e. the increment of a price path over a period of time. The second term indicates the volatility over the period of time (through the Lévy area). Higher order terms capture finer aspects of the path that end up fully characterising the latter. This gives an ordering, reminiscent of principal components, from the most relevant towards finer properties of the path. See Appendix A for more details.
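To make this interpretation concrete, the sketch below computes the first two signature levels of a piecewise-linear path directly from its increments: level one is the total increment (the drift term), and the antisymmetric part of level two is the Lévy area. The closed form used for level two is the standard one for piecewise-linear paths; the code is an illustration, not taken from the paper's repository.

```python
# Sketch: levels 1 and 2 of the signature of a piecewise-linear path in R^d.
# Level 1 is the total increment; the antisymmetric part of level 2 is the
# Levy area. For piecewise-linear paths:
#   S^{ij} = sum_{k<l} dX_k^i dX_l^j + 0.5 * sum_k dX_k^i dX_k^j.
import numpy as np

def sig_levels_1_2(path):
    dX = np.diff(path, axis=0)            # increments, shape (n, d)
    level1 = dX.sum(axis=0)               # X_T - X_0 ("drift")
    before = np.cumsum(dX, axis=0) - dX   # sum of increments strictly before k
    level2 = before.T @ dX + 0.5 * dX.T @ dX
    return level1, level2

path = np.column_stack([np.linspace(0.0, 1.0, 100),
                        np.cumsum(np.random.normal(0.0, 0.1, 100))])
lvl1, lvl2 = sig_levels_1_2(path)
levy_area = 0.5 * (lvl2[0, 1] - lvl2[1, 0])  # antisymmetric part of level 2
```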
2.3.1. On returns-based versus signature-based data generation in a pricing and hedging context.
It is clear from the previous section that the signature transform is a highly efficient way of encoding the most relevant information contained in a stochastic path. We demonstrate in our numerical results below that learning the (truncated) signature of a set of paths leads to more efficient learning (i.e. training converges with fewer training samples) than learning the multidimensional distribution of the process on a discrete grid. The projection on the finite dimensional space of truncated signatures is not only numerically more efficient than the finite dimensional projection on a discrete time-grid. The signature-based projection encodes a richer and more relevant wealth of information about the path (which also allows us to control for option prices and hedging strategies [2, 55]), while in the latter projection some essential information may be lost, which can have financial consequences on option prices and hedging strategies: In fact, [9] shows that when sampling returns distributions of a stochastic process on a discrete time grid, even statistically indistinguishable sets of paths in the historical measure can lead to arbitrarily different option prices. If however the paths are sampled on the level of signatures, this ambiguity of option prices does not occur. The findings of [2, 9] therefore indicate that the signature is not only a more efficient way of encoding sample paths, but one that removes the ambiguity of the corresponding option prices and provides a meaningful control over hedging performance. See the end of Section 2.2.1 above for the last statement.
2.3.2. Further advantages of signatures.
Further advantages of working with signatures for modelling functions of data streams have been discussed and presented in [47, 54]. These advantages include (but are not limited to) the following properties: The expected signature of a stochastic process determines the law of the process uniquely. With that, the expected signature plays a similar role on path space as the moment generating function for distributions (this provides a basis of the performance evaluation metric developed in [14] for path space).
They permit model-free, data-driven modelling: The framework does not impose any assumptions on the underlying stochastic dynamics. Signatures provide a flexible basis of functions for a functional on path space. But while Fourier transforms and wavelets have a similar role in approximating curves as a linear combination of basis functions, signatures do so in a model-free, unparametrised way (since a path by path characterisation is possible).
The signature transform is straightforward to implement: Today there are readily available (and constantly improving) powerful Python packages and libraries to transform data streams to signatures (such as the esig, tosig and iisignature libraries, see https://github.com/bottler/iisignature as well as [41] for more background), and algorithms to transform signatures back to paths of data streams. For the inverse transform, one possible algorithm is developed in this paper (see Section 3.1) and provided in the Github repository Github:Marketsimulator.

The framework is invariant under translation and time-parametrisation. (Signatures are constructed from increments of the path. As such, they are also invariant under translation: all reference to absolute values of the path is lost. One may find settings where it is important to make reference to absolute values of the path, e.g. to ensure that the asset price remains positive. In such cases, the so-called visibility transform can be used.) Therefore, in order to encode price paths in business time rather than calendar time we apply the lead-lag transformation (see Section A.2 for more details). (A commonly used method for measuring similarity between two temporal sequences which may vary in speed is Dynamic Time Warping (DTW); a well known application has been automatic speech recognition, to cope with different speaking speeds. In finance a similar consideration suggests measuring business time rather than calendar time of the process.)
Furthermore, signatures are robust to irregular sampling (which becomes relevant for tick-data), missing data, and highly oscillatory data as well: In particular, they provide a consistent framework for unbounded variation paths, which may arise by Donsker-type theorems in the high-frequency limit. Functions of such paths have to be treated with care, as for example the quadratic variation, or the solution of nonlinear filtering problems or of stochastic differential equations, do not depend continuously on the underlying path. Signatures appear naturally when describing the behaviour of functions of non-smooth paths, cf. [14].
2.3.3. From signatures to log-signatures.
In the present work, our generative model does not target the signature directly; instead we first use a bijection to log-signatures (see Definition A.3 in Appendix A for the latter). Generative modelling on signature space directly may be problematic, which results from the fact that the signature space is not linear: small perturbations of the signature of a path (as one might expect to obtain from the output of a generative model on signature space) will in general not correspond to the signature of some other path. In fact there may not exist any path with a signature that results from the perturbation. We solve this issue by working with the so-called log-signature instead, which also characterises the price path, but now spans a linear space. We refer the reader to [13, Section 1.3.5] for a detailed discussion of the log-signature, as a full reminder of its formal characterisation and properties is out of the scope of this paper. For more information on the background and implementation of signatures and log-signatures, and on available Python packages and libraries for signatures, log-signatures and the lead-lag transformation, see the related works [13, 41, 53, 56] as well as the brief reminder in Appendix A below. Recent machine learning applications using signature inputs include [8, 12, 27, 47, 53, 54].
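The parsimony of this representation is easy to see by comparing dimensions: at a given truncation level, the log-signature lives in a strictly smaller linear space than the signature. A hedged sketch using the length helpers of the iisignature package (one possible library choice):

```python
# Sketch: the log-signature spans a lower-dimensional linear space than the
# signature at the same truncation level. Assumes the iisignature package.
import iisignature

d, level = 2, 4  # path dimension (e.g. after the lead-lag transform), level 4
print(iisignature.siglength(d, level))     # 30 = 2 + 4 + 8 + 16 entries
print(iisignature.logsiglength(d, level))  # 8 entries (free Lie algebra basis)
```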
3. Main results: Our methodology and its background and motivation
Problem setting and main results and contributions:
Given a financial data stream (be it a market index such as the S&P 500 or DAX, or a numerically generated series), we work with the one available realisation of the evolution of this stream. (We assume the data stream to be univariate for notational simplicity, but it is straightforward to extend the methodology to multivariate data streams too.) From the path observed on a time horizon of several years we intend to infer the underlying data generating process in order to produce further samples from that distribution. (For this we temporarily make the customary stationarity assumptions, which we will later relax.) We train a variational autoencoder (VAE) to reproduce the underlying distribution of an observed spot index on different time horizons up to a month. Note that our methodology is by no means limited to one-month time horizons, and in the postprocessing step (Step 4) of our methodology we propose a way to generate data on longer time horizons (here up to several years) by appropriately concatenating paths.

(a) As a first step we demonstrate that returns generation in the classical sense is possible by variational autoencoders on various time-scales (daily, weekly, monthly returns). This experiment follows the spirit of other currently available market generation approaches, and supports our choice of VAE as a parsimonious generative model.

We then also go beyond simple returns generation in the following ways:

(b) To accommodate the inherently infinite-dimensional nature of the problem, we propose a generative process directly on path space, based on the signatures (see Section 2.3 above) of the paths. We then demonstrate that the generated paths do not only fit all the observed marginal returns distributions (obtained in the approach above) but also the joint distributions of returns on these time-scales and other essential features of the data.

(c) We fine-tune the generative process by allowing conditioning on various market indicators to account for the non-stationarity of the time series. With this conditioning we refine our (returns-based or signature-based) variational autoencoder (VAE) to a conditional variational autoencoder (CVAE), and generate paths conditional on various market states. The latter conditioning also allows us to paste path segments conditioned on the signature of the previous path segment, thereby generating paths of arbitrary length.
3.1. Methodology: An overview of the main steps.
Our algorithm and numerical experiments can be subdivided into the following five main steps. We first give a brief overview of these and provide more explanations on each of the steps in Section 3.2.

(Step 1) Data extraction from time series: In our experiments, given a sample path from a data stream, we subdivide the full time series into equal length intervals. Here we subdivide into segments of: (i) 1 day, (ii) 5 days, corresponding to a business week, and (iii) 20 days, corresponding to a month. (For the numerically generated data in our experiments, we directly generate sample paths of lengths (i), (ii) and (iii), but we could have applied the same method as above.)

(Step 2) Preprocessing the data:
To obtain training data from the resulting path segments, we

(a) calculate log-returns $r(t, \Delta t) = X(t + \Delta t) - X(t)$ with appropriate $\Delta t$, to generate (i/a) daily log-returns, that is $\Delta t = 1$ day, (ii/a) weekly log-returns, that is $\Delta t = 5$ days, and (iii/a) monthly log-returns, that is $\Delta t = 20$ days, from the data;

(b) convert the obtained data samples into log-signatures (for the paths of length (b/ii) 5 days and (b/iii) 20 days), applying the lead-lag transformation. This will enable the pathwise generation process described in point (b) above. A mathematical background and motivation for these transformations is given in Appendix A; a sketch of this preprocessing pipeline follows below.
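A hedged sketch of this preprocessing step, assuming the iisignature package: prices are mapped to daily log-returns (Step 2(a)), and weekly log-price segments are lead-lag transformed and mapped to log-signatures (Step 2(b)). All names, the truncation level and the segmentation are illustrative choices, not those of the paper's repository.

```python
# Sketch of Step 2: daily log-returns, and log-signatures of lead-lag
# transformed weekly (5-day) log-price segments. Assumes iisignature;
# names, truncation level and segmentation are illustrative.
import numpy as np
import iisignature

def lead_lag(x):
    """Lead-lag transform of a 1-d stream into a 2-d piecewise-linear path."""
    n = len(x)
    path = np.empty((2 * n - 1, 2))
    path[0::2, 0] = x        # synchronised points (x_j, x_j)
    path[0::2, 1] = x
    path[1::2, 0] = x[1:]    # the lead component moves first ...
    path[1::2, 1] = x[:-1]   # ... the lag component follows one step behind
    return path

prices = np.exp(np.cumsum(np.random.normal(0.0, 0.01, 1000)))  # toy data
log_prices = np.log(prices)
daily_returns = np.diff(log_prices)                    # Step 2(a)

segments = [log_prices[i:i + 6]                        # 6 points = 5 days
            for i in range(0, len(log_prices) - 5, 5)]
prep = iisignature.prepare(2, 4)                       # dim 2 (lead-lag), level 4
train_data = np.stack([iisignature.logsig(lead_lag(s), prep)
                       for s in segments])             # Step 2(b)
```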
(Step 3) Creating and training the VAE network: After splitting the historical data into training, testing and validation sets,

(a) we train a variational autoencoder VAE(a) on the daily, weekly and monthly returns for (i/a), (ii/a), (iii/a). The output of this VAE is on the level of returns.

(b) We train a variational autoencoder VAE(b) on the log-signatures of weekly and monthly sample paths for (ii/b), (iii/b) (see Sections A and A.1). The output of this generative VAE(b) is then given in form of log-signatures.

(c) In the refined version we also calculate and store relevant market conditions, such as the current level of volatility, the current level of the index (that is, the instantaneous level at the start of each path segment), and the signature of the previous path segment. These values will then be used in the CVAE (Conditional Variational Autoencoder) to generate new data points conditional on these indicators. A minimal sketch of such a VAE is given below.
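For concreteness, a minimal VAE sketch for Step 3(b) follows, written in PyTorch; the layer sizes, latent dimension and loss weighting are illustrative assumptions, not the configuration of the paper's repository. For the CVAE refinement of Step 3(c), the condition vector would additionally be concatenated to the encoder input and to the latent code before decoding.

```python
# Minimal VAE sketch for Step 3(b): trained on log-signature vectors.
# Architecture and sizes are illustrative assumptions (PyTorch used for
# concreteness); the CVAE variant would concatenate a condition vector
# to the encoder input and to the latent code before decoding.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim, latent_dim=4, hidden=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(data_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, data_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparam.
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=1)                          # fit term
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1)  # prior term
    return (recon + kl).mean()

# Sampling new log-signatures: decode draws from the standard normal prior,
# e.g. model.dec(torch.randn(100, latent_dim)).
```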
(Step 4) Postprocessing of the outputs of the VAEs: At this stage of the generative process we either convert the generated log-signatures back into paths or use the generated (log-)signatures directly. In fact, in case the purpose of the generative process is to provide training data for neural networks with pricing and hedging objectives, both options are available. Either: (4.b) Signatures can be used for option pricing directly, as suggested in [55]. Or: (4.a) In order to invert signatures into paths, we suggest one possible method to do so in Appendix A. (Note that developing the computationally most efficient way of performing this task is mathematically highly nontrivial, and the subject of ongoing research [18].) In fact, inverting signatures into paths is also instrumental in allowing us to compare the performance of the generative networks on the level of returns distributions, as described in the second table of the following point.
(Step 5) Performance evaluation: We evaluate the performance of each of the above approaches to the generative process and compare the outputs (both to the original data samples and to one another) with respect to different similarity metrics, and conclude that the signature-based generation outperforms the returns-based approach. In fact, in case the purpose of time-series generation lies in the context of pricing and hedging derivatives, the suitable similarity metrics are indeed the signature-based similarity metrics (see Sections 3.2.5 and 2.2.1 for details). This follows directly from the findings of [9].
3.2. Background and explanations to our generative modelling methodology.

3.2.1. (Step 1) Data extraction from time series stream.
It directly follows from the non-stationarity of financial time series that ‘past returns do not reflect future performance’ [16]. Therefore, strictly speaking, the data of the evolution of an index over time amounts to one single observation of a realised path. However, as a starting point to perform any statistical analysis of market data, one needs several observations of the underlying quantity, for which one of the most basic requirements is ‘the existence of some statistical properties of the data under study which remain stable over time’ [16]. To extract multiple observations from this data stream, one divides this path into segments. This (as usual) calls for some sort of stationarity assumption, though financial time-series are known to be typically non-stationary. In our experiments, given a sample path from a data stream, we subdivide the full time series into equal length intervals: (i) 1 day, (ii) 5 days, and (iii) 20 days. In dividing the observed path into segments, there are different considerations to balance with one another:

• The longer the path segments, the fewer path segments are available.

• The longer the path segments, the less severe the violation of the stationarity assumption in the obtained training data.
3.2.2. (Step 2) Returns-based (a) versus signature-based (b) data generation: If one wants to create a generative model that learns to generate samples from a distribution on path space, one has to decide which representation of the path will be used: a returns-based (a) or a signature-based (b) projection from path space to a finite-dimensional space. This representation has to be an effective representation of the path, and it should be rich enough to capture the distribution of the paths. See Section 2.3 for a discussion of compelling reasons to use the truncated signature projection for this purpose. Given that the signature of a path uniquely determines the path [7], and that the expected signature uniquely determines the distribution of the paths [15], it is natural to use the signature to represent the path. When the truncated signature is used, the representation of the path is essentially a vector in some high-dimensional space. This transformation of paths to signatures can then be applied to our sample of paths, in order to then use traditional generative models such as Variational Autoencoders. However, the signature of a path is a group-like element [49, Definition 2.18], whereas the generated signatures in general won't be. Therefore, it is more convenient to apply the generative model to log-signatures, because once we compute the tensor-algebra exponential of the generated log-signatures, the resulting element will be group-like [49, Theorem 2.23]. In this case, the output of the generative model will be in the form of log-signatures as well, and may have to be inverted to paths in a post-processing step (see
Step 4). In fact, the generation of paths in (log-)signature space is more efficient than maximising a corresponding fit in returns for a simulated stochastic process, in the sense that the generative process achieves a higher precision already with less training data. This can be seen in the comparative performance metrics (Step 5) in the numerical experiments: there, a comparison with a purely returns-optimised VAE provides the proof of concept that signatures work better. See Section 4.1.1 for the corresponding numerical results. More precisely, by comparing the weekly 5-dimensional joint distribution of returns with the weekly signature paths, we show that the signature-based generation outperforms the weekly joint-distribution-of-returns based generation. The reason for this is that signatures are an efficient feature map to encode the most relevant information captured in time series data. A 20-year daily observation timeline results in ∼250 sample paths with monthly (20 days) path segments, ∼1000 sample paths with weekly (5 days) segments, and ∼5000 daily (1 day) segments.
In a pricing and hedging context there is an even more compelling reason to use signatures as a feature map for the time series: not only does a signature-based generation yield faster training and more stable convergence, but it also eliminates pricing ambiguities, while pure returns-based generation does not: as Brigo explains in [9], models that are consistent with the statistical analysis of historical volatility can nevertheless lead to arbitrarily different option prices. Previously, [2, 9] had shown that implied volatility is linked with a purely pathwise lift of the stock dynamics, confirming the idea that while historical volatility is a statistical quantity, implied volatility is a pathwise one. See also the end of Section 2.2.1 for a hedging perspective.

3.2.3. (Step 3) Network Architecture and Training of the VAE and CVAE.
The architecture of the VAE and CVAE networks, as well as the market indicators that serve as conditioning variables for the CVAE, are summarised in this section below. For full details of the model and training we refer the reader to the GitHub repository Github:Marketsimulator.
The encoder:
The encoder network has one hidden layer and two latent layers, with 50 nodes on the hidden layer. The activation function is a leaky (parametric) ReLU with parameter α = 0.3.

The decoder: The decoder network has one hidden layer with 50 units and activation function leaky (parametric) ReLU with parameter α = 0.3. The output layer has a sigmoid activation function.
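For concreteness, the following is a minimal PyTorch sketch of such a VAE under stated assumptions: `sig_dim` and `latent_dim` are illustrative placeholders, the loss is the standard negative ELBO with a Gaussian reconstruction term, and the inputs are assumed rescaled to [0, 1] to match the sigmoid output layer. This is a sketch, not the exact implementation of the accompanying repository.

```python
import torch
import torch.nn as nn

class SigVAE(nn.Module):
    """VAE over (rescaled) log-signature vectors: one hidden layer of 50
    units with leaky ReLU (alpha = 0.3) on each side, two latent layers
    (mean and log-variance), sigmoid output. Dimensions are illustrative."""
    def __init__(self, sig_dim: int, latent_dim: int = 8):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(sig_dim, 50), nn.LeakyReLU(0.3))
        self.mu = nn.Linear(50, latent_dim)       # latent mean
        self.logvar = nn.Linear(50, latent_dim)   # latent log-variance
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 50), nn.LeakyReLU(0.3),
            nn.Linear(50, sig_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.hidden(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparametrisation trick: differentiable sampling of the latent z.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Negative ELBO: reconstruction error plus KL divergence to N(0, I)."""
    recon_err = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl
```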
Conditional Variational Autoencoder, adapting to specific market conditions.
In order to further accommodate the non-stationarity of the data, we now refine the VAE by conditioning on certain specific market conditions, obtaining a Conditional Variational Autoencoder (CVAE). The market conditions we consider here are the following: (a)
Level of the instantaneous volatility at the start of the path. (b)
Level of the index at the beginning of the path. (c)
Log-signature of the previous path:
This last condition is in fact more refined than the first one; indeed, it is more restrictive: if we control for the log-signature of the previous path, we automatically control for the volatility. See Section 2.3 for more details on the financial interpretation of the elements of the signature vectors. A sketch of such a conditional variant follows below.
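Continuing the sketch above, a conditional variant may concatenate the conditioning vector to both the encoder and the decoder inputs. Again, this is an illustrative sketch under the same assumptions, not the repository's exact architecture; `cond_dim` and the class name are ours.

```python
import torch
import torch.nn as nn

class SigCVAE(SigVAE):
    """Conditional variant: the market-condition vector c (e.g. spot level,
    instantaneous volatility, or the previous segment's log-signature) is
    concatenated to both encoder and decoder inputs."""
    def __init__(self, sig_dim: int, cond_dim: int, latent_dim: int = 8):
        super().__init__(sig_dim + cond_dim, latent_dim)
        self.decoder = nn.Sequential(   # decoder sees (z, c), outputs sig_dim
            nn.Linear(latent_dim + cond_dim, 50), nn.LeakyReLU(0.3),
            nn.Linear(50, sig_dim), nn.Sigmoid(),
        )

    def forward(self, x, c):
        h = self.hidden(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(torch.cat([z, c], dim=-1)), mu, logvar
```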
Motivation for VAE as our generative model.
Here, we briefly motivate our choice of VAEs over other generative modelling approaches. VAEs aim at maximising a lower bound of the log-likelihood of the observed data, see Appendix B for details. With that, they are parsimonious, theoretically clean to explain, and also easy to implement and to interpret. In summary, VAEs are stable and flexible generators for scarce data environments. Furthermore, VAE frameworks work consistently well under different differentiable generator architectures, including recurrent networks [26, Chapter 20]. This flexibility may prove useful in later applications. Arguably, the most popular differentiable generator networks today are GANs. In fact, the performance evaluation metric presented in
(Step 5) can be seen as a non-automated discriminator, applied as a one-step verification that the generated samples are indistinguishable from the original ones. Our choice of the VAE approach in the current context is based on the following considerations.
• The relative unpopularity of VAEs stems from image processing, and their relative weakness in that context is irrelevant for time-series applications. In fact, the relative unpopularity of VAEs compared to GANs in image processing originated from the fact that generated samples from VAEs trained on images tend to be somewhat blurry. This can be attributed to the fact that the model (maximising the likelihood of the observed dataset) may attribute a high probability to (nearby) points other than the ones in the training set. This, however, is no drawback of the VAE in a time-series setting.
• VAEs require considerably less data for training than GANs. A GAN aims at achieving an equilibrium between a Generator and a Discriminator. Achieving the equilibrium between the generator and discriminator networks of a GAN is in most practical scenarios a highly delicate matter, and often sensitive to hyper-parameter tuning (cf. [26, p. 692]). Recently, much research activity has been devoted to addressing this issue, and reformulations of GANs have been proposed that are guaranteed to converge if provided with enough training samples. Since in this application we are specifically targeting scenarios where training data is scarce, we opt for a generative variant of autoencoders (the nonlinear generalisation of PCA, see [26]) for generative modelling in our current context.

3.2.4. (Step 4) Postprocessing of the VAE outputs.
When the Market Generator is a VAE(b)-type generator, that is, the outputs of the generative process are given in the form of log-signatures or conditional log-signatures (conditioned on the log-signature of the previous path segment), then:

(a)
We can leave the output of the (C)VAE in the form of (log-)signatures and apply it directly to pricing:
Signatures can be used directly for pricing vanilla products or exotic derivatives, as proposed in [55], and this approach has several advantages, in particular for computation-heavy, path-dependent applications. Therefore, if we have these algorithms in place, for many applications (including XVA-oriented computations) it is advantageous to leave the output of our generative model on the log-signature level and to apply pricing algorithms for signatures directly to it.

(b)
Alternatively, we can convert the output of the VAE(b) back from signatures to sample paths:
In this paper we develop an evolutionary algorithm for this purpose; other alternatives are currently a topic of active research [18]. The inversion of signatures to paths is based on the following idea: the signature of a path uniquely determines the path itself [30, 6]; in other words, if we know the signature of a path, we know the path itself. However, retrieving the path from its signature (or log-signature) in a computationally efficient way is a highly non-trivial task [18]. In this paper, given that stock prices are discrete (multiples of the pip size), we used an evolutionary algorithm to retrieve a path whose log-signature is close to the generated one. Evolutionary algorithms aim to solve certain optimisation problems by mimicking, to some extent, biological evolution. In the context of signature inversion, we start with an initial population of random paths, and we iteratively (i) select the paths whose signatures are closest to the target signature, and (ii) breed these paths and introduce mutations to generate a new generation of paths. After sufficiently many iterations, we end up with a population of paths whose signatures are close to the target signature we aim to invert. A toy version of this procedure is sketched below.
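The following toy sketch illustrates the idea on time-augmented one-dimensional paths, using the iisignature package; note that this swaps the lead-lag log-signatures used in our actual experiments for simple time-augmentation, and that all names and parameter values are illustrative.

```python
import numpy as np
import iisignature

def invert_logsig(target, n_steps=21, depth=4, pop=200, n_iter=300,
                  sigma=0.01, seed=0):
    """Toy evolutionary inversion: search for a scalar price path whose
    time-augmented log-signature is close to `target`."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    prep = iisignature.prepare(2, depth)  # 2-d paths: (time, price)

    def logsig_of(increments):
        price = np.concatenate([[0.0], np.cumsum(increments)])[:, None]
        return iisignature.logsig(np.hstack([t, price]), prep)

    def fitness(individual):
        return np.linalg.norm(logsig_of(individual) - target)

    population = sigma * rng.standard_normal((pop, n_steps - 1))
    for _ in range(n_iter):
        errors = np.array([fitness(p) for p in population])
        parents = population[np.argsort(errors)[: pop // 4]]  # selection
        pairs = rng.integers(0, len(parents), size=(pop, 2))
        # Breeding (averaging parent pairs) plus Gaussian mutation.
        population = (parents[pairs].mean(axis=1)
                      + 0.1 * sigma * rng.standard_normal((pop, n_steps - 1)))
    best = min(population, key=fitness)
    return np.concatenate([[0.0], np.cumsum(best)])  # recovered price path
```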
(c) Concatenation of paths for longer time-horizons:
Recall that in VAE(b/ii) the outputs of the generative process are log-signatures of weekly paths, and in VAE(b/iii) the outputs are log-signatures of monthly paths. Recall also that the longer the paths, the fewer samples we have available. However, in some scenarios we might be interested in obtaining longer sample paths than the outputs of the generative model: say, generating signatures of monthly (or even longer) paths from the outputs of VAE(b/ii), the weekly signature-based generative model. This can be done using the multiplicative algebra structure of the signature space (Chen's identity; see the sketch below). We demonstrate this by assembling monthly signature outputs of VAE(b/iii) into yearly paths in our numerical experiments, see Figure 1 below.
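As an illustration of this multiplicative structure, Chen's identity states that the signature of a concatenated path is the tensor product of the signatures of its pieces. The following minimal sketch (our own illustration; signature libraries provide analogous functionality for general depth) implements this product for truncation depth 2.

```python
import functools
import numpy as np

def chen_combine(sig_a, sig_b):
    """Chen's identity for depth-2 truncated signatures, each given as a
    pair (level-1 vector, level-2 matrix): S(X * Y) = S(X) (tensor) S(Y).
    Higher truncation depths follow the same tensor-algebra product."""
    a1, a2 = sig_a
    b1, b2 = sig_b
    c1 = a1 + b1                      # level 1: increments simply add
    c2 = a2 + b2 + np.outer(a1, b1)   # level 2: plus the cross term
    return c1, c2

# Example: assemble one monthly signature from four consecutive weekly ones.
# monthly_sig = functools.reduce(chen_combine, weekly_sigs)
```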
Converting from returns (on different time scales) to paths: If the Market Generator is a VAE(a)-type generator, that is, the output of the VAE is in the form of returns, one may be interested in producing sample paths from the generated (i) daily, (ii) weekly, or (iii) monthly returns. We do so in our numerical experiments in order to compare the performance of the signature-based generation VAE(b) with the performance of the returns-based generation VAE(a); see Step 5 below, in particular the paragraph Performance evaluation on the level of signatures. In order to construct paths from the generated VAE(a/i) returns, we take a Monte-Carlo-type approach, assembling daily generated increments into full sample paths.

Figure 1. Here, we demonstrate that the method can be applied to generate paths on longer horizons than one month (here up to one year) by concatenating generated paths with the CVAE, where we consecutively condition the output of the monthly log-signature CVAE generation on the log-signature of the previous (generated) path segment. See item (c) of (Step 4) in Section 3.2.4.

(Step 5) Evaluating the generated paths.
In this step we apply performance evaluation metrics to the output of the generator networks, to evaluate whether certain relevant characteristics of the generated distribution reflect those of the true distribution of the original data. These evaluation metrics are different from the objective function of the generative process itself. These final performance evaluation tests resemble the role of the Discriminator network in a GAN, with the difference that in our algorithm, performance evaluation is manual and only happens once.

3.2.5.
Distances for time-series and sample paths: A computationally efficient MMD metric for laws of stochastic processes.
When it comes to assessing the quality of a set of generated paths, being able to compute the distance between the laws of two stochastic processes becomes imperative. A naive metric based on the marginals of the two processes is bound to fail, as two stochastic processes can have identical marginals but very different laws. Instead, a metric that considers the entire law of the stochastic process is needed; moreover, this metric has to be computable in order to be practical. In [14], the authors propose a (computationally efficient) MMD for laws of stochastic processes based on signature kernels. As an application, they use this metric to develop a two-sample test for stochastic processes. In the context of this paper, this statistical test can be used to evaluate the quality of our market generator. To assess whether a generative model is able to generate paths that are realistic with respect to a sample of real paths $Y_1, \dots, Y_n$, we sample $n \in \mathbb{N}$ paths $X_1, \dots, X_n$ from the generative model and apply the two-sample test proposed in [14]. More specifically, we compute the signature-based MMD test statistic
\[
T(X_1, \dots, X_n; Y_1, \dots, Y_n) := \frac{1}{n(n-1)} \sum_{i,j;\, i \neq j} k(X_i, X_j) \;-\; \frac{2}{n^2} \sum_{i,j} k(X_i, Y_j) \;+\; \frac{1}{n(n-1)} \sum_{i,j;\, i \neq j} k(Y_i, Y_j), \tag{3.1}
\]
where $k(\cdot, \cdot)$ is the so-called signature kernel (see [14, Proposition 4.2]). Then, given a fixed confidence level $\alpha \in (0, 1)$, set $c_\alpha := 4\sqrt{-n^{-1}\log \alpha}$. The generative model is said to be realistic with confidence $\alpha$ if $T < c_\alpha$.
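The following sketch, under stated assumptions, implements the unbiased statistic (3.1) with a simple stand-in kernel: a linear kernel on truncated signatures computed with the iisignature package. The actual signature kernel of [14] is more refined; function names and parameters here are illustrative, and the input paths are assumed to be arrays of shape (length, d), e.g. lead-lag transformed price paths.

```python
import numpy as np
import iisignature

def sig_features(paths, depth):
    """Truncated-signature feature map over a list of (length, d) arrays."""
    return np.stack([iisignature.sig(p, depth) for p in paths])

def signature_mmd_test(X_paths, Y_paths, depth=4, alpha=0.01):
    """Unbiased statistic (3.1) with the stand-in kernel
    k(x, y) = <sig(x), sig(y)>, and threshold c_alpha = 4*sqrt(-log(alpha)/n)."""
    X, Y = sig_features(X_paths, depth), sig_features(Y_paths, depth)
    n = len(X)
    Kxx, Kyy, Kxy = X @ X.T, Y @ Y.T, X @ Y.T
    T = ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
         - 2.0 * Kxy.mean()
         + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1)))
    c_alpha = 4.0 * np.sqrt(-np.log(alpha) / n)
    return T, bool(T < c_alpha)  # statistic, and whether the test is passed
```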
Performance evaluation on the level of signatures:

Gen. data ↓ / Real data →  | Weekly paths (b/ii)          | Monthly paths (b/iii)
Returns (a/i,ii,iii)       | Ret (a/i) ⇒ paths ⇒ (b/ii)   | Ret (a/i) ⇒ paths ⇒ (b/iii)
Log-signatures (b/ii)      | Direct                       | (b/ii) ×4 ⇒ direct on (b/iii)
Log-signatures (b/iii)     | —                            | Direct
Comments:
The rows denote the generated data and the columns the original data. In particular, Log-signatures (b/ii) denotes that the output of the generative model was on the level of weekly signatures. Therefore, to compare (b/ii) weekly generated signatures with (b/iii) signatures of monthly paths, one concatenates four weekly signatures (see: product of signatures in Appendix A) and compares on the monthly signature level. Clearly, if we generated data on the level of returns, we can compare it with the returns distribution of the original data directly (see the table below). But for a comparison on the level of signatures, as a first step one builds random paths by sampling from the generated returns for each new increment, and then calculates the corresponding signatures of the thus generated paths. This process is encoded in the notation Ret (a/i) ⇒ paths ⇒ (b/ii).

Performance evaluation on the level of returns / marginal distributions:
Gen. data ↓ / Real data →  | Daily data (a/i)         | Weekly paths (a/ii)       | Monthly paths (a/iii)
Returns (a/i,ii,iii)       | Direct with (a/i)        | Direct with (a/ii)        | Direct with (a/iii)
Inverted Sig. (b/ii)       | (b/ii) ⇒ paths ⇒ (a/i)   | (b/ii) ⇒ paths ⇒ (a/ii)   | (b/ii) ⇒ paths ×4 ⇒ (a/iii)
Inverted Sig. (b/iii)      | (b/iii) ⇒ paths ⇒ (a/i)  | (b/iii) ⇒ paths ⇒ (a/ii)  | (b/iii) ⇒ paths ⇒ (a/iii)

Comments:
If the generated data is in the form of returns (a/i,ii,iii), then comparisons of the empirical marginal distributions are direct. If, however, the data generation was on the level of signatures (b/ii,iii), then one first has to invert the signatures back to paths for a comparison. This is encoded in the entry (b/iii) ⇒ paths ⇒ (a/i). Note that the process of inverting signatures can be slow (the longer the paths, the slower), and efficient algorithms for it are the subject of ongoing research, see for example [18].

4. Numerical Results
4.1. Numerical experiments with historical data of S&P.
To demonstrate the accuracy of the generated log-signatures and of the produced paths, we provide in this section images of 2D projections of the signatures, as well as the resulting generated paths and corresponding returns, for visual demonstration. The rigorous numerical demonstration of the accuracy of the paths via the Maximum Mean Discrepancy inspired signature-moments method can be found in Section 4.1.1 below.
Figure 2.
The (unconditional) VAE:
The image shows projections of generated weekly signatures (b/ii). Since the log-signatures we generate in this procedure are high-dimensional objects, we display here their projections onto various two-dimensional subspaces, indicated on the vertical axis.
Figure 3.
The (unconditional) VAE:
The image shows projections of generated monthly signatures (b/iii) onto various two-dimensional subspaces.
Figure 4.
The image shows generated paths, inverted from the generated log-signatures.
Figure 5.
The conditional VAE:
The image shows projections of weekly signatures generated by the CVAE, projected onto various two-dimensional subspaces. Here we condition on an instantaneous volatility of 5% of the process, measured at the start of the interval.
Figure 6.
The conditional VAE:
The image shows projections of weekly signatures generated by the CVAE, projected onto various two-dimensional subspaces. Here we condition on the current level of the process at the start of the interval; more specifically, we condition on the spot price being S(0) = 2000.

4.1.1. Performance evaluation scores.
We conduct the signature-based MMD two-sample test (3.1) described in Section 3.2.5 for the scenarios described above. In the table below, we include 1.) the confidence level at which the signature-based MMD test changes from the result
"the two samples come from the same distribution" to the result "the two samples come from different distributions". Clearly, if the test can be passed at a higher confidence level, this indicates a higher similarity of the generated samples to the original ones. As a comparison, we also performed a classical two-sample test on the level of marginals, summarised in the table below: we include 2.) the corresponding confidence level for the statistic of the Kolmogorov-Smirnov test applied to the marginal distribution at the final time of the time horizon. We display the results for the weekly and the monthly paths of the unconditional (uc) variational autoencoder, as well as one example for the conditional variational autoencoder, where we condition the samples on the instantaneous volatility being close to the 5% level.

                             | MMD signature confidence level | K-S test p-value
Weekly signature paths (uc)  | 99.…                           | …
At 99.95% conf. level  | MMD signature test | K-S d1 | K-S d2 | K-S d3 | K-S d4 | K-S d5
VAE signatures         | Passes             | Passes | Passes | Passes | Passes | Passes
VAE multidim. distr.   | Fails              | Passes | Passes | Fails  | Fails  | Passes

We observe that the signature-based generation greatly outperforms the multi-dimensional returns-based one in terms of the quality of the generated paths, with the same amount of training for both methods.

4.2.
Numerical experiments with synthetic paths from rough volatility models.
Finally, we demonstrate our experiments also on a fractional stochastic volatility model, the rough Bergomi model introduced in 2015 by Bayer, Friz and Gatheral [4], which is a natural extension of the rough fractional stochastic volatility (RFSV) model in [24] to the setting of pricing. Rough stochastic volatility models have the advantage that they capture well some of the essential stylised facts of financial time series:
\begin{align*}
dX_t &= -\tfrac{1}{2} V_t\, dt + \sqrt{V_t}\, dW_t, \qquad X_0 = \log(S_0),\\
V_t &= \xi_0\, \mathcal{E}\big(2\nu C_H Y_t\big), \qquad \nu, \xi_0 > 0,\\
Y_t &= \int_0^t (t-u)^{H - 1/2}\, dZ_u, \qquad H \in (0, 1/2),\\
\langle Z, W \rangle_t &= \rho t, \qquad \rho \in (-1, 1),
\end{align*}
where $X := \log(S)$, $\mathcal{E}(\cdot)$ denotes the stochastic exponential, and $C_H := \sqrt{\frac{2H\,\Gamma(3/2 - H)}{\Gamma(H + 1/2)\,\Gamma(2 - 2H)}}$. We include this experiment in order to demonstrate that our method already works optimally on the small number of paths available from S&P 500 data. We demonstrate this by numerically generating a simulated training dataset from the rough Bergomi model with model parameters inspired by the calibration results in [35]. In the first experiment we generate the same number of paths in the rough Bergomi model as the number of paths available from the S&P training data (small dataset), and another training dataset with a significantly larger number of simulated paths (large dataset). We then train the Variational Autoencoder on the log-signatures of the simulated paths, first on the small dataset and then on the large dataset, and assess the quality of the generated paths by observing the value of the test statistic of the MMD two-sample test at the same confidence level for the two different outputs.
Figure 7. Left-hand side: projections of signatures generated by the VAE trained on 250 (small dataset) monthly rough Bergomi paths. Right-hand side: projections of signatures generated by the VAE trained on 5000 (large dataset) monthly rough Bergomi paths.
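For reference, the following is a crude simulation sketch of the rough Bergomi dynamics displayed above, using a left-point Riemann-sum discretisation of the Volterra integral Y; the hybrid scheme of Bennedsen, Lunde and Pakkanen would be more accurate, and the parameter values below are illustrative rather than the calibrated values of [35] used in our experiments.

```python
import numpy as np
from scipy.special import gamma

def rough_bergomi_paths(n_paths=250, n_steps=20, T=20/250, H=0.1,
                        nu=1.5, rho=-0.7, xi0=0.04, S0=1.0, seed=0):
    """Crude left-point discretisation of the rough Bergomi dynamics."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = dt * np.arange(1, n_steps + 1)
    dZ = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    dB = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    dW = rho * dZ + np.sqrt(1.0 - rho**2) * dB   # d<Z, W>_t = rho dt
    # Volterra process Y_t = int_0^t (t - u)^{H - 1/2} dZ_u (left-point rule).
    K = np.zeros((n_steps, n_steps))
    for k in range(n_steps):
        for j in range(k + 1):
            K[k, j] = (t[k] - j * dt) ** (H - 0.5)
    Y = dZ @ K.T
    C_H = np.sqrt(2 * H * gamma(1.5 - H) / (gamma(H + 0.5) * gamma(2 - 2 * H)))
    # Wick exponential: subtract half the (discretised) variance of 2*nu*C_H*Y.
    var_Y = dt * (K ** 2).sum(axis=1)
    V = xi0 * np.exp(2 * nu * C_H * Y - 0.5 * (2 * nu * C_H) ** 2 * var_Y)
    log_S = np.log(S0) + np.cumsum(-0.5 * V * dt + np.sqrt(V) * dW, axis=1)
    return np.exp(log_S)
```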
Remark 4.1.
Since, among classical stochastic volatility models, rough volatility models are particularly realistic ones, we also assessed the quality of paths generated by the rough fractional stochastic volatility model, comparing them with the original S&P 500 monthly paths via the signature-based MMD test statistic; the results can be found in the code accompanying this work, published on GitHub.

5.
Conclusions
Our experiments show that the generative model works just as well, but not significantly better, if we have more original data in the training phase to calibrate the network parameters: providing more training data does not significantly improve the learning process of the generative model. This demonstrates that our Variational Autoencoder based training is ideally suited for the scarce data environment at hand, operating efficiently with the little data that we have available for training (fewer than ∼250 samples). While returns-based optimisation works significantly better if it is provided with more training data (tested with numerically generated training data), the signature-based training delivers convincing results already for the small amount of data from the S&P paths, and does not significantly improve if more (numerical) training samples are provided.
Appendix A. Signatures
A.1.
Signatures and their properties.
In this section we recall properties of signatures that are used in this paper. The signature of a path is a transformation of path space. Moreover, certain properties of signatures make them good feature sets for machine learning:
Theorem A.1 (Uniqueness [6]). Under certain assumptions (see [6]), a path is uniquely determined by its signature.
Thus the signature map is a faithful transformation, in the sense that distinct paths have distinct signatures. Therefore, signatures are feature maps that do not lose any information about the original path. Furthermore, for stochastic processes $X : [0, T] \to \mathbb{R}^d$, the expected signature plays a role similar to that of the moments of a random variable on a finite-dimensional vector space: under certain assumptions, it characterises the law of the process.
Figure 8.
Lead-lag transformation of a price path. The figure on the left showsthe lead and lag components of the path, and the figure on the right shows thelag component plotted against the lead component.
Theorem A.2 (Expected signature characterises the law of a process [15]). Let $X : [0, T] \to \mathbb{R}^d$ be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that its signature $\mathbb{X}^{<\infty}_{0,T}$ is a.s. well-defined. Assume that its expected signature $\mathbb{E}[\mathbb{X}^{<\infty}_{0,T}]$ is well-defined. Under certain assumptions (see [15]), $\mathbb{E}[\mathbb{X}^{<\infty}_{0,T}]$ uniquely determines the law of the stochastic process $X$.

Hence, if one is interested in learning to generate samples from a certain stochastic process, one can instead learn to generate signatures of samples of the stochastic process. By the theorem above, this is sufficient to completely characterise the law of the process, as the expected signature characterises the law. On the other hand, by the uniqueness of the signature, knowing the signature of a path is essentially equivalent to knowing the path. This is precisely the approach we follow in this paper: we learn how to generate signatures of paths. However, generating signatures directly is not an easy task, because of the intrinsic structure of signatures. In other words, if a generative model is built to generate signatures directly, the generated object may not be the signature of any path. To avoid this, we learn how to generate log-signatures instead:
Definition A.3 (Log-signature). Let $X : [0, T] \to \mathbb{R}^d$ be a path such that its signature $\mathbb{X}^{<\infty}_{0,T}$ is well-defined. The log-signature is then defined by
\[
\log \mathbb{X}^{<\infty}_{0,T} := \big(\mathbb{X}^{<\infty}_{0,T} - 1\big) - \tfrac{1}{2}\big(\mathbb{X}^{<\infty}_{0,T} - 1\big)^{\otimes 2} + \tfrac{1}{3}\big(\mathbb{X}^{<\infty}_{0,T} - 1\big)^{\otimes 3} - \dots + \tfrac{(-1)^{n+1}}{n}\big(\mathbb{X}^{<\infty}_{0,T} - 1\big)^{\otimes n} + \dots,
\]
which can be shown to be well-defined ([49]).

Taking the logarithm of the signature is an invertible operation: one can exponentiate it to retrieve the signature. Therefore, no information is lost or gained when considering log-signatures. Log-signatures live in a certain free Lie algebra and, in fact, any element of this free Lie algebra is the log-signature of a certain path (see [49] for more details). Hence, in this paper we learn how to generate log-signatures of market paths, so that we can guarantee that the outputs of the ML generative model are indeed log-signatures.

A.2. Lead-lag transformation.
In [22], the authors introduce a transformation of path space called the lead-lag or Hoff transformation. The authors showed that this transformation is able to capture the volatility of a path and, due to the importance of volatility in finance, we opted to use this transformation. Let $D = \{t_i\}_{i=0}^n \subset [0, T]$, and let $\{X_{t_i}\}_{t_i \in D} \subset \mathbb{R}^d$ be a $d$-dimensional sample. The lead-lag transformation of $\{X_{t_i}\}_{t_i \in D}$ is defined below.

Definition A.4 (Lead-lag transformation). The lead-lag transformation of $\{X_{t_i}\}_{t_i \in D}$ is defined as the $2d$-dimensional continuous path $X^D = (X^{D,b}, X^{D,f}) : [0, T] \to \mathbb{R}^{2d}$ given by
\[
X^D_t := \begin{cases}
\big(X_{t_k},\, X_{t_{k+1}}\big), & t \in \big[\tfrac{2k}{2n}T,\, \tfrac{2k+1}{2n}T\big),\\[2pt]
\Big(X_{t_k},\, X_{t_{k+1}} + 2\big(\tfrac{2nt}{T} - (2k+1)\big)\big(X_{t_{k+2}} - X_{t_{k+1}}\big)\Big), & t \in \big[\tfrac{2k+1}{2n}T,\, \tfrac{2k+3/2}{2n}T\big),\\[2pt]
\Big(X_{t_k} + 2\big(\tfrac{2nt}{T} - (2k+\tfrac{3}{2})\big)\big(X_{t_{k+1}} - X_{t_k}\big),\, X_{t_{k+2}}\Big), & t \in \big[\tfrac{2k+3/2}{2n}T,\, \tfrac{2k+2}{2n}T\big).
\end{cases}
\]
The component $X^{D,b}$ is the backward or lag component, and $X^{D,f}$ is the forward or lead component. The signature of the lead-lag transformation will be denoted by $\mathbb{X}^{D,<\infty}_{0,T}$. Figure 8 shows the lead-lag transformation of a certain path. As the name suggests, the lead component leads the lag component. As shown in [22], the relationship between the lead and lag components is able to capture the volatility of the path. For instance, if the sample $\{X_{t_i}\}_{t_i \in D} \subset \mathbb{R}^d$ comes from a $d$-dimensional continuous semimartingale, the authors in [22] showed that, as the mesh size tends to 0, $\mathbb{X}^{D,<\infty}_{0,T}$ converges to a certain rough path that incorporates information about the quadratic variation – i.e. the volatility – of the semimartingale.
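For discrete data, the lead-lag path is piecewise linear, so its signature is determined by the corner points alone. A minimal sketch for a one-dimensional sample (our own illustration, not the repository's code):

```python
import numpy as np

def lead_lag(x):
    """Corner points of the lead-lag (Hoff) transform of a scalar sample
    x_0, ..., x_n, returned as a (2n + 1) x 2 array (lag, lead). Since the
    lead-lag path is piecewise linear, these corners determine its signature."""
    lag, lead = [], []
    for k in range(len(x) - 1):
        lag += [x[k], x[k]]          # lag component waits ...
        lead += [x[k], x[k + 1]]     # ... while the lead component moves first
    lag.append(x[-1]); lead.append(x[-1])
    return np.stack([lag, lead], axis=1)
```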
Appendix B. Variational Autoencoders

In this section we briefly recall the basics of Variational Autoencoders (henceforth VAEs), our choice of generative model. More details and background information on VAEs can be found in [26, 40, 42]. VAEs were introduced in the pioneering 2014 article of Kingma and Welling [40]. A particularly appealing feature of that work is, as the authors emphasise, that their results can by construction be applied to non-stationary settings, such as time-series data [40, Section 2]. In Section 3.2.3 above, we also laid out further reasons for our choice of VAEs as the generative model for our Market Generator. The basic mechanism of Variational Autoencoders essentially relies on a maximum likelihood idea (see [26]), adjusting the generative process via backpropagation so as to maximise (a lower bound of) the probability of observing the given training samples:
\[
P(X) = \int P(X \mid z; \theta)\, P(z)\, dz. \tag{B.1}
\]
It is common practice to choose the latent distribution as a $d$-dimensional Gaussian:
\[
P(X \mid z; \theta) = \mathcal{N}\big(X \mid f(z; \theta),\, \sigma^2 \cdot I\big), \quad \text{where } \sigma \text{ is to be set as a hyperparameter}, \qquad P(z) = \mathcal{N}(z \mid 0, I). \tag{B.2}
\]
The basic functioning of a VAE is the following: given one random variable $z$ with one distribution, we can create another random variable $X = g(z)$ with a very different distribution; the deterministic function $g$ is then learned from the data through the function approximation capacity of the neural network. If, using the expressions (B.2), we can find a (continuously differentiable) expression for $P(X)$ in (B.1), then we can optimise the model using stochastic gradient descent to update the network parameters $\theta$. By gradient descent, we gradually make the training data more probable under the generative model. Equation (B.1) implies two tasks for VAEs to solve: (i) how to define the latent variable space $Z$ (what information the latent variables represent and what the structure between its dimensions is), making sure that the latent variables capture the relevant information in the generative process, and (ii) how to take the integral over $z$ in (B.1). Regarding problem (i), VAEs avoid explicitly describing the dependencies between the dimensions of $z$ and assume no latent structure; instead, they rely on the inherent property of neural networks as function approximators. Finally, it is worth mentioning that Variational Autoencoders are called "autoencoders" because the training objective derived from (B.1) has an encoder-decoder structure that resembles a traditional autoencoder, the non-linear generalisation of PCA. For more details and background on Variational Autoencoders see the original work [40], and for a gentle introduction see [42].
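To make these encoder-decoder mechanics concrete, here is a hypothetical training loop for the SigVAE sketched after (Step 3) in Section 3.2.3, performing stochastic gradient descent on the negative ELBO; the data tensor is a stand-in for rescaled log-signatures, and all dimensions and hyper-parameters are illustrative.

```python
import torch

model = SigVAE(sig_dim=32, latent_dim=8)   # SigVAE, vae_loss from the sketch above
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.rand(250, 32)                 # stand-in for 250 rescaled log-signatures

for epoch in range(2000):
    recon, mu, logvar = model(data)
    loss = vae_loss(recon, data, mu, logvar)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# Generation: decode draws from the latent prior N(0, I).
with torch.no_grad():
    synthetic_logsigs = model.decoder(torch.randn(100, 8))
```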
References

[1] A. Antonov, J. F. Baldeaux, and R. Sesodia. Quantifying model performance. Preprint, SSRN:3299615, 2018.
[2] J. Armstrong, C. Bellani, D. Brigo, and T. Cass. Option pricing models without probability. Preprint, arXiv:1808.09378, 2018.
[3] D. Bartl, S. Drapeau, J. Obłój, and J. Wiesel. Data driven robustness and uncertainty sensitivity analysis. In preparation, 2020.
[4] C. Bayer, P. Friz, and J. Gatheral. Pricing under rough volatility. Quantitative Finance, 16(6), pp. 887-904, 2016.
[5] C. Bayer, B. Horvath, A. Muguruza, B. Stemper, and M. Tomas. On deep pricing and calibration of (rough) stochastic volatility models. Preprint, arXiv:1908.08806, 2019.
[6] H. Boedihardjo, X. Geng, T. Lyons, and D. Yang. The signature of a rough path: uniqueness. Advances in Mathematics, 293, pp. 720-737, 2016.
[7] H. Boedihardjo and X. Geng. The uniqueness of signature problem in the non-Markov setting. Stochastic Processes and their Applications, 125(12), pp. 4674-4701, 2015.
[8] P. Bonnier, P. Kidger, I. Perez Arribas, C. Salvi, and T. Lyons. Deep signature transforms. Preprint, arXiv:1905.08494, 2019.
[9] D. Brigo. Probability-free models in option pricing: statistically indistinguishable dynamics and historical vs. implied volatility. Options: 45 Years after the Publication of the Black-Scholes-Merton Model, conference paper, 2019.
[10] H. Bühler, L. Gonon, J. Teichmann, and B. Wood. Deep hedging. Quantitative Finance, 19(8), pp. 1271-1291, 2019.
[11] O. Bousquet, S. Gelly, I. Tolstikhin, C.-J. Simon-Gabriel, and B. Schölkopf. From optimal transport to generative modeling: the VEGAN cookbook. CoRR, abs/1705.07642, May 2017.
[12] C. Cuchiero, W. Khosrawi, and J. Teichmann. A generative adversarial network approach to calibration of local stochastic volatility models. Preprint, arXiv:2005.02505, 2020.
[13] I. Chevyrev and A. Kormilitzin. A primer on the signature method in machine learning. Preprint, arXiv:1603.03788, 2016.
[14] I. Chevyrev and H. Oberhauser. Signature moments to characterize laws of stochastic processes. Preprint, arXiv:1810.10971, 2018.
[15] I. Chevyrev and T. Lyons. Characteristic functions of measures on geometric rough paths. The Annals of Probability, 44(6), pp. 4049-4082, 2016.
[16] R. Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1, pp. 223-236, 2001.
[17] R. Cont and S. B. Hamida. Recovering volatility from option prices by evolutionary optimization. Finance Concepts Working Papers, (1), 2004.
[18] J. Chang. Effective algorithms for inverting the signature of a path. Doctoral dissertation, University of Oxford, 2018.
[19] C. Doersch, A. Gupta, M. Hebert, and J. Walker. An uncertain future: forecasting from static images using variational autoencoders. 2012.
[20] M. Duembgen and L. C. G. Rogers. Estimate nothing. Quantitative Finance, (12), pp. 2065-2072, 2014.
[21] C. Esteban, S. L. Hyland, and G. Rätsch. Real-valued (medical) time series generation with recurrent conditional GANs. Preprint, arXiv:1706.02633, 2018.
[22] G. Flint, B. Hambly, and T. Lyons. Discretely sampled signals and the rough Hoff process. Stochastic Processes and their Applications, 126(9), pp. 2593-2614, 2016.
[23] P. Friz and M. Hairer. A Course on Rough Paths: With an Introduction to Regularity Structures. Springer International Publishing, Cham, 2014.
[24] J. Gatheral, T. Jaisson, and M. Rosenbaum. Volatility is rough. Quantitative Finance, 18(6), pp. 933-949, 2018.
[25] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel two-sample test. Journal of Machine Learning Research, MIT Press, 2016.
[26] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[27] F. Gressmann, F. J. Király, B. Mateen, and H. Oberhauser. Probabilistic supervised learning. May 2018.
[28] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13, pp. 723-773, March 2012.
[29] L. G. Gyurkó, T. Lyons, M. Kontkowski, and J. Field. Extracting information from the signature of a financial data stream. Preprint, arXiv:1307.7244, 2014.
[30] B. Hambly and T. Lyons. Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, 171(1), pp. 109-167, 2010.
[31] Journal of Applied Econometrics, (7), 2005.
[32] P. Henry-Labordère. Generative models for financial data. Preprint, SSRN:3408007, 2019.
[33] P. Henry-Labordère. (Martingale) optimal transport and anomaly detection with neural networks: a primal-dual algorithm. Preprint, arXiv:1904.04546, 2019.
[34] G. Hinton. Introduction to Neural Networks and Machine Learning. Lecture notes (CSC321), Lecture 15: Mixture of experts.
[35] B. Horvath, A. Muguruza, and M. Tomas. Deep learning volatility. Preprint, arXiv:1901.09647, 2019.
[36] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), pp. 359-366, 1989.
[37] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), pp. 251-257, 1991.
[38] A. Jentzen, B. Kuckuck, A. Neufeld, and P. von Wurstemberger. Strong error analysis for stochastic gradient descent optimization algorithms. Preprint, arXiv:1801.09324, 2018.
[39] A. Koshiyama, N. Firoozye, and P. Treleaven. Generative adversarial networks for financial trading strategies fine-tuning and combination. Preprint, arXiv:1901.01751, 2019.
[40] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. Preprint, arXiv:1312.6114, 2014.
[41] P. Kidger and T. Lyons. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU. Preprint, arXiv:2001.00706, 2020.
[42] D. P. Kingma and M. Welling. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 2019.
[43] A. S. Koshiyama, N. Firoozye, and P. C. Treleaven. Generative adversarial networks for financial trading strategies fine-tuning and combination. Preprint, arXiv:1901.01751, 2019.
[44] A. Kondratyev and C. Schwarz. The market generator. Preprint, SSRN:3384948, 2019.
[45] A. Kondratyev, C. Schwarz, and B. Horvath. Data anonymisation, outlier detection and fighting overfitting with Restricted Boltzmann Machines. Preprint, SSRN:3526436, 2020.
[46] T. Lyons. Differential equations driven by rough signals. Revista Matemática Iberoamericana, 14(2), pp. 215-310, 1998.
[47] T. Lyons. Rough paths, signatures and the modelling of functions on streams. Preprint, arXiv:1405.4537, 2014.
[48] D. Levin, T. Lyons, and H. Ni. Learning from the past, predicting the statistics for the future, learning an evolving system. Preprint, arXiv:1309.0260, 2016.
[49] T. Lyons, M. Caruana, and T. Lévy. Differential Equations Driven by Rough Paths. Springer, Berlin Heidelberg, 2007.
[50] R. Leonarduzzi, G. Rochette, J.-P. Bouchaud, and S. Mallat. Maximum entropy scattering models for financial time series. Conference paper, ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2019.8683734, 2019.
[51] R. Luo, W. Zhang, X. Xu, and J. Wang. A neural stochastic volatility model. Preprint, arXiv:1712.00504, 2018.
[52] H. Ni, S. Liao, L. Szpruch, M. Wiese, and B. Xiao. Conditional Sig-Wasserstein GANs for time series generation. Preprint, 2020.
[53] H. Ni, S. Liao, T. Lyons, and W. Yang. Learning stochastic differential equations using RNN with log signature features. Preprint, arXiv:1908.08286, 2019.
[54] H. Oberhauser and F. Király. Kernels for sequentially ordered data. Journal of Machine Learning Research, (31), pp. 1-45, 2019.
[55] I. Perez Arribas. Derivatives pricing using signature payoffs. Preprint, arXiv:1809.09466, 2018.
[56] J. Reizenstein and B. Graham. The iisignature library: efficient calculation of iterated-integral signatures and log signatures. Preprint, arXiv:1802.08252, 2018.
[57] J. Ruf and W. Wang. Neural networks for option pricing and hedging: a literature review. Preprint, arXiv:1911.05620, 2019.
[58] R. M. Rustamov. Closed-form expressions for maximum mean discrepancy with applications to Wasserstein auto-encoders. Preprint, arXiv:1901.03227, 2019.
[59] D. J. Sutherland, H. Y. Tung, H. Strathmann, S. De, A. Ramdas, A. Smola, and A. Gretton. Generative models and model criticism via optimized maximum mean discrepancy. ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
[60] S. Takahashi, Y. Chen, and K. Tanaka-Ishii. Modeling financial time-series with generative adversarial networks. Physica A: Statistical Mechanics and its Applications, 527, p. 121261, Aug. 2019.
[61] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Schölkopf. Wasserstein auto-encoders. In International Conference on Learning Representations, 2018.
[62] T. Trimborn, P. Otte, S. Cramer, M. Beikirch, E. Pabich, and M. Frank. SABCEMM: a simulator for agent-based computational economic market models. Preprint, arXiv:1801.01811, 2018.
[63] M. S. Vidales, D. Siska, and L. Szpruch. Unbiased deep solvers for parametric PDEs. Preprint, arXiv:1810.05094, 2018.
[64] M. Wiese, L. Bai, B. Wood, and H. Bühler. Deep hedging: learning to simulate equity options markets. Preprint, arXiv:1911.01700, 2019.
[65] M. Wiese, R. Knobloch, R. Korn, and P. Kretschmer. Quant GANs: deep generation of financial time series. Preprint, arXiv:1907.06673, 2019.
[66] T. Xu, L. K. Wenliang, M. Munn, and B. Acciaio. COT-GAN: generating sequential data via causal optimal transport. Preprint, arXiv:2006.08571, 2020.
[67] K. Zhang, G. Zhong, J. Dong, S. Wang, and Y. Wang. Stock market prediction based on generative adversarial network. Procedia Computer Science, pp. 400-406, 2019.
J.P. Morgan, London
E-mail address: [email protected]

Department of Mathematics, King's College London and The Alan Turing Institute
E-mail address: [email protected] and [email protected]

Mathematical Institute, University of Oxford and The Alan Turing Institute
E-mail address: [email protected]

Mathematical Institute, University of Oxford and The Alan Turing Institute
E-mail address: [email protected]

J.P. Morgan, London
E-mail address: [email protected]