Matthew Francis Dixon
Illinois Institute of Technology
Publication
Featured research published by Matthew Francis Dixon.
High Performance Computational Finance | 2009
Matthew Francis Dixon; Jike Chong; Kurt Keutzer
The proliferation of algorithmic trading, derivative usage and highly leveraged hedge funds necessitates the acceleration of market Value-at-Risk (VaR) estimation to measure the severity of portfolio losses. This paper demonstrates how relying solely on advances in computer hardware to accelerate market VaR estimation overlooks significant opportunities for acceleration. We use a simulation-based delta-gamma VaR estimate and compute the loss function using basic linear algebra subroutines (BLAS). Our NVIDIA GeForce GTX280 graphics processing unit (GPU) baseline implementation is a straightforward port of the CPU implementation and has only an 8.21x speed advantage over a quad-core Intel Core2 Q9300 central processing unit (CPU) implementation. We demonstrate three approaches to gain additional speedup over the baseline GPU implementation. Firstly, we reformulate the loss function to reduce the amount of necessary computation, achieving a 60.3x speedup. Secondly, we select functionally equivalent distribution conversion modules with the best convergence rate, providing an additional 2x speedup. Thirdly, we merge data-parallel computational kernels to remove redundant load-store operations, leading to an additional 1.85x speedup. Overall, we achieve a speedup of 148x against the baseline GPU implementation, reducing the time of a VaR estimation with a standard error of 0.1% from minutes to less than one second.
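A minimal NumPy sketch of a simulation-based delta-gamma VaR estimate in the spirit of the abstract: the portfolio loss is expressed with dense linear algebra (which NumPy dispatches to BLAS) and the VaR is read off as an empirical loss quantile. The sensitivities, covariance and confidence level below are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of a simulation-based delta-gamma VaR estimate.
# delta, Gamma and the risk-factor covariance are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_factors, n_scenarios = 4, 100_000
delta = np.array([1.5, -0.8, 2.1, 0.3])       # first-order sensitivities
Gamma = np.diag([0.9, 1.2, 0.4, 0.7])         # second-order sensitivities
cov = 0.0001 * np.eye(n_factors)              # daily risk-factor covariance

# 1. Simulate correlated risk-factor moves via a Cholesky factor of the covariance.
L = np.linalg.cholesky(cov)
dS = rng.standard_normal((n_scenarios, n_factors)) @ L.T

# 2. Delta-gamma portfolio loss per scenario, written as matrix products
#    (these map onto BLAS routines under the hood).
quad_term = np.einsum("si,ij,sj->s", dS, Gamma, dS)
loss = -(dS @ delta + 0.5 * quad_term)

# 3. VaR at the 99% confidence level is the empirical loss quantile.
var_99 = np.quantile(loss, 0.99)
print(f"99% delta-gamma VaR: {var_99:.4f}")
```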
High Performance Computational Finance | 2015
Matthew Francis Dixon; Diego Klabjan; Jin Hoon Bang
Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition communities (Krizhevsky et al., 2012) for their superior predictive properties, including robustness to overfitting. However, their application to financial market prediction has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. A critical step in the viability of the approach in practice is the ability to effectively deploy the algorithm on general-purpose high performance computing infrastructure. Using an Intel Xeon Phi co-processor with 61 cores, we describe the process for efficiently implementing the batched stochastic gradient descent algorithm and demonstrate an 11.4x speedup on the Intel Xeon Phi over a serial implementation on the Intel Xeon.
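To make the batched (mini-batch) stochastic gradient descent step concrete, here is a small NumPy sketch for a one-hidden-layer direction classifier. It illustrates only the algorithmic structure being parallelized; the layer sizes, learning rate and synthetic data are assumptions, and the paper's Intel Xeon Phi implementation is not reproduced here.

```python
# Sketch of mini-batch ("batched") SGD for a small feed-forward classifier
# predicting market direction (up/down). Sizes and data are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features, n_hidden = 10_000, 40, 64
X = rng.standard_normal((n_samples, n_features))
y = (X[:, 0] + 0.1 * rng.standard_normal(n_samples) > 0).astype(float)[:, None]

W1 = 0.1 * rng.standard_normal((n_features, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1));          b2 = np.zeros(1)
lr, batch = 0.05, 256

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5):
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch):
        idx = order[start:start + batch]
        Xb, yb = X[idx], y[idx]
        # forward pass
        h = np.tanh(Xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass: binary cross-entropy gradients averaged over the batch
        dz2 = (p - yb) / len(idx)
        dW2, db2 = h.T @ dz2, dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
        dW1, db1 = Xb.T @ dz1, dz1.sum(0)
        # SGD parameter update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
```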
High Performance Computational Finance | 2013
Matthew Francis Dixon; Mohammed Zubair
Low-latency real-time option analytics feeds provide tick-by-tick implied volatilities and greeks based on exchange data. In order for the Black-Scholes implied volatility surface to exhibit the empirically observed skew or smile, a stochastic volatility model can be used to compute the option greeks. Because the European price under many stochastic volatility models only exists in semi-analytic form, frequent robust calibration of the model is computationally prohibitive. This paper explores three parallelization approaches for calibrating stochastic volatility models deployed on a multicore CPU cluster. The contribution of this paper is to provide benchmarks demonstrating hybrid shared- and distributed-memory parallelization techniques using Python packages for robust calibration of stochastic volatility models. The focus here is on the Heston and Bates models, but the results in this paper generalize to any exponential Lévy model incorporating stochastic volatility and jumps whose characteristic function can be expressed in closed form. We evaluated the performance of our implementation on a cluster of 32 dual-socket Dell PowerEdge R410 nodes providing 256 cores in total. Using distributed-memory parallelization, we obtain a speedup of up to 139x against the sequential version of the calibration error function evaluation and reduce the overall time taken to calibrate a chain of 1024 SPX options by a factor of 37x.
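A schematic Python sketch of how a calibration error function can be split across workers, under stated assumptions: model_price is a placeholder standing in for a semi-analytic Heston/Bates price (obtained in practice by Fourier inversion of the characteristic function), and a local process pool stands in for the paper's cluster-level distributed-memory parallelization.

```python
# Sketch: parallel evaluation of a stochastic-volatility calibration error
# function over an option chain. model_price is a placeholder, not the
# paper's pricing code; a local Pool stands in for a multicore cluster.
from multiprocessing import Pool
import numpy as np

def model_price(strike, maturity, params):
    # Placeholder: in practice this evaluates the semi-analytic model price.
    kappa, theta, sigma, rho, v0 = params
    return max(100.0 - strike, 0.0) + theta * np.sqrt(maturity)

def chunk_error(args):
    quotes, params = args
    # Sum of squared market-vs-model pricing errors for one chunk of quotes.
    return sum((mkt - model_price(K, T, params)) ** 2 for K, T, mkt in quotes)

def calibration_error(option_chain, params, n_workers=8):
    chunks = np.array_split(option_chain, n_workers)
    with Pool(n_workers) as pool:
        partial = pool.map(chunk_error, [(c, params) for c in chunks])
    return sum(partial)

if __name__ == "__main__":
    chain = [(K, T, max(100.0 - K, 0.0) + 1.0)
             for K in np.linspace(80, 120, 128) for T in (0.25, 0.5, 1.0)]
    params = (1.5, 0.04, 0.3, -0.7, 0.04)   # kappa, theta, sigma, rho, v0
    print(calibration_error(chain, params))
```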
Journal of Computational Science | 2017
Matthew Francis Dixon
Recurrent neural networks (RNNs) are types of artificial neural networks (ANNs) that are well suited to forecasting and sequence classification. They have been applied extensively to forecasting univariate financial time series; however, their application to high-frequency trading has not been previously considered. This paper solves a sequence classification problem in which a short sequence of observations of limit order book depths and market orders is used to predict the next-event price flip. The capability to adjust quotes according to this prediction reduces the likelihood of adverse price selection. Our results demonstrate the ability of the RNN to capture the non-linear relationship between near-term price flips and a spatio-temporal representation of the limit order book. The RNN compares favorably with other classifiers, including a linear Kalman filter, using S&P 500 E-mini futures level II data over the month of August 2016. Further results assess the effect of retraining the RNN daily and the sensitivity of the performance to trade latency.
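An illustrative Keras sketch of the kind of sequence classifier described: a short window of limit-order-book depth and order-flow features is mapped to a next price-flip class (down, flat, up). The window length, feature count, layer sizes and the choice of Keras are assumptions, not the paper's architecture or toolchain, and the data here is a synthetic stand-in for the spatio-temporal book representation.

```python
# Illustrative RNN sequence classifier: book/order-flow window -> price-flip class.
# Architecture and data are placeholders, not the paper's exact setup.
import numpy as np
import tensorflow as tf

timesteps, n_features, n_classes = 10, 20, 3

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(64),                              # recurrent layer over the event window
    tf.keras.layers.Dense(n_classes, activation="softmax"),  # down / flat / up
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Synthetic stand-in for labelled limit-order-book sequences.
rng = np.random.default_rng(3)
X = rng.standard_normal((1024, timesteps, n_features)).astype("float32")
y = tf.keras.utils.to_categorical(rng.integers(0, n_classes, 1024), n_classes)

model.fit(X, y, batch_size=128, epochs=2, verbose=0)
```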
Social Science Research Network | 2016
Matthew Francis Dixon; Diego Klabjan; Jin Hoon Bang
Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition communities for their superior predictive properties, including robustness to overfitting. However, their application to algorithmic trading has not been previously researched, partly because of their computational complexity. This paper describes the application of DNNs to predicting financial market movement directions. In particular, we describe the configuration and training approach and then demonstrate their application to backtesting a simple trading strategy over 43 different commodity and FX futures mid-prices at 5-minute intervals. All results in this paper are generated using a C implementation on the Intel Xeon Phi co-processor, which is 11.4x faster than the serial version, and a Python strategy backtesting environment, both of which are available as open-source code written by the authors.
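A minimal sketch of the kind of simple strategy backtest implied above: direction forecasts at 5-minute intervals become positions, and P&L accrues from the next mid-price change. The prices and predictions here are synthetic placeholders; the authors' open-source backtesting environment is not reproduced.

```python
# Minimal direction-forecast backtest: long on "up", short on "down", flat otherwise.
# Mid-prices and forecasts are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(4)
n_bars = 5_000
mid = 100.0 + np.cumsum(0.01 * rng.standard_normal(n_bars))  # synthetic 5-minute mid-prices
pred = rng.choice([-1, 0, 1], size=n_bars)                   # stand-in DNN direction forecasts

position = pred[:-1]               # act on each forecast over the next bar
pnl = position * np.diff(mid)      # P&L from the next mid-price change
cum_pnl = np.cumsum(pnl)

print(f"total P&L: {cum_pnl[-1]:.3f}, hit rate: {(pnl > 0).mean():.2%}")
```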
High Performance Computing Symposium | 2014
Matthew Francis Dixon; Sabbir Khan; Mohammed Zubair
Broadly, a major prerequisite for analytics applications is robustness to modeling idiosyncrasies. As a result, there is a demand for comprehensive model exploration and validation in high-level statistical programming environments such as R. Many financial applications require on-demand processing, which in turn requires fast modeling and calibration computations. In this paper we describe our work on speeding up the calibration of a Heston stochastic volatility model, a financial application, on GPUs. The Heston volatility model is used extensively across the capital markets to price and measure the market risk of exchange-traded financial options. However, a typical R-based implementation of the Heston model calibration on a CPU does not meet the performance requirements for sub-minute-level trading, i.e., mid- to high-frequency trading. The calibration of a Heston model is performed over M option data points, which remain fixed during the calibration computation. The most computationally intensive part of this computation is the ErrorFunction(), which estimates the error between market-observed and model option prices. We have implemented the ErrorFunction() using a MapReduce design pattern, leading to efficient implementations on various architectures, including GPUs. In this paper, we describe the implementation of a GPU-optimized kernel for this computation that can be called by the R script performing the calibration process. For M = 1024, we demonstrate a 760x improvement in the overall calibration time over the sequential R implementation by off-loading ErrorFunction() on a system with an Intel Core i5 processor and an NVIDIA Tesla K20c (Kepler architecture) GPU consisting of 2496 cores. Note that not all of the performance gain is due to the GPU; part of it is due to the reduction in the overhead of R for the Heston model calculation. For comparison, we also implemented the calibration code in C/C++. We observed a speedup of 230x for the GPU-based implementation over the C/C++ version, indicating that a factor of 3.4x of the improvement is due to avoiding the R overhead for the Heston model calculation. However, the overall calibration time using R-based optimization routines combined with the GPU off-loaded ErrorFunction() is comparable to that of a C/C++ GPU-based calibration code.
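A short Python sketch of the MapReduce decomposition of ErrorFunction(): "map" a model-price evaluation over the M fixed option quotes, then "reduce" with a sum of squared pricing errors. Here heston_price is a placeholder; in the paper the map step is off-loaded to a CUDA kernel invoked from R rather than written in Python.

```python
# MapReduce structure of ErrorFunction(): map model pricing over M fixed quotes,
# reduce to a sum of squared errors. heston_price is a placeholder model price.
from functools import reduce

def heston_price(quote, params):
    strike, maturity, _market = quote
    # Placeholder; the real kernel evaluates the semi-analytic Heston price per quote.
    return max(100.0 - strike, 0.0) + params["theta"] * maturity ** 0.5

def error_function(quotes, params):
    squared_errors = map(
        lambda q: (q[2] - heston_price(q, params)) ** 2, quotes)   # map step
    return reduce(lambda a, b: a + b, squared_errors, 0.0)         # reduce step

quotes = [(k, 0.5, max(100.0 - k, 0.0) + 1.0) for k in range(80, 121)]
print(error_function(quotes, {"theta": 0.04}))
```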
GPU Computing Gems Jade Edition | 2012
Matthew Francis Dixon; Thomas Bradley; Jike Chong; Kurt Keutzer
A typical implementation of the Monte Carlo method involves a simple solution structure in which experiments are generated, executed, and the experimental output is assimilated to estimate the result. Monte Carlo Value-at-Risk (MC-VaR) is ideal for graphics processing units (GPUs) as it requires a small set of parameters to set up the estimation process, involves a large amount of computation, and outputs a concise set of risk profiles as the result. The solution is based on a set of considerations for optimizing the performance-critical steps in our GPU-based MC-VaR implementation: uniform random sequence generation, parameter distribution conversion, and portfolio loss calculation (including risk factor cross-correlation). The Monte Carlo method is an approach in which the solution to a problem is estimated by statistically sampling the problem's parameter space with thousands to millions of experiments using different parameter settings. The Monte Carlo method has several properties that make it desirable for implementation on a high performance GPU accelerator: experiments are independent and parallelizable; execution is computationally expensive; and input specifications and results are concise. VaR estimation uses the Monte Carlo method to simulate the effects of various market movement scenarios on the value of the portfolio. A comparatively small set of parameters is required to set up the estimation process, whereas experiment execution requires a large amount of computation. A critical feature of the Monte Carlo method is the type of random number generator used to generate the random experiments in the first step of the solution structure.
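A NumPy sketch walking through the three performance-critical steps named above: uniform random sequence generation, distribution conversion, and portfolio loss calculation with risk-factor cross-correlation. The portfolio weights and covariance are illustrative placeholders, and an ordinary pseudo-random generator stands in for whatever generator a production implementation would use.

```python
# MC-VaR pipeline sketch: uniforms -> correlated normals -> portfolio losses -> VaR.
# Weights and covariance are illustrative placeholders.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n_experiments, n_factors = 200_000, 8

# 1. Uniform random sequence generation (a quasi-random generator could be
#    substituted here).
u = rng.random((n_experiments, n_factors))

# 2. Distribution conversion: uniforms -> standard normals via the inverse CDF,
#    then impose the risk-factor cross-correlation with a Cholesky factor.
z = norm.ppf(u)
cov = 0.0004 * (0.3 * np.ones((n_factors, n_factors)) + 0.7 * np.eye(n_factors))
dS = z @ np.linalg.cholesky(cov).T

# 3. Portfolio loss calculation and assimilation into a VaR estimate.
weights = np.full(n_factors, 1.0 / n_factors)
loss = -(dS @ weights)
print(f"99% MC-VaR estimate: {np.quantile(loss, 0.99):.5f}")
```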
Concurrency and Computation: Practice and Experience | 2012
Matthew Francis Dixon; Jike Chong; Kurt Keutzer
Values of portfolios in modern financial markets may change precipitously with changing market conditions. The utility of financial risk management tools depends on whether they can estimate the Value-at-Risk (VaR) of portfolios on demand when key decisions need to be made. However, VaR estimation of portfolios uses the Monte Carlo method, which is computationally intensive and often run as an overnight batch job. With the proliferation of highly parallel computing platforms such as multicore CPUs and manycore graphics processing units (GPUs), teraFLOPS of computational capability are now available on a desktop computer, enabling the VaR of large portfolios with thousands of risk factors to be computed in only a fraction of a second.
The Journal of Supercomputing | 2004
Matthew Francis Dixon; C. J. Kenneth Tan
A new approach to solving the convection-diffusion equation in D > 3 spatial dimensions on clusters of workstations is derived by exploiting the stability and scalability of the combination of a generalized D-dimensional high-order compact (HOC) implicit finite difference scheme and parallelized GMRES(m). We then consider its application to multifactor option pricing using the Black–Scholes equation, show that an isotropic fourth-order compact difference scheme is numerically stable, and determine conditions under which its coefficient matrix is positive definite. The performance of GMRES(m) on distributed computers is limited by the inter-processor communication required by the matrix-vector multiplication. It is shown that the compact scheme requires approximately half the number of communications of a non-compact difference scheme with the same order of truncation error. As the dimensionality is increased, the ratio of computation that can be overlapped with communication also increases. CPU times and parallel efficiency graphs for a single time-step approximation of up to a 7D HOC scheme on 16 processors confirm the numerical stability constraint and demonstrate improved parallel scalability over non-compact difference schemes.
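To make the "implicit finite difference step solved with restarted GMRES(m)" structure concrete, here is a deliberately simplified one-dimensional, second-order (non-compact) illustration using SciPy. It is not the paper's D-dimensional high-order compact scheme or its parallelized GMRES; the grid size, coefficients and restart parameter are assumptions.

```python
# Simplified 1D convection-diffusion implicit step solved with restarted GMRES(m).
# Standard second-order central differences, not the paper's HOC scheme.
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import gmres

n, dt, a, nu = 200, 1e-3, 1.0, 0.5        # grid points, time step, convection, diffusion
dx = 1.0 / (n + 1)

# Central first- and second-difference operators on the interior grid.
D1 = diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(n, n)) / (2.0 * dx)
D2 = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / dx**2

# Backward Euler step: (I - dt*(nu*D2 - a*D1)) u_new = u_old.
A = identity(n) - dt * (nu * D2 - a * D1)

x = np.linspace(dx, 1.0 - dx, n)
u_old = np.exp(-100.0 * (x - 0.5) ** 2)   # initial profile

u_new, info = gmres(A.tocsr(), u_old, restart=20)   # GMRES(m) with m = 20
print(f"GMRES info={info}, max|u_new|={np.abs(u_new).max():.4f}")
```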
Concurrency and Computation: Practice and Experience | 2016
Matthew Francis Dixon; Jörg Lotze; Mohammad Zubair
Financial markets change precipitously, and on-demand pricing and risk models must be constantly recalibrated to reduce risk. However, certain classes of models are computationally intensive to robustly calibrate to intraday prices, stochastic volatility models being an archetypal example due to the non-convexity of the objective function. In order to accelerate this procedure through parallel implementation, financial application developers are faced with an ever-growing plethora of low-level high-performance computing frameworks, such as Open Multi-Processing (OpenMP), Open Computing Language (OpenCL), Compute Unified Device Architecture (CUDA), or single-instruction multiple-data (SIMD) intrinsics, and are forced to trade off performance against the portability, flexibility, and modularity of the code required to facilitate rapid in-house model development and productionisation. This paper describes the acceleration of stochastic volatility model calibration on multi-core CPUs and graphics processing units (GPUs) using the Xcelerit platform. By adopting a simple programming model, the Xcelerit platform enables the application developer to write sequential, high-level C++ code, without concern for low-level high-performance computing frameworks. This platform provides the portability, flexibility, and modularity required by application developers. Speedups of up to 30x and 293x are achieved on an Intel Xeon CPU and an NVIDIA Tesla K40 GPU, respectively, compared with a sequential CPU implementation. The Xcelerit platform implementation is further shown to be equivalent in performance to a low-level CUDA version. Overall, we are able to reduce the entire calibration time of the sequential implementation from 6189 s to 183.8 s and 17.8 s on the CPU and GPU, respectively, without requiring the developer to reimplement in low-level high-performance computing frameworks.