Xiang Tian | Researchain

Publication

Featured research published by Xiang Tian.

ACM Transactions on Reconfigurable Technology and Systems | 2010

High-Performance Quasi-Monte Carlo Financial Simulation: FPGA vs. GPP vs. GPU

Xiang Tian; Khaled Benkrid

Quasi-Monte Carlo simulation is a special Monte Carlo simulation method that uses quasi-random or low-discrepancy numbers as random sample sets. In many applications, this method has proved advantageous compared to the traditional Monte Carlo simulation method, which uses pseudo-random numbers, thanks to its faster convergence and higher level of accuracy. This article presents the design and implementation of a massively parallelized Quasi-Monte Carlo simulation engine on an FPGA-based supercomputer, called Maxwell. It also compares this implementation with equivalent graphics processing unit (GPU)-based and general-purpose processor (GPP)-based implementations. The detailed comparison between these three implementations (FPGA vs. GPP vs. GPU) is done in the context of financial derivatives pricing based on our Quasi-Monte Carlo simulation engine. Real hardware implementations on the Maxwell machine show that FPGAs outperform equivalent GPP-based software implementations by 2 orders of magnitude, with the speed-up figure scaling linearly with the number of processing nodes used (FPGAs/GPPs). The same implementations show that FPGAs achieve a ~3x speed-up compared to equivalent GPU-based implementations. Power consumption measurements also show FPGAs to be 336x more energy efficient than CPUs, and 16x more energy efficient than GPUs.
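
As a rough illustration of the quasi-random sampling idea in this abstract, the sketch below prices a European call by feeding a base-2 van der Corput low-discrepancy sequence through the inverse normal CDF instead of drawing pseudo-random numbers. The choice of sequence, the function names and the option parameters are assumptions made for illustration only; the abstract does not say which low-discrepancy generator the Maxwell engine uses.

```python
import math
from statistics import NormalDist

def van_der_corput(index, base=2):
    """Return the index-th point of the van der Corput sequence in (0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def qmc_european_call(s0, strike, rate, sigma, maturity, n_points):
    """Quasi-Monte Carlo estimate of a European call under geometric Brownian motion."""
    normal = NormalDist()
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for i in range(1, n_points + 1):  # start at 1 so that u is never exactly 0
        u = van_der_corput(i)         # low-discrepancy point in (0, 1)
        z = normal.inv_cdf(u)         # map to a standard normal sample
        terminal = s0 * math.exp(drift + vol * z)
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total / n_points

# Hypothetical at-the-money option: S0 = K = 100, r = 5%, sigma = 20%, T = 1 year.
price = qmc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 4096)
```

Because the points fill the unit interval evenly, the estimate is deterministic for a given number of points, which is one reason quasi-random sequences tend to converge faster than pseudo-random sampling on smooth, low-dimensional problems like this one.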

adaptive hardware and systems | 2009

Mersenne Twister Random Number Generation on FPGA, CPU and GPU

Xiang Tian; Khaled Benkrid

Random number generation is a very important operation in computational science, e.g., in Monte Carlo simulation methods. It is also a computationally intensive operation, especially for high quality random number generation. In this paper, we present the design of a parallel implementation of one of the most widely used random number generators, namely the Mersenne Twister. The latter is very widely used in high performance computing applications such as financial computing. Implementations of our parallel Mersenne Twister number generator core on Xilinx Virtex4 FPGAs achieve a throughput of 26.13 billion random samples per second. The paper also reports equivalent parallel software implementations running on an Intel Core 2 Quad Q9300 CPU with 8 GB RAM, using multi-threading technology and the Intel® Math Kernel Library (MKL), as well as on an NVIDIA 8800 GTX GPU. Comparative results show that our FPGA-based implementation outperforms equivalent CPU and GPU implementations by ~25x and ~9x respectively. Moreover, when using the same amount of energy, the FPGA can generate 37x and 35x more Mersenne Twister random samples than the CPU and the GPU, respectively.
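
For readers unfamiliar with the generator itself, here is a minimal sequential sketch of the standard 32-bit MT19937 recurrence (state initialisation, twist and tempering). It is the textbook software algorithm, not the parallel FPGA core described in the abstract; the class and method names are chosen for illustration.

```python
class MT19937:
    """Plain sequential 32-bit Mersenne Twister (MT19937)."""

    N, M = 624, 397

    def __init__(self, seed=5489):
        # Standard linear-recurrence seeding of the 624-word state.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        # Regenerate the whole state block in place.
        for i in range(self.N):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % self.N] & 0x7FFFFFFF)
            value = self.mt[(i + self.M) % self.N] ^ (y >> 1)
            if y & 1:
                value ^= 0x9908B0DF
            self.mt[i] = value
        self.index = 0

    def next_u32(self):
        """Return the next tempered 32-bit output word."""
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF

first = MT19937(5489).next_u32()
```

The twist step touches every state word independently of the tempering step, which hints at why the algorithm lends itself to wide parallel hardware, although how the authors partition it is not stated here.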

field-programmable technology | 2008

Design and implementation of a high performance financial Monte-Carlo simulation engine on an FPGA supercomputer

Xiang Tian; Khaled Benkrid

Monte-Carlo simulation is a very widely used technique in scientific computing in general, with huge computational benefits in solving problems where closed-form solutions are impossible to derive. This technique is also characterized by a high degree of parallelism, as a large number of different simulation paths need to be calculated, which makes it ideal for a parallel hardware implementation. This paper illustrates the benefits of such an implementation in the context of financial computing, as it implements a financial Monte-Carlo simulation engine on an FPGA-based supercomputer, called Maxwell, developed at the University of Edinburgh. The latter consists of a 32-CPU cluster augmented with 64 Virtex-4 Xilinx FPGAs connected in a 2D torus. Our engine can implement various Monte-Carlo simulations on the Maxwell machine with speed-ups of around three orders of magnitude compared to equivalent software implementations. This is illustrated in this paper in the context of an implementation of the Black-Scholes option pricing model. Real hardware implementation shows that our FPGA-based implementation of the Black-Scholes model outperforms an equivalent software implementation running on a workstation cluster with the same number of computing nodes (CPU/FPGA) by a factor of 750, which makes it the fastest FPGA implementation of this model reported to date.
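
The kind of computation being accelerated can be sketched in a few lines. The example below is a plain software Monte Carlo estimate of a European call under the Black-Scholes model, checked against the closed-form price. The parameters, path count and function names are illustrative assumptions and say nothing about how the FPGA engine is organised.

```python
import math
import random
from statistics import NormalDist

def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes price of a European call."""
    cdf = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity))
    d2 = d1 - sigma * math.sqrt(maturity)
    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)

def monte_carlo_call(s0, strike, rate, sigma, maturity, paths, seed=42):
    """Monte Carlo estimate: average discounted payoff over independent paths."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(paths):
        # Each path needs only its own normal draw, so paths are independent.
        terminal = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total / paths

# Hypothetical option: S0 = K = 100, r = 5%, sigma = 20%, T = 1 year.
closed_form = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=50_000)
```

Since no path depends on any other, the loop body can be replicated across as many processing elements as the hardware allows, which is the parallelism the abstract refers to.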

International Journal of Reconfigurable Computing | 2012

High performance biological pairwise sequence alignment: FPGA versus GPU versus cell BE versus GPP

Khaled Benkrid; Ali Akoglu; Cheng Ling; Yang Song; Ying Liu; Xiang Tian

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
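
For reference, the Smith-Waterman recurrence that all four platforms implement can be sketched as follows. This version computes only the best local alignment score with a linear gap penalty; the scoring values (+3 match, -3 mismatch, -2 gap) and the function name are assumptions for illustration, not the settings used in the study.

```python
def smith_waterman_score(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a fresh local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("TGTTACGG", "GGTTGACTA")
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on the same anti-diagonal can be computed at once, a property that hardware implementations of this algorithm commonly exploit.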

field-programmable technology | 2009

American option pricing on reconfigurable hardware using Least-Squares Monte Carlo method

Xiang Tian; Khaled Benkrid

The valuation of optimal exercise of American-style options is one of the most important problems in option pricing theory. Unlike European options, American options have the feature of early exercise, which makes them hard to price using the simple Monte Carlo method. A number of extended Monte Carlo methods have been published recently; the Least-Squares Monte Carlo (LSMC) method suggested by Longstaff and Schwartz is one of the most widely adopted algorithms in the industry. Although hardware acceleration techniques have been used in financial computing for several years, there has not been any published hardware implementation of the LSMC method. In this paper, we present an FPGA hardware architecture for the acceleration of the LSMC method. In it, the Quasi-Monte Carlo method is adopted for stock price path generation. Our real FPGA hardware implementation on a Xilinx Virtex-4 XC4VSX55 chip achieves 25x and 18x speed-ups in the Monte Carlo simulation and regression steps of the American option pricing, respectively, compared to an equivalent pure software implementation captured in C++ and run on an Intel Xeon 2.8 GHz CPU. This results in an overall speed-up figure of 20x compared to a CPU-based implementation. Given that the FPGA implementation is clocked at only 75 MHz, the FPGA implementation also exhibits considerable energy savings.
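
The two steps named in this abstract, path simulation and regression, can be sketched in software as below. This is a minimal Longstaff-Schwartz pricer for an American put under stated assumptions: pseudo-random paths from Python's `random` module stand in for the quasi-random paths used in the paper, the regression basis is a simple quadratic in S/K, and all names and parameters are hypothetical.

```python
import math
import random

def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lsmc_american_put(s0, strike, rate, sigma, maturity, steps, paths, seed=0):
    """Least-Squares Monte Carlo price of an American put (quadratic basis)."""
    rng = random.Random(seed)
    dt = maturity / steps
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-rate * dt)
    # Step 1: simulate price paths; grid[i][k] is path i at time (k + 1) * dt.
    grid = []
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        grid.append(path)
    # Step 2: backward induction, regressing discounted cash flows on the price.
    cash = [max(strike - p[-1], 0.0) for p in grid]
    for t in range(steps - 2, -1, -1):
        cash = [c * disc for c in cash]  # discount one step back to this date
        itm = [i for i in range(paths) if grid[i][t] < strike]
        if len(itm) < 3:
            continue
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for i in itm:  # accumulate the normal equations for basis 1, x, x^2
            x = grid[i][t] / strike
            basis = (1.0, x, x * x)
            for r in range(3):
                aty[r] += basis[r] * cash[i]
                for c in range(3):
                    ata[r][c] += basis[r] * basis[c]
        beta = solve_linear(ata, aty)
        for i in itm:
            x = grid[i][t] / strike
            continuation = beta[0] + beta[1] * x + beta[2] * x * x
            exercise = strike - grid[i][t]
            if exercise > continuation:
                cash[i] = exercise  # exercising now replaces later cash flows
    return disc * sum(cash) / paths

# Hypothetical put: S0 = 36, K = 40, r = 6%, sigma = 20%, T = 1 year.
price = lsmc_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, paths=5000)
```

The regression is restricted to in-the-money paths, as in the original Longstaff-Schwartz method, because only those paths face a genuine exercise decision.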

signal processing systems | 2012

Implementation of the Longstaff and Schwartz American Option Pricing Model on FPGA

Xiang Tian; Khaled Benkrid

American style options are widely used financial products, whose pricing is a challenging problem due to their path dependency characteristic. Finite difference methods and tree-based methods can be used for American option pricing. However, the major drawback of these methods is that they can often only handle one or two sources of uncertainty; for more state variables they become computationally prohibitive, with computation times typically increasing exponentially with the number of state variables. Alternative solutions are the extended Monte Carlo methods, such as the Least-Squares Monte Carlo (LSMC) method suggested by Longstaff and Schwartz, which uses regression to estimate continuation values from simulated paths. In this paper, we present an FPGA hardware architecture for the acceleration of the LSMC method, with Quasi-Monte Carlo path generation. Our FPGA hardware implementation on a Xilinx Virtex-4 XC4VFX100 chip achieves 25× and 18× speed-ups in the path generation and regression steps, respectively, compared to an equivalent pure software implementation captured in C++ and run on an Intel Xeon 2.8 GHz CPU. This provides an overall speed-up of 20× compared to a CPU-based implementation. Power measurements also show that our FPGA implementation is 54× more energy efficient than the pure software implementation.

international workshop on high-performance reconfigurable computing technology and applications | 2008

Massively parallelized Quasi-Monte Carlo financial simulation on a FPGA supercomputer

Xiang Tian; Khaled Benkrid

Quasi-Monte Carlo simulation is a specialized Monte Carlo method which uses quasi-random, or low-discrepancy, numbers as the stochastic parameters. In many applications, this method has proved advantageous compared to the traditional Monte Carlo simulation method, which uses pseudo-random numbers, as it converges relatively quickly and with a better level of accuracy. We implemented a massively parallelized Quasi-Monte Carlo simulation engine on an FPGA-based supercomputer, called Maxwell, developed at the University of Edinburgh. Maxwell consists of 32 IBM Intel Xeon blades, each hosting two Virtex-4 FPGA nodes through a PCI-X interface. Real hardware implementation of our FPGA-based Quasi-Monte Carlo engine on the Maxwell machine outperforms equivalent software implementations running on the Xeon processors by 3 orders of magnitude, with the speed-up figure scaling linearly with the number of processing nodes. The paper presents the detailed design and implementation of our Quasi-Monte Carlo engine in the context of financial derivatives pricing.

reconfigurable computing and fpgas | 2010

Fixed-Point Arithmetic Error Estimation in Monte-Carlo Simulations

Xiang Tian; Khaled Benkrid

As Field Programmable Gate Arrays (FPGAs) get faster and denser, the scope of their applications is getting wider. High performance computing applications, for instance, are an example of such application expansion driven by FPGAs’ increasing computational power coupled with their relatively low power consumption compared to state-of-the-art microprocessor technology. However, one major hurdle facing FPGAs in the high performance computing arena, in addition to their low level programming model, is their low efficiency in implementing double precision floating-point arithmetic, which is often considered essential in many high performance applications. This paper attempts to dispel the latter perceived limitation in the area of Monte-Carlo based stochastic process simulation through a rigorous estimation of fixed-point arithmetic error in a hardware implementation of the Monte-Carlo based European option pricing model. Representations of the mean and variance of quantisation and rounding-off errors due to fixed-point arithmetic show this error to be negligible when compared to the variance of the Monte-Carlo simulation method itself. Not only does this allow us to avoid full double precision arithmetic implementation, but also to minimise the fixed-point word length used without practically affecting the precision of the final result. This in turn results in considerable area savings and throughput increases.
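
The central claim, that fixed-point quantisation error is small next to the sampling error of Monte Carlo itself, can be checked empirically with a short sketch. The code below prices the same European call twice with identical random draws, once in floating point and once with every intermediate value rounded to a fixed number of fractional bits. It is an empirical illustration under assumed parameters (16 fractional bits, 20,000 paths), not the analytical error model derived in the paper.

```python
import math
import random

def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed-point grid)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def call_price(s0, strike, rate, sigma, maturity, paths, frac_bits=None, seed=1):
    """Monte Carlo call price and its standard error.

    If frac_bits is given, every intermediate value is quantized to emulate
    a fixed-point datapath; otherwise plain floating point is used.
    """
    q = (lambda v: v) if frac_bits is None else (lambda v: quantize(v, frac_bits))
    rng = random.Random(seed)
    drift = q((rate - 0.5 * sigma ** 2) * maturity)
    vol = q(sigma * math.sqrt(maturity))
    payoffs = []
    for _ in range(paths):
        z = q(rng.gauss(0.0, 1.0))
        growth = q(math.exp(q(drift + q(vol * z))))
        terminal = q(s0 * growth)
        payoffs.append(max(terminal - strike, 0.0))
    mean = sum(payoffs) / paths
    var = sum((p - mean) ** 2 for p in payoffs) / (paths - 1)
    disc = math.exp(-rate * maturity)
    return disc * mean, disc * math.sqrt(var / paths)

float_price, std_err = call_price(100.0, 100.0, 0.05, 0.2, 1.0, 20_000)
fixed_price, _ = call_price(100.0, 100.0, 0.05, 0.2, 1.0, 20_000, frac_bits=16)
gap = abs(fixed_price - float_price)  # error attributable to quantisation alone
```

Comparing `gap` with `std_err` for decreasing values of `frac_bits` gives a feel for how far the word length can be reduced before quantisation starts to matter relative to the Monte Carlo noise.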

Archive | 2013

Monte-Carlo Simulation-Based Financial Computing on the Maxwell FPGA Parallel Machine

Xiang Tian; Khaled Benkrid

Efficient computational solutions for scientific and engineering problems are a priority for many governments around the world, as they can offer major comparative economic advantages. Financial computing problems are a prime example, where even the slightest improvements in execution time and latency can generate large additional profits. However, financial computing benefited relatively little from early developments in high performance computing, as the latter were aimed mainly at engineering and weapon design applications. Besides, financial experts initially focused on developing mathematical models and computer simulations in order to comprehend the behavior of financial markets and develop risk-management tools. As this effort progressed, the complexity of financial computing applications grew rapidly. Hence, high performance computing turned out to be very important in the field of finance.

Many financial models do not have a practical closed-form solution, in which case numerical methods are the only alternative. Monte-Carlo simulation is one of the most commonly used numerical methods in financial modeling, and in scientific computing in general, with huge computational benefits in solving problems where closed-form solutions are impossible to derive. As the Monte-Carlo method relies on the average result of thousands of independent stochastic paths, massive parallelism can be harnessed to accelerate the computation. For this, high performance computers, increasingly with off-the-shelf accelerator hardware, are being proposed as an economic high performance implementation platform for Monte-Carlo-based simulations. Field programmable gate arrays (FPGAs) in particular have recently been proposed as a high performance and relatively low power acceleration platform for such applications.

In light of the above, the project presented in this chapter develops novel FPGA hardware architectures for Monte-Carlo simulations of different types of financial option pricing models, namely European, Asian, and American options, the stochastic volatility model (GARCH model), and Quasi-Monte Carlo simulation. These architectures have been implemented on an FPGA-based supercomputer, called Maxwell, developed at the University of Edinburgh, which is one of the few openly available FPGA parallel machines in the world. Maxwell is a 32-CPU cluster augmented with 64 Virtex-4 Xilinx FPGAs connected in a 2D torus. Our hardware implementations all show significant computing efficiency compared to traditional software-based implementations, which in turn shows that reconfigurable computing technology can be an efficacious and efficient platform for high performance computing applications, particularly financial computing.

2010 VI Southern Programmable Logic Conference (SPL) | 2010

Libor market model simulation on an FPGA parallel machine

Xiang Tian; Khaled Benkrid

In this paper, we present a high performance scalable FPGA design and implementation of an interest rate derivative pricing engine that targets cap pricing. The design consists of a Gaussian random number generator, based on the Mersenne Twister uniform random generator, and a Monte Carlo path generation engine which calculates the prices of an interest rate derivative based on the LIBOR market model. We implemented this design on the Maxwell FPGA supercomputer using up to 32 Xilinx XC4VFX100 FPGA nodes. We have also compared our FPGA hardware implementation with an equivalent optimized pure software implementation running on up to 32 2.8 GHz Xeon processors with 1 GB RAM each. This showed our FPGA implementation to be 58x faster than the optimized software implementation, while being more than two orders of magnitude more energy efficient. These results scale linearly with the number of FPGA and Xeon processor nodes used.
