Qiwei Jin | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Qiwei Jin is active.

Explore More

Publication

Featured researches published by Qiwei Jin.

field programmable gate arrays | 2012

A mixed precision Monte Carlo methodology for reconfigurable accelerator systems

Gary Chun Tak Chow; Anson H. T. Tse; Qiwei Jin; Wayne Luk; Philip Heng Wai Leong; David B. Thomas

This paper introduces a novel mixed precision methodology applicable to any Monte Carlo (MC) simulation. It involves the use of data-paths with reduced precision, and the resulting errors are corrected by auxiliary sampling. An analytical model is developed for a reconfigurable accelerator system with a field-programmable gate array (FPGA) and a general purpose processor (GPP). Optimisation based on mixed integer geometric programming is employed for determining the optimal reduced precision and optimal resource allocation among the MC data-paths and correction datapaths. Experiments show that the proposed mixed precision methodology requires up to 11 % additional evaluations while less than 4 % of all the evaluations are computed in the reference precision; the resulting designs are up to 7.1 times faster and 3.1 times more energy efficient than baseline double precision FPGA designs, and up to 163 times faster and 170 times more energy efficient than quad-core software designs optimised with the Intel compiler and Math Kernel Library. Our methodology also produces designs for pricing Asian options which are 4.6 times faster and 5.5 times more energy efficient than NVIDIA Tesla C2070 GPU implementations.

applied reconfigurable computing | 2008

Exploring Reconfigurable Architectures for Binomial-Tree Pricing Models

Qiwei Jin; David B. Thomas; Wayne Luk; Benjamin Cope

This paper explores the application of reconfigurable hardware to the acceleration of financial computations involving binomial-tree pricing models. A parallel pipelined architecture capable of computing multiple binomial trees is presented, which can deal with concurrent requests for option valuations. The architecture is mapped into an xc4vsx55 FPGA. Our results show that an FPGA implementation with fixed-point arithmetic at 87.4MHz can run over 250 times faster than a Core2 Duo processor at 2.2GHz, and more than two times faster than an nVidia Geforce 7900GTX processor with 24 pipelines at 650MHz.

field programmable logic and applications | 2012

Exploiting run-time reconfiguration in stencil computation

Xinyu Niu; Qiwei Jin; Wayne Luk; Qiang Liu; Oliver Pell

Stencil computation is computationally intensive and required by many applications. This paper proposes an approach to exploit run-time reconfigurability of field-programmable accelerators for stencil computation. System throughput is optimized by partitioning, analysing and scheduling tasks in applications to remove idle functions. To evaluate the proposed approach, Reverse Time Migration (RTM), a high performance application, is developed. Our optimized runtime reconfigurable solution, which targets a Virtex-6 FPGA in a Maxeler MAX3424A system, can achieves an improved throughput of 102.8 GFlop/s, up to two orders of magnitude faster than the CPU reference designs, 1.59 times faster than the best published GPU and FPGA results, and 1.45 times faster than an optimized static implementation.

field-programmable logic and applications | 2009

Exploring reconfigurable architectures for explicit finite difference option pricing models

Qiwei Jin; David B. Thomas; Wayne Luk

This paper explores the application of reconfigurable hardware and Graphics Processing Units (GPUs) to the acceleration of financial computation using the finite difference (FD) method. A parallel pipelined architecture has been developed to support concurrent valuation of independent options with high pricing throughput. Our FPGA implementation running at 106MHz on an xc4vlx160 device demonstrates a speed up of 12 times over a Pentium 4 processor at 3.6GHz in single-precision arithmetic; while the FPGA is 3.6 times slower than a Tesla C1060 240-Core GPU at 1.3GHz, it is 9 times more energy efficient.

ACM Transactions on Reconfigurable Technology and Systems | 2009

Exploring Reconfigurable Architectures for Tree-Based Option Pricing Models

Qiwei Jin; David B. Thomas; Wayne Luk; Benjamin Cope

This article explores the application of reconfigurable hardware to the acceleration of financial computation using tree-based pricing models. Two parallel pipelined architectures have been developed for option valuation using binomial trees and trinomial trees, with support for concurrent evaluation of independent options to achieve high pricing throughput. Our results show that the tree-based models executing on a Virtex 4 field programmable gate array (FPGA) at 82.7 MHz with fixed-point arithmetic can run over 160 times faster than a Core2 Duo processor at 2.2 GHz. The FPGA implementation is two times faster than the nVidia Geforce 7900GTX processor with 24 pipelines at 650 MHz, and 27%--35% slower than the nVidia Geforce 8600GTS processor with 32 Pipelines at 1450 MHz. Our preliminary experiments also indicate that while an FPGA implementation can be slower than a GPU, it could be more efficient when power consumption is taken into account.

applied reconfigurable computing | 2012

Optimising performance of quadrature methods with reduced precision

Anson H. T. Tse; Gary C. T. Chow; Qiwei Jin; David B. Thomas; Wayne Luk

This paper presents a generic precision optimisation methodology for quadrature computation targeting reconfigurable hardware to maximise performance at a given error tolerance level. The proposed methodology optimises performance by considering integration grid density versus mantissa size of floating-point operators. The optimisation provides the number of integration points and mantissa size with maximised throughput while meeting given error tolerance requirement. Three case studies show that the proposed reduced precision designs on a Virtex-6 SX475T FPGA are up to 6 times faster than comparable FPGA designs with double precision arithmetic. They are up to 15.1 times faster and 234.9 times more energy efficient than an i7-870 quad-core CPU, and are 1.2 times faster and 42.2 times more energy efficient than a Tesla C2070 GPU.

field-programmable custom computing machines | 2013

Automating Elimination of Idle Functions by Run-Time Reconfiguration

Xinyu Niu; Thomas C. P. Chau; Qiwei Jin; Wayne Luk; Qiang Liu

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse time migration are used in evaluating the proposed approach. The run-time solutions approximate the theoretical performance by eliminating idle functions, and are 1.31 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.

applied reconfigurable computing | 2012

Multi-level customisation framework for curve based monte carlo financial simulations

Qiwei Jin; Diwei Dong; Anson H. T. Tse; Gary Chun Tak Chow; David B. Thomas; Wayne Luk; Stephen Weston

One of the main challenges when accelerating financial applications using reconfigurable hardware is the management of design complexity. This paper proposes a multi-level customisation framework for automatic generation of complex yet highly efficient curve based financial Monte Carlo simulators on reconfigurable hardware. By identifying multiple levels of functional specialisations and the optimal data format for the Monte Carlo simulation, we allow different levels of programmability in our framework to retain good performance and support multiple applications. Designs targeting a Virtex-6 SX475T FPGA generated by our framework are about 40 times faster than single-core software implementations on an i7-870 quad-core CPU at 2.93 GHz; they are over 10 times faster and 20 times more energy efficient than 4-core implementations on the same i7-870 quad-core CPU, and are over three times more energy efficient and 36% faster than a highly optimised implementation on an NVIDIA Tesla C2070 GPU at 1.15 GHz. In addition, our framework is platform independent and can be extended to support CPU and GPU applications.

field programmable logic and applications | 2012

Optimising explicit finite difference option pricing for dynamic constant reconfiguration

Qiwei Jin; Tobias Becker; Wayne Luk; David B. Thomas

This paper demonstrates a novel optimisation methodology to adjust stencil based numerical procedures from the algorithm level, so as to reduce not only the amount of hardware resource consumption per kernel but also the amount of computation required to achieve desired result accuracy, when mapping the algorithm to reconfigurable hardware using dynamic constant reconfiguration. As a result, less area is needed to support run-time reconfiguration, and less computational steps are required in the numerical procedure to obtain a result with given error tolerance. We analyse one thousand fixed point implementations on a Virtex-6 XC6VLX760 FPGA for randomly generated option pricing problems, which are representative of industrial computation. When comparing optimised implementations to the un-optimised ones, the reconfiguration area upper bound is reduced by 22%; the average number of computational steps is reduced by 23%; and the area-computation-time product is reduced by 40%; while the numerical errors of the results are kept below the error tolerant level used in industry.

reconfigurable computing and fpgas | 2011

Dynamic Constant Reconfiguration for Explicit Finite Difference Option Pricing

Tobias Becker; Qiwei Jin; Wayne Luk; Stephen Weston

This paper explores the reconfiguration of slowly changing constants in an explicit finite difference solver for option pricing. Numerical methods for option pricing, such as finite difference, are computationally very complex and can be aided by hardware acceleration. Such hardware implementations can be further improved by specialising the circuit for constants, and reconfiguring the circuit when the constants change. In this paper we demonstrate how this concept can be applied to the pricing of European and American options. We present an analytical optimisation approach that explores the benefit of specialised designs over a static one. The key to this approach is the performance and area estimation of kernels that is based on the parameters of arithmetic operators inside the kernel. This allows us to quickly explore several design options without building full designs. Our experimental results on a Xilinx XC6VLX760 FPGA show that with a partially reconfigurable design performance can be improved by a factor of 4.7 over a design without reconfiguration.

Explore More