
Publication


Featured research published by Yongfeng Gu.


IEEE Computer | 2007

Achieving High Performance with FPGA-Based Computing

Martin C. Herbordt; Tom VanCourt; Yongfeng Gu; Bharat Sukhwani; Al Conti; Josh Model; Doug Disabello

Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric.


field-programmable custom computing machines | 2006

Single Pass, BLAST-Like, Approximate String Matching on FPGAs

Martin C. Herbordt; Josh Model; Yongfeng Gu; Bharat Sukhwani; Tom VanCourt

Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations.


field-programmable logic and applications | 2005

Accelerating molecular dynamics simulations with configurable circuits

Yongfeng Gu; Tom VanCourt; Martin C. Herbordt

Molecular dynamics (MD) is of central importance to computational chemistry. Here we show that MD can be implemented efficiently on a COTS FPGA board, and that speed-ups from 31× to 88× over a PC implementation can be obtained. Although the amount of speed-up depends on the stability required, 46× can be obtained with virtually no detriment, and the upper end of the range is apparently viable in many cases. We sketch our FPGA implementations and describe the effects of precision on the trade-off between performance and quality of the MD simulation.


parallel computing | 2007

Single pass streaming BLAST on FPGAs

Martin C. Herbordt; Josh Model; Bharat Sukhwani; Yongfeng Gu; Tom VanCourt

Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order of magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST.


parallel computing | 2008

Explicit design of FPGA-based coprocessors for short-range force computations in molecular dynamics simulations

Yongfeng Gu; Tom VanCourt; Martin C. Herbordt

FPGA-based acceleration of molecular dynamics simulations (MD) has been the subject of several recent studies. The short-range force computation, which dominates the execution time, is the primary focus. Here we combine: a high level of FPGA-specific design including cell lists, systematically determined interpolation and precision, handling of exclusion, and support for MD simulations of up to 256K particles. The target system consists of a standard PC with a 2004-era COTS FPGA board. There are several innovations: new microarchitectures for several major components, including the cell list processor and the off-chip memory controller; and a novel arithmetic mode. Extensive experimentation was required to optimize precision, interpolation order, interpolation mode, table sizes, and simulation quality. We obtain a substantial speed-up over a highly tuned production MD code.
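The cell-list idea mentioned in this abstract can be illustrated in software. The following is a minimal sketch under stated assumptions, not the paper's FPGA cell list processor: it assumes a non-periodic box and cubic cells whose side equals the cutoff, and the names `neighbor_pairs` and `dist2` are hypothetical. Each particle is binned into a cell, and only particles in the same or adjacent cells are tested against the cutoff.

```python
from collections import defaultdict
from itertools import product


def dist2(a, b):
    """Squared Euclidean distance between two 3-D points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def neighbor_pairs(positions, cutoff):
    """Return the set of index pairs (i, j), i < j, within `cutoff`.

    Assumption: non-periodic boundaries, cell side equal to the cutoff.
    """
    # Bin every particle into the cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells[key].append(idx)

    cutoff2 = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the 27 cells around (and including) this one can hold neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            other = cells.get((cx + dx, cy + dy, cz + dz))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i < j and dist2(positions[i], positions[j]) <= cutoff2:
                        pairs.add((i, j))
    return pairs
```

Because two particles within the cutoff can differ by at most one cell index along each axis, checking the 27 surrounding cells is sufficient, and the work per particle no longer grows with the total particle count.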


Computing in Science and Engineering | 2008

Computing Models for FPGA-Based Accelerators

Martin C. Herbordt; Yongfeng Gu; Tom VanCourt; Josh Model; Bharat Sukhwani; Matt Chiu

Field-programmable gate arrays are widely considered accelerators for compute-intensive applications. A critical phase of FPGA application development is finding and mapping to the appropriate computing model. These models differ from models generally used in programming. For example, whereas parallel computing models are often based on thread execution and interaction, FPGA computing can exploit more degrees of freedom than are available in software. This enables models with highly flexible fine-grained parallelism and associative operations such as broadcast and collective response. Several case studies demonstrate the effectiveness of using FPGA-based accelerators in molecular modeling.


EURASIP Journal on Advances in Signal Processing | 2006

Rigid molecule docking: FPGA reconfiguration for alternative force laws

Tom VanCourt; Yongfeng Gu; Vikas Mundada; Martin C. Herbordt

Molecular docking is one of the primary computational methods used by pharmaceutical companies to try to reduce the cost of drug discovery. A common docking technique, used for low-resolution screening or as an intermediate step, performs a three-dimensional correlation between two molecules to test for favorable interactions between them. We extend our previous work on FPGA-based docking accelerators, using reconfigurability for customization of the physical laws and geometric models that describe molecule interaction. Our approach, based on direct summation, allows straightforward combination of multiple forces and enables nonlinear force models; the latter, in particular, are incompatible with the transform-based techniques typically used. Our approach has the further advantage of supporting spatially oriented values in molecule models, as well as the detection of multiple positions representing favorable interactions. We report performance measurements on several different models of chemical behavior and show speedups of from to over a PC.


field-programmable custom computing machines | 2007

FPGA-Based Multigrid Computation for Molecular Dynamics Simulations

Yongfeng Gu; Martin C. Herbordt

FPGA-based acceleration of molecular dynamics (MD) has been the subject of several recent studies. Implementing long-range forces, however, has only recently been addressed. Here we describe a solution based on the multigrid method. We show that multigrid is, in general, an excellent match to FPGAs: the primary operations take advantage of the large number of independently addressable RAMs and the efficiency with which complex systolic structures can be implemented. The multigrid accelerator has been integrated into our existing MD system, and an overall performance gain of 5x to 7x has been obtained, depending on hardware configuration and reference code. The simulation accuracy is comparable to the original double precision serial code.


field-programmable logic and applications | 2004

FPGA Acceleration of Rigid Molecule Interactions

Tom VanCourt; Yongfeng Gu; Martin C. Herbordt

Modeling of molecule interactions often uses rigid models and correlation techniques, either in early screening passes or as steps within more complex models. Even rigid models are time-consuming when applied to large models at 10^3 to 10^5 different three-axis rotations. This paper presents an FPGA structure for performing the correlations efficiently using a systolic array for 3-D correlation and an addressing technique for low-overhead rotation of 3-D voxel models around three axes. We find a 200× speedup in our FPGA implementation compared to the standard transform-based method.
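The 3-D correlation at the heart of this work can be written out in software as a plain direct summation. This is only a sketch under stated assumptions: it ignores rotation, scores only offsets where the template fits entirely inside the grid, and uses hypothetical names (`correlate_3d`, `best_offset`). It does not represent the paper's systolic array.

```python
def correlate_3d(grid, template):
    """Direct-summation 3-D correlation of a template against a voxel grid.

    Both arguments are nested lists indexed [x][y][z]. Returns a dict that
    maps each offset (ox, oy, oz) to the sum of elementwise products.
    """
    gx, gy, gz = len(grid), len(grid[0]), len(grid[0][0])
    tx, ty, tz = len(template), len(template[0]), len(template[0][0])
    scores = {}
    for ox in range(gx - tx + 1):
        for oy in range(gy - ty + 1):
            for oz in range(gz - tz + 1):
                total = 0
                # Accumulate products over every voxel of the template.
                for i in range(tx):
                    for j in range(ty):
                        for k in range(tz):
                            total += grid[ox + i][oy + j][oz + k] * template[i][j][k]
                scores[(ox, oy, oz)] = total
    return scores


def best_offset(scores):
    """Offset with the highest correlation score."""
    return max(scores, key=scores.get)
```

Repeating this for each candidate rotation is what makes the problem expensive: the cost is the product of the number of offsets, the template volume, and the number of rotations.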


international workshop on computer architecture for machine perception | 2005

Three-dimensional template correlation: object recognition in 3D voxel data

Tom VanCourt; Yongfeng Gu; Martin C. Herbordt

Correlation is a standard technique for recognizing known patterns in two-dimensional grid (pixel) images. Its obvious importance has led to numerous hardware implementations and variations. Images captured directly onto 3D voxel grids are becoming more common, including those from confocal microscopy and medical imaging technologies. To our knowledge, no one has yet addressed correlation as a technique for recognizing 3D templates in such 3D voxel data. We find that this problem includes a number of issues: efficient three-axis rotation of a template with respect to a 3D image, the large volume of results from the correlation, and the possibility of a template matching an image multiple times. We briefly review techniques that have been used in 2D template matching, and examine analogies to a molecule interaction problem in computational chemistry, including its similarity to multispectral images. We report on a hardware accelerator for the 3D correlation problem, based on a commodity coprocessor board containing field-programmable gate arrays (FPGAs). Because the convolution processor is built from reconfigurable logic, it can be adapted to non-linear scoring algorithms using complex data values at each voxel, and can be tailored to solve other problems such as anisotropic grid axes. We present initial performance results for the FPGA implementation, and note that accelerator performance is likely to grow roughly linearly with FPGA capacity, process improvements, and number of FPGAs.
