Publication


Featured research published by Martin C. Herbordt.


IEEE Computer | 2007

Achieving High Performance with FPGA-Based Computing

Martin C. Herbordt; Tom VanCourt; Yongfeng Gu; Bharat Sukhwani; Al Conti; Josh Model; Doug Disabello

Numerous application areas, including bioinformatics and computational biology, demand increasing amounts of processing capability. In many cases, the computation cores and data types are suited to field-programmable gate arrays. The challenge is identifying the design techniques that can extract high performance potential from the FPGA fabric.


field-programmable custom computing machines | 2006

Single Pass, BLAST-Like, Approximate String Matching on FPGAs

Martin C. Herbordt; Josh Model; Yongfeng Gu; Bharat Sukhwani; Tom VanCourt

Approximate string matching is fundamental to bioinformatics, and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic programming- (DP) based methods. Our primary contributions are two new algorithms for emulating the seeding and extension phases of BLAST. These operate in a single pass through a database at streaming rate (110 Maa/sec on a VP70 for query sizes up to 600 and 170 Maa/sec on a Virtex4 for query sizes up to 1024), and with no preprocessing other than loading the query string. Further, they use very high sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations.


field-programmable logic and applications | 2005

Accelerating molecular dynamics simulations with configurable circuits

Yongfeng Gu; Tom VanCourt; Martin C. Herbordt

Molecular dynamics (MD) is of central importance to computational chemistry. Here we show that MD can be implemented efficiently on a COTS FPGA board, and that speed-ups from 31× to 88× over a PC implementation can be obtained. Although the amount of speed-up depends on the stability required, 46× can be obtained with virtually no detriment, and the upper end of the range is apparently viable in many cases. We sketch our FPGA implementations and describe the effects of precision on the trade-off between performance and quality of the MD simulation.


parallel computing | 2007

Single pass streaming BLAST on FPGAs

Martin C. Herbordt; Josh Model; Bharat Sukhwani; Yongfeng Gu; Tom VanCourt

Approximate string matching is fundamental to bioinformatics and has been the subject of numerous FPGA acceleration studies. We address issues with respect to FPGA implementations of both BLAST- and dynamic-programming- (DP) based methods. Our primary contribution is a new algorithm for emulating the seeding and extension phases of BLAST. This operates in a single pass through a database at streaming rate, and with no preprocessing other than loading the query string. Moreover, it emulates parameters turned to maximum possible sensitivity with no slowdown. While current DP-based methods also operate at streaming rate, generating results can be cumbersome. We address this with a new structure for data extraction. We present results from several implementations showing order of magnitude acceleration over serial reference code. A simple extension assures compatibility with NCBI BLAST.


Journal of Parallel and Distributed Computing | 2000

A System for Evaluating Performance and Cost of SIMD Array Designs

Martin C. Herbordt; Jade Cravy; Renoy Sam; Owais Kidwai; Calvin Lin

SIMD arrays are likely to become increasingly important as coprocessors in domain specific systems as architects continue to leverage RAM technology in their design. The problem this work addresses is the efficient evaluation of SIMD arrays with respect to complex applications while accounting for operating frequency and chip area. The underlying issues include the size of the architecture space, the lack of portability of the test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. The overall method we use is to combine architecture level and Electronic Design Automation (EDA) level modeling by using an EDA-based tool to calibrate architectural simulations. The resulting system retains much of the high throughput of the architecture level simulator but it also has accuracy similar to that of an early pass EDA synthesis and circuit simulation. The particular problem of computational cost of the architectural level simulation is addressed with a novel approach to trace-based simulation (we call it trace compilation), which we find to be one to two orders of magnitude faster than instruction level simulation while still retaining much of the accuracy of the model. Furthermore, traces must be generated for only a small fraction of the possible parameter combinations. Using trace compilation also addresses program portability by allowing the user to code in a single data parallel language with a single compiler, regardless of the target architecture. We have used our system to evaluate thousands of potential SIMD array designs with respect to real applications and present some sample results.


application-specific systems, architectures, and processors | 2004

Families of FPGA-based algorithms for approximate string matching

T. Van Court; Martin C. Herbordt

Dynamic programming for approximate string matching is a large family of different algorithms, which vary significantly in purpose, complexity, and hardware utilization. Many implementations have reported impressive speed-ups, but have typically been point solutions: highly specialized and addressing only one or a few of the many possible options. The problem to be solved is creating a hardware description that implements a broad range of behavioral options without losing efficiency due to feature bloat. We report a set of three component types that address different parts of the DP string matching problem. Multiple, interchangeable implementations are available for each component type. This allows each application to choose the feature set required, then make maximum use of the FPGA fabric according to that application's specific resource requirements. Synthesis estimates show a 4:1 improvement in time-space performance, depending on the options chosen for a specific matching task.
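
For readers unfamiliar with the DP family the abstract refers to, the following is a minimal software sketch of one member: edit-distance matching of a pattern against every position of a text (often attributed to Sellers). It is not the authors' hardware design; the function name `approx_match_ends` and its parameters are hypothetical and chosen only for illustration.

```python
def approx_match_ends(pattern: str, text: str, max_dist: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for each text position where some
    substring ending there is within max_dist edits of pattern."""
    m = len(pattern)
    # col[i] holds the minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # cell (i-1, j-1) of the DP matrix
        col[0] = 0          # a match may begin anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character skipped
            prev_diag = col[i]
            col[i] = new
        if col[m] <= max_dist:
            hits.append((j, col[m]))
    return hits


if __name__ == "__main__":
    print(approx_match_ends("abc", "xxabcxx", 1))
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells along an anti-diagonal are independent of one another; that property is what generally makes this recurrence a natural fit for systolic hardware.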


ACM Transactions on Reconfigurable Technology and Systems | 2010

Molecular Dynamics Simulations on High-Performance Reconfigurable Computing Systems

Matt Chiu; Martin C. Herbordt

The acceleration of molecular dynamics (MD) simulations using high performance reconfigurable computing (HPRC) has been much studied. Given the intense competition from multicore, and from other types of accelerators, there is now a question whether MD on HPRC can be competitive. We concentrate here on the MD kernel computation (determining the force between short-range particle pairs) and examine it in detail to find the performance limits under current technology and methods. We systematically explore the design space of the force pipeline with respect to arithmetic algorithm, arithmetic mode, precision, and various other optimizations. We examine simplifications that are possible if the end-user is willing to trade off simulation quality for performance. And we use the new Altera floating point cores and compiler to further optimize the designs. We find that for the Stratix-III, and for the best (as yet unoptimized) single precision designs, 11 pipelines running at 250 MHz can fit on the FPGA. If a significant fraction of this potential performance can be maintained in a full implementation, then HPRC MD should be highly competitive.


parallel computing | 2008

Explicit design of FPGA-based coprocessors for short-range force computations in molecular dynamics simulations

Yongfeng Gu; Tom VanCourt; Martin C. Herbordt

FPGA-based acceleration of molecular dynamics simulations (MD) has been the subject of several recent studies. The short-range force computation, which dominates the execution time, is the primary focus. Here we combine: a high level of FPGA-specific design including cell lists, systematically determined interpolation and precision, handling of exclusion, and support for MD simulations of up to 256K particles. The target system consists of a standard PC with a 2004-era COTS FPGA board. There are several innovations: new microarchitectures for several major components, including the cell list processor and the off-chip memory controller; and a novel arithmetic mode. Extensive experimentation was required to optimize precision, interpolation order, interpolation mode, table sizes, and simulation quality. We obtain a substantial speed-up over a highly tuned production MD code.
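
The cell-list technique mentioned in the abstract can be illustrated in software. The sketch below is a generic, non-periodic version under simplifying assumptions (cubic cells with edge length equal to the cutoff, no exclusions, no force evaluation); it does not reproduce the paper's cell list processor, and the name `pairs_within_cutoff` is hypothetical.

```python
from collections import defaultdict
from itertools import product
import math


def pairs_within_cutoff(positions, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is <= cutoff."""
    # Bin particles into cubic cells whose edge equals the cutoff, so any
    # interacting partner must lie in the same cell or one of its 26 neighbors.
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in p)
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    pairs = []
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            others = cells.get(neighbor)  # .get avoids creating empty cells
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j:  # each unordered pair is visited exactly once
                        d2 = sum((a - b) ** 2
                                 for a, b in zip(positions[i], positions[j]))
                        if d2 <= cutoff_sq:
                            pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.3, 0.1, 0.1), (5.0, 5.0, 5.0)]
    print(pairs_within_cutoff(pts, 1.0))
```

Restricting the search to neighboring cells reduces the pair search from all-to-all to roughly linear in the number of particles at fixed density, which is why the short-range force computation is usually organized this way.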


international conference on supercomputing | 2010

Fast and accurate NCBI BLASTP: acceleration with multiphase FPGA-based prefiltering

Atabak Mahram; Martin C. Herbordt

NCBI BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. The problem is that it uses complex heuristics which make it difficult to simultaneously achieve both substantial speed-up and exact agreement with the original output. We have previously described how a novel FPGA-based prefilter that performs exhaustive ungapped alignment (EUA) could be used to reduce the computation by over 99.9% without loss of sensitivity. The primary contribution here is to show how the EUA filter can be combined with another filter, this one based on standard 2-hit seeding. The result is a doubling of performance over the previous best implementation, which itself is an order of magnitude faster than the unaccelerated original. Other contributions include new algorithms for both the original EUA and the 2-hit filters and experimental results demonstrating their utility. This new multiphase FPGA-accelerated NCBI BLASTP scales easily and is appropriate for use in large FPGA-based servers such as the Novo-G.


Computing in Science and Engineering | 2008

Computing Models for FPGA-Based Accelerators

Martin C. Herbordt; Yongfeng Gu; Tom VanCourt; Josh Model; Bharat Sukhwani; Matt Chiu

Field-programmable gate arrays are widely considered accelerators for compute-intensive applications. A critical phase of FPGA application development is finding and mapping to the appropriate computing model. These models differ from models generally used in programming. For example, whereas parallel computing models are often based on thread execution and interaction, FPGA computing can exploit more degrees of freedom than are available in software. This enables models with highly flexible fine-grained parallelism and associative operations such as broadcast and collective response. Several case studies demonstrate the effectiveness of using FPGA-based accelerators in molecular modeling.

Collaboration


Dive into Martin C. Herbordt's collaborations.

Top Co-Authors

Charles C. Weems

University of Massachusetts Amherst
