Joseph M. Lancaster | Researchain

Publication

Featured research published by Joseph M. Lancaster.

application-specific systems, architectures, and processors | 2004

Biosequence similarity search on the Mercury system

Praveen Krishnamurthy; Jeremy Buhler; Roger D. Chamberlain; Mark A. Franklin; M. Gyang; Joseph M. Lancaster

Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.

ACM Transactions on Reconfigurable Technology and Systems | 2008

Mercury BLASTP: Accelerating Protein Sequence Alignment

Arpith C. Jacob; Joseph M. Lancaster; Jeremy Buhler; Brandon Harris; Roger D. Chamberlain

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.

Microprocessors and Microsystems | 2009

Acceleration of ungapped extension in Mercury BLAST

Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

The amount of biosequence data being produced each year is growing exponentially. Extracting useful information from this massive amount of data efficiently is becoming an increasingly difficult task. There are many available software tools that molecular biologists use for comparing genomic data. This paper focuses on accelerating the most widely used such tool, BLAST. Mercury BLAST takes a streaming approach to the BLAST computation by offloading the performance-critical sections to specialized hardware. This hardware is then used in combination with the processor of the host system to deliver BLAST results in a fraction of the time of the general-purpose processor alone. This paper presents the design of the ungapped extension stage of Mercury BLAST. The architecture of the ungapped extension stage is described along with the context of this stage within the Mercury BLAST system. The design is compact and runs at 100 MHz on available FPGAs, making it an effective and powerful component for accelerating biosequence comparisons. The performance of this stage is 25× that of the standard software distribution, yielding close to 50× performance improvement on the complete BLAST application. The sensitivity is essentially equivalent to that of the standard distribution.

field-programmable custom computing machines | 2007

FPGA-accelerated seed generation in Mercury BLASTP

Arpith C. Jacob; Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more runtime or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we focus on seed generation, the first stage of the BLASTP algorithm. Our seed generator is capable of processing database residues at up to 219 Mresidues/second for 2048-residue queries. The full Mercury BLASTP pipeline, including our seed generator, achieves a speedup of 37 times over the popular NCBI BLASTP software on a 2.8 GHz Intel P4 CPU, with sensitivity more than 99% that of the software. Our architecture can be generalized to accelerate the seed generation stage in other important biocomputing applications. A technique is presented which allows an FPGA-based reconfigurable system-on-chip to automatically and dynamically load hardware peripheral controllers and software device drivers depending on the system's automated identification of peripheral boards which are connected to the FPGA. The technique loads peripheral detection modules into peripheral controller slots at system startup, and after these modules identify the peripheral, the correct hardware controllers and software drivers are loaded.

field-programmable logic and applications | 2007

A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP

Brandon Harris; Arpith C. Jacob; Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. The popular BLASTP software for this task has become a bottleneck for proteomic database search. One third of this software's time is spent executing the Smith-Waterman dynamic programming algorithm. This work describes a novel FPGA design for banded Smith-Waterman, an algorithmic variant tuned to the needs of BLASTP. This design has been implemented in Mercury BLASTP, our FPGA-accelerated version of the BLASTP algorithm. We show that Mercury BLASTP runs 6-16 times faster than software BLASTP on a modern CPU while delivering 99% identical results.
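
For readers unfamiliar with the idea, the following is a minimal software sketch of a generic banded Smith-Waterman score computation. It is not the FPGA design from the paper: the function name, the linear gap penalty, the scoring values, and the band being centered on the main diagonal are all illustrative assumptions. It only shows how restricting the dynamic programming matrix to a band reduces the work per row.

```python
def banded_smith_waterman(a, b, band=2, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b using only cells with |i - j| <= band.

    Hypothetical helper for illustration. Assumes gap < 0, so that untouched
    out-of-band cells (left at 0) can never improve an in-band score.
    """
    best = 0
    prev = [0] * (len(b) + 1)              # DP row i-1; out-of-band cells stay 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        lo, hi = max(1, i - band), min(len(b), i + band)
        for j in range(lo, hi + 1):        # only visit cells inside the band
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                         # start a new local alignment
                prev[j - 1] + sub,         # extend along the diagonal
                prev[j] + gap,             # gap in b
                curr[j - 1] + gap,         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(banded_smith_waterman("GGGACGT", "ACGT", band=1))  # prints 2
print(banded_smith_waterman("GGGACGT", "ACGT", band=3))  # prints 8
```

In this toy example the full match lies three diagonals away from the main diagonal, so a band of 1 misses it while a band of 3 finds it. That is the usual trade-off of banding: less work per row in exchange for ignoring alignments that drift outside the band.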

parallel computing | 2008

Visions for application development on hybrid computing systems

Roger D. Chamberlain; Joseph M. Lancaster; Ron K. Cytron

Hybrid computing systems (incorporating FPGAs, GPUs, etc.) have received considerable attention recently as an approach to significant performance gains in many problem domains. Deploying applications on these systems, however, has proven to be difficult and very labor intensive. In this paper we review the current state of practice for application development on hybrid systems. We also present our vision of the application development languages and tools that we believe would greatly benefit the process of designing, implementing, and deploying applications on hybrid systems.

international parallel and distributed processing symposium | 2007

Preliminary results in accelerating profile HMM search on FPGAs

Arpith C. Jacob; Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

Comparison between biosequences and probabilistic models is an increasingly important part of modern DNA and protein sequence analysis. The large and growing number of such models in today's databases demands computational approaches to searching these databases faster, while maintaining high sensitivity to biologically meaningful similarities. This work describes an FPGA-based accelerator for comparing proteins to hidden Markov models of the type used to represent protein motifs in the popular HMMER motif finder. Our engine combines a systolic array design with enhancements to pipeline the complex Viterbi calculation that forms the core of the comparison, and to support coarse-grained parallelism and streaming of multiple sequences within one FPGA. Performance estimates based on a functioning VHDL realisation of our design show a 190 times speedup over the same computation in optimised software on a modern general-purpose CPU.

application specific systems architectures and processors | 2010

Deadlock-avoidance for streaming applications with split-join structure: Two case studies

Peng Li; Kunal Agrawal; Jeremy Buhler; Roger D. Chamberlain; Joseph M. Lancaster

Streaming is a highly effective paradigm for expressing parallelism in high-throughput applications. A streaming computation is a network of compute nodes connected by unidirectional FIFO channels. When these computations are mapped onto real parallel platforms, however, some computations, especially ones in which some nodes act as filters, can deadlock the system due to finite buffering on channels. In this paper, we focus on streaming computations which contain a commonly used structure called split-join. Based on our previous work, we propose two correct deadlock-avoidance algorithms, named the Propagating Algorithm and the Non-propagating Algorithm. Our evaluation of two representative applications, biological sequence alignment and random number generation, shows that the Non-propagating Algorithm has very small communication overhead. For systems with large buffers or a low filtering ratio, the communication overhead of the Non-propagating Algorithm is negligible.
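
The deadlock scenario described in this abstract can be illustrated with a small simulation. This is only a sketch of the problem, not of the Propagating or Non-propagating Algorithm. The topology (one split, a filtering branch A, a pass-through branch B, one join), the index-ordered join rule, the end-of-stream marker, and all names are assumptions made for the example.

```python
from collections import deque


def run_split_join(num_items, capacity, keep_on_a):
    """Simulate split -> {A: filter, B: pass-through} -> join over bounded FIFOs.

    Hypothetical model: items are the indices 0..num_items-1, followed by an
    end-of-stream marker that is never filtered. Returns "done" or "deadlock".
    """
    eos = num_items
    s_a, s_b, a_j, b_j = deque(), deque(), deque(), deque()
    next_index = 0
    while True:
        progress = False
        # Split: copy the next item to both branches if both channels have room.
        if next_index <= eos and len(s_a) < capacity and len(s_b) < capacity:
            s_a.append(next_index)
            s_b.append(next_index)
            next_index += 1
            progress = True
        # Branch A: a filter. Dropped items produce no message for the join.
        if s_a:
            x = s_a[0]
            if x != eos and not keep_on_a(x):
                s_a.popleft()
                progress = True
            elif len(a_j) < capacity:
                a_j.append(s_a.popleft())
                progress = True
        # Branch B: forwards every item when its output channel has room.
        if s_b and len(b_j) < capacity:
            b_j.append(s_b.popleft())
            progress = True
        # Join: needs a message on both inputs before it can consume anything,
        # because an empty input may mean "filtered" or merely "not yet arrived".
        if a_j and b_j:
            low = min(a_j[0], b_j[0])
            if low == eos:
                return "done"
            if a_j[0] == low:
                a_j.popleft()
            if b_j[0] == low:
                b_j.popleft()
            progress = True
        if not progress:
            return "deadlock"      # no node can fire and the stream is unfinished


print(run_split_join(100, 4, lambda x: True))     # prints done
print(run_split_join(100, 4, lambda x: False))    # prints deadlock
print(run_split_join(100, 200, lambda x: False))  # prints done
```

With small buffers and a branch that filters everything, the join starves on one input while the other branch fills up and blocks the split. With buffers large enough to hold the whole stream, the same computation completes, which matches the abstract's point that the hazard comes from finite buffering combined with filtering.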

symposium on cloud computing | 2009

Efficient runtime performance monitoring of FPGA-based applications

Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

Embedded computing platforms have long incorporated non-traditional architectures (e.g., FPGAs, ASICs) to combat the diminishing returns of Moore's Law as applied to traditional processors. These specialized architectures can offer higher performance potential in a smaller space, higher power efficiency, and competitive costs. A price is paid, however, in development difficulty in determining functional correctness and understanding the performance of such a system. In this paper we focus on improving the task of performance debugging streaming applications deployed on FPGAs. We describe our runtime performance monitoring infrastructure, its capabilities and overheads on several different configurations of the monitor. We then employ the monitoring system to study the performance effects of provisioning resources for Mercury BLASTN, an implementation of the BLASTN sequence comparison application on an FPGA-accelerated system.

application specific systems architectures and processors | 2011

TimeTrial: A low-impact performance profiler for streaming data applications

Joseph M. Lancaster; E.F. Berkley Shands; Jeremy Buhler; Roger D. Chamberlain

Finding performance bottlenecks in application-specific systems is becoming increasingly labor-intensive as the capabilities of these systems improve. The complex platforms required to meet today's high application performance demands put pressure on developers to sustain current design cycles. Application developers need better tools to diagnose performance issues that arise when utilizing real-world application-specific platforms, from embedded applications to high-performance computational science applications. In this paper, we present TimeTrial, a runtime performance monitoring system that profiles streaming data applications deployed on architecturally diverse computers. TimeTrial is designed to operate with minimal impact on the executing application, exploiting user directives to aggressively perform lossy compression on performance meta-data. We present the design of the TimeTrial performance monitor and demonstrate its use in discovering performance bottlenecks in a production computational biology application.