Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jason D. Bakos is active.

Publications


Featured research published by Jason D. Bakos.


Computing in Science and Engineering | 2010

High-Performance Heterogeneous Computing with the Convey HC-1

Jason D. Bakos

Unlike other socket-based reconfigurable coprocessors, the Convey HC-1 contains nearly 40 field-programmable gate arrays, scatter-gather memory modules, a high-capacity crossbar switch, and a fully coherent memory system.


Field-Programmable Custom Computing Machines | 2011

A Sparse Matrix Personality for the Convey HC-1

Krishna K. Nagar; Jason D. Bakos

In this paper we describe a double-precision floating-point sparse matrix-vector multiplier (SpMV) and its performance as implemented on a Convey HC-1 reconfigurable computer. The primary contributions of this work are a novel streaming reduction architecture for floating-point accumulation, a novel on-chip cache optimized for streaming compressed sparse row (CSR) matrices, and end-to-end integration with the HC-1's system, programming model, and runtime environment. The design is composed of 32 parallel processing elements, each connected to the HC-1's coprocessor memory and each containing a streaming multiply-accumulator and local vector cache. When used on the HC-1, each PE has a peak throughput of 300 double-precision MFLOP/s, giving a total peak throughput of 9.6 GFLOP/s. For our test matrices, we demonstrate up to 40% of the peak performance and compare these results with results obtained using the cuSPARSE library on an NVIDIA Tesla S1070 GPU. In most cases our implementation exceeds the performance of the GPU.
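
For reference, the following is a minimal software sketch of the work assigned to one processing element: a double-precision multiply-accumulate over the nonzeros of each row of a CSR matrix, with rows partitioned across PEs. The array names (val, col, row_ptr) follow the usual CSR convention and the function is an illustration, not the HC-1 hardware design.

/* Minimal software sketch of one PE's share of a CSR SpMV.
 * Array names follow the common CSR convention and are illustrative,
 * not HC-1 identifiers. */
void spmv_pe(int row_begin, int row_end,
             const double *val, const int *col, const int *row_ptr,
             const double *x, double *y)
{
    for (int r = row_begin; r < row_end; r++) {
        double acc = 0.0;                          /* streaming accumulation */
        for (int j = row_ptr[r]; j < row_ptr[r + 1]; j++)
            acc += val[j] * x[col[j]];             /* two flops per nonzero  */
        y[r] = acc;
    }
}

Each nonzero costs two floating-point operations, so the quoted 300 MFLOP/s per PE corresponds to one multiply-add per cycle at a 150 MHz clock, and 32 such PEs give the 9.6 GFLOP/s aggregate peak.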


The Journal of Supercomputing | 2013

Accelerating frequent itemset mining on graphics processing units

Fan Zhang; Yan Zhang; Jason D. Bakos

In this paper we describe a new parallel Frequent Itemset Mining algorithm called “Frontier Expansion.” This implementation is optimized to achieve high performance on a heterogeneous platform consisting of a shared-memory multiprocessor and multiple Graphics Processing Unit (GPU) coprocessors. Frontier Expansion is an improved data-parallel algorithm derived from the Equivalent Class Clustering (Eclat) method, in which a partial breadth-first search is used to exploit maximum parallelism while remaining within the available memory capacity. In our approach, the vertical transaction lists are represented as “bitsets” and operated on with wide bitwise operations across multiple threads on a GPU. We evaluate our approach on four NVIDIA Tesla GPUs and observe a 6–30× speedup relative to state-of-the-art sequential Eclat and FPGrowth implementations executed on a multicore CPU.
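
The core operation in Eclat-style miners is intersecting the vertical transaction lists of two itemsets and counting the support of the result. A minimal sketch of that step with a bitset representation is shown below; the type and function names are illustrative, not taken from the paper.

/* Vertical "bitset" tidset: one bit per transaction id. Candidate itemsets
 * are formed by ANDing parent bitsets; support is a population count.
 * Illustrative names, not the paper's implementation. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t *words;   /* one bit per transaction */
    size_t    nwords;
} tidset_t;

/* child = a AND b (caller allocates child->words with a->nwords entries,
 * and a->nwords == b->nwords); returns the support of the child itemset */
static size_t intersect_support(const tidset_t *a, const tidset_t *b,
                                tidset_t *child)
{
    size_t support = 0;
    child->nwords = a->nwords;
    for (size_t i = 0; i < a->nwords; i++) {
        uint64_t w = a->words[i] & b->words[i];    /* wide bitwise AND */
        child->words[i] = w;
        support += (size_t)__builtin_popcountll(w);
    }
    return support;
}

On a GPU, each thread would handle a slice of the word array and the per-thread counts are then reduced; the partial breadth-first “frontier” bounds how many candidate bitsets are materialized at once so they fit in available memory.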


Field-Programmable Technology | 2009

FPGA vs. GPU for sparse matrix vector multiply

Yan Zhang; Yasser H. Shalabi; Rishabh Jain; Krishna K. Nagar; Jason D. Bakos

Sparse matrix-vector multiplication (SpMV) is a common operation in numerical linear algebra and is the computational kernel of many scientific applications. It is one of the original and perhaps most studied targets for FPGA acceleration. Despite this, GPUs, which have only recently gained both general-purpose programmability and native support for double precision floating-point arithmetic, are viewed by some as a more effective platform for SpMV and similar linear algebra computations. In this paper, we present an analysis comparing an existing GPU SpMV implementation to our own, novel FPGA implementation. In this analysis, we describe the challenges faced by any SpMV implementation, the unique approaches to these challenges taken by both FPGA and GPU implementations, and their relative performance for SpMV.
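
One challenge common to both platforms is that SpMV is memory-bound: each nonzero contributes only two floating-point operations but requires reading a matrix value, a column index, and a gathered element of the vector. A back-of-the-envelope estimate, assuming double-precision values and 32-bit indices (figures chosen for illustration, not taken from the paper):

/* Rough arithmetic intensity of CSR SpMV, assuming 8-byte values and
 * 4-byte column indices; ignores row_ptr traffic and any cache reuse of x. */
#include <stdio.h>

int main(void)
{
    double flops_per_nnz = 2.0;               /* one multiply + one add     */
    double bytes_per_nnz = 8.0 + 4.0 + 8.0;   /* val + col index + x gather */
    printf("arithmetic intensity ~ %.2f flop/byte\n",
           flops_per_nnz / bytes_per_nnz);    /* ~0.10 flop/byte            */
    return 0;
}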


BMC Bioinformatics | 2010

FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

Stephanie Zierke; Jason D. Bakos

Background: Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.

Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural-log approximation chosen specifically to leverage a deeply pipelined custom architecture.

Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs) [1].
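
To make the kernel concrete, a software form of the PLF inner loop for four-state (DNA) data, in the spirit of Felsenstein's pruning algorithm, might look like the sketch below; the array and parameter names are assumptions for illustration, not MrBayes identifiers.

/* Illustrative software form of the PLF inner loop for DNA data (4 states):
 * combine the conditional likelihoods of two child nodes through their
 * transition-probability matrices to produce the parent's conditionals.
 * Site iterations are independent, which is what permits deep pipelining. */
void plf_update(int nsites,
                const double pl[4][4], const double pr[4][4],
                const double *left,    /* nsites x 4 child conditionals */
                const double *right,
                double *parent)
{
    for (int s = 0; s < nsites; s++) {           /* no cross-iteration deps */
        for (int i = 0; i < 4; i++) {
            double sum_l = 0.0, sum_r = 0.0;
            for (int j = 0; j < 4; j++) {
                sum_l += pl[i][j] * left[4 * s + j];
                sum_r += pr[i][j] * right[4 * s + j];
            }
            parent[4 * s + i] = sum_l * sum_r;
        }
    }
}

Because each site is independent, an FPGA pipeline can keep many sites in flight at once, which is the deep pipelining the paper relies on for its speedup.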


ACM Transactions on Reconfigurable Technology and Systems | 2013

An FPGA-Based Accelerator for Frequent Itemset Mining

Yan Zhang; Fan Zhang; Zheming Jin; Jason D. Bakos

In this article we describe a Field Programmable Gate Array (FPGA)-based coprocessor architecture for Frequent Itemset Mining (FIM). FIM is a common data mining task used to find frequently occurring subsets among a database of sets. FIM is a non-numerical, data-intensive computation and is used in machine learning and computational biology. FIM is particularly expensive, in terms of execution time and memory, when performed on large and/or sparse databases or when a low appearance-frequency threshold is applied. Because of this, the development of increasingly efficient FIM algorithms and their mapping to parallel architectures is an active field. Previous attempts to accelerate FIM using FPGAs have relied on performance-limiting strategies such as iterative database loading and runtime logic unit reconfiguration. In this article, we present a novel architecture to implement Eclat, a well-known FIM algorithm. Unlike previous efforts, our technique does not impose limits on the maximum set size as a function of available FPGA logic resources, and our design scales well to multiple FPGAs. In addition to a novel hardware design, we also present a corresponding compression scheme for intermediate results that are stored in on-chip memory. On a four-FPGA board, experimental results show up to a 68× speedup compared to a highly optimized software implementation.
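
The article's compression scheme is specific to its hardware design, but as a rough software analogue of why compressing intermediate results helps, the sketch below packs a sparse bitset so that only its nonzero 64-bit words (and their positions) occupy memory. This encoding is chosen for illustration here and is not the format described in the article.

/* Illustrative zero-suppressed packing of a sparse intermediate bitset so it
 * fits in limited on-chip memory: keep only nonzero 64-bit words plus their
 * positions. Not the article's compression format. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t *pos;     /* index of each retained word      */
    uint64_t *word;    /* the nonzero words themselves     */
    size_t    count;   /* number of retained words         */
} packed_tidset_t;

/* Caller provides output arrays of capacity nwords; returns words kept. */
static size_t pack_tidset(const uint64_t *bits, size_t nwords,
                          packed_tidset_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (bits[i]) {
            out->pos[n]  = (uint32_t)i;
            out->word[n] = bits[i];
            n++;
        }
    }
    out->count = n;
    return n;
}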


IEEE Transactions on Industry Applications | 2010

Integrated Circuit Implementation for a GaN HFET Driver Circuit

Bo Wang; Marco Riva; Jason D. Bakos; Antonello Monti

The paper presents the design of an integrated circuit (IC) for a 10-MHz, low-power-loss driver for GaN HFETs. While the main elements of the topology were introduced in a previous work, here the authors focus on the design of the IC and present preliminary results and considerations. The proposed driver circuit, based on new two-stage positive-to-negative level shifters and a resonant topology, has been designed and implemented using the cost-effective Smart Voltage extension (SVX) technique. A detailed analysis of the design process, together with a full set of simulations reported in the paper, demonstrates the possibility of exploiting the advantages of GaN devices by means of a smart and convenient implementation.


Field-Programmable Custom Computing Machines | 2007

FPGA Acceleration of Gene Rearrangement Analysis

Jason D. Bakos

In this paper we present our work toward FPGA acceleration of phylogenetic reconstruction, a type of analysis that is commonly performed in the fields of systematic biology and comparative genomics. In our initial study, we have targeted a specific application that reconstructs maximum-parsimony (MP) phylogenies for gene-rearrangement data. Like other prevalent applications in computational biology, this application relies on a control-dependent, memory-intensive, and non-arithmetic combinatorial optimization algorithm. To achieve hardware acceleration, we developed an FPGA core design that implements the application's primary bottleneck computation. Because our core is lightweight, we are able to synthesize multiple cores on a single FPGA. By using several cores in parallel, we have achieved a 25× end-to-end application speedup using simulated input data.


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2008

A Special-Purpose Architecture for Solving the Breakpoint Median Problem

Jason D. Bakos; Panormitis E. Elenis

In this paper, we describe the design of a coprocessor for whole-genome phylogenetic reconstruction. Our current design performs a parallelized breakpoint median computation, which is an expensive component of the overall application. When implemented on a field-programmable gate array (FPGA), our hardware breakpoint median achieves a maximum speedup of 1005× over software. When the coprocessor is used to accelerate the entire reconstruction procedure, we achieve a maximum application speedup of 417×. The results in this paper suggest that FPGA-based acceleration is a promising approach for computationally expensive phylogenetic problems, in spite of the fact that the involved algorithms are based on complex, control-dependent combinatorial optimization.
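
For context on the kernel being accelerated: the breakpoint distance between two gene orders counts the adjacencies of one genome that are not preserved in the other, and the breakpoint median problem asks for a gene order minimizing the sum of these distances to three input genomes. The sketch below computes the distance itself for signed, linear gene orders; it illustrates the definition only and is not the coprocessor's parallel median search.

/* Breakpoint distance between two signed gene orders over the same n genes
 * (values in -n..-1, 1..n). An adjacency (x, y) in genome a is preserved in
 * genome b if b contains (x, y) or (-y, -x); every unpreserved adjacency is
 * a breakpoint. Linear gene orders are assumed for simplicity. */
#include <stdlib.h>

static int breakpoint_distance(const int *a, const int *b, int n)
{
    /* succ[g + n] = successor of signed gene g in genome b (0 = none) */
    int *succ = calloc((size_t)(2 * n + 1), sizeof(int));
    for (int i = 0; i + 1 < n; i++)
        succ[b[i] + n] = b[i + 1];

    int breakpoints = 0;
    for (int i = 0; i + 1 < n; i++) {
        int x = a[i], y = a[i + 1];
        if (succ[x + n] != y && succ[-y + n] != -x)   /* adjacency not kept */
            breakpoints++;
    }
    free(succ);
    return breakpoints;
}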


Field-Programmable Custom Computing Machines | 2006

A Reconfigurable Distributed Computing Fabric Exploiting Multilevel Parallelism

Charles L. Cathey; Jason D. Bakos; Duncan A. Buell

This paper presents a novel reconfigurable data flow processing architecture that promises high performance by explicitly targeting both fine- and coarse-grained parallelism. This architecture is based on multiple FPGAs organized in a scalable direct network that is substantially more interconnect-efficient than currently used crossbar technology. In addition, we discuss several ancillary issues and propose solutions required to support this architecture and achieve maximal performance for general-purpose applications; these include supporting IP, mapping techniques, and routing policies that enable greater flexibility for architectural evolution and code portability.

Collaboration


Dive into Jason D. Bakos's collaborations.

Top Co-Authors

Fan Zhang, University of South Carolina
Krishna K. Nagar, University of South Carolina
Yan Zhang, University of South Carolina
Yang Gao, University of South Carolina
Zheming Jin, University of South Carolina
Bo Wang, University of South Carolina
Joel R. Martin, University of Pittsburgh
Rasha Karakchi, University of South Carolina