Gregory D. Peterson | Researchain

Publication

Featured research published by Gregory D. Peterson.

Computational Biology and Chemistry | 2006

The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior

James M. McCollum; Gregory D. Peterson; Chris D. Cox; Michael L. Simpson; Nagiza F. Samatova

A key to advancing the understanding of molecular biology in the post-genomic age is the development of accurate predictive models for genetic regulation, protein interaction, metabolism, and other biochemical processes. To facilitate model development, simulation algorithms must provide an accurate representation of the system while performing the simulation in a reasonable amount of time. Gillespie's stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous models with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, we examine the performance of different versions of the SSA when applied to several biochemical models. Through our analysis, we discover that transient changes in reaction execution frequencies, which are typical of biochemical models with gene induction and repression, can dramatically affect simulator performance. To account for these shifts, we propose a new algorithm called the sorting direct method that maintains a loosely sorted order of the reactions as the simulation executes. Our measurements show that the sorting direct method performs favorably when compared to other well-known exact stochastic simulation algorithms.
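
As a rough illustration, the Python sketch below implements Gillespie's direct method with a sorting step: after a reaction fires, it swaps places with its predecessor in the search order, so frequently firing reactions drift toward the front and the linear search for the next reaction gets shorter. The swap-with-predecessor rule is our reading of "loosely sorted"; the function name, the reaction encoding, and the toy birth-death model are hypothetical. For clarity it recomputes every propensity at each step, whereas a practical simulator would update only the propensities affected by the last reaction.

```python
import math
import random


def sorting_direct_method(x0, reactions, t_end, seed=0):
    """Exact SSA (direct method) plus a sorting heuristic (hypothetical sketch).

    x0        -- initial species counts
    reactions -- list of (propensity_fn, {species_index: change}) pairs
    Returns (time_of_last_reaction, counts, search_order, firing_counts).
    """
    rng = random.Random(seed)
    x = list(x0)
    order = list(range(len(reactions)))  # search order, loosely sorted by frequency
    fired = [0] * len(reactions)
    t = 0.0
    while True:
        props = [reactions[j][0](x) for j in order]
        a0 = sum(props)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        tau = -math.log(1.0 - rng.random()) / a0  # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        # Linear search: the earlier a busy reaction sits in `order`,
        # the fewer propensities have to be added before it is found.
        target = rng.random() * a0
        acc = 0.0
        k = None
        for pos, a in enumerate(props):
            acc += a
            if target < acc:
                k = pos
                break
        if k is None:  # floating-point round-off: take the last nonzero propensity
            k = max(pos for pos, a in enumerate(props) if a > 0.0)
        j = order[k]
        for species, change in reactions[j][1].items():
            x[species] += change
        fired[j] += 1
        # Sorting step: move the reaction that just fired one slot forward.
        if k > 0:
            order[k - 1], order[k] = order[k], order[k - 1]
    return t, x, order, fired


# Toy model, deliberately listed with the rarest reaction first.
reactions = [
    (lambda x: 0.001, {1: +1}),       # rare: 0 -> Y
    (lambda x: 0.1 * x[0], {0: -1}),  # degradation: X -> 0
    (lambda x: 10.0, {0: +1}),        # production: 0 -> X
]
t, x, order, fired = sorting_direct_method([0, 0], reactions, t_end=100.0)
print(order, fired)
```

Because the rare reaction almost never fires, it sinks to the back of `order` within the first few steps, and the search then usually stops after one or two additions.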

IEEE Transactions on Parallel and Distributed Systems | 2011

Comparing Hardware Accelerators in Scientific Applications: A Case Study

Rick Weber; Akila Gothandaraman; Robert J. Hinde; Gregory D. Peterson

Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms, including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.

The Journal of Urology | 1998

Vasectomy and Human Immunodeficiency Virus Type 1 in Semen

John N. Krieger; Apichart Nirapathpongporn; Monthchai Chaiyaporn; Gregory D. Peterson; Irena Nikolaeva; Robert Akridge; Susan O. Ross; Robert W. Coombs

PURPOSE: Human immunodeficiency virus type 1 (HIV) is cultured more often from seminal cells than from seminal plasma. Because vasectomy causes dramatic reductions in seminal cells and also eliminates secretions from proximal sites in the male reproductive tract, vasectomy may change the potential infectiousness of semen. MATERIALS AND METHODS: We used polymerase chain reaction (PCR) assays to measure HIV ribonucleic acid (RNA) in seminal plasma and HIV deoxyribonucleic acid (DNA) in seminal cells from 46 asymptomatic, seropositive men before and after vasectomy. RESULTS: HIV RNA levels in semen correlated only weakly with blood levels (r = 0.22, p = 0.03). Of 183 semen specimens assayed for cell-free HIV RNA and proviral DNA, 37 (20%) were positive for HIV RNA only, 41 (22%) were positive for HIV DNA only, and 18 (10%) were positive for both RNA and DNA. Thus, detection of HIV RNA in seminal plasma was not associated with detection of HIV DNA in seminal cells. HIV RNA was present in 23 of 82 specimens (28%) before vasectomy (mean 2.87 log copies/ml) and in 38 of 121 specimens (31%) after vasectomy (mean 2.81 log copies/ml). CONCLUSIONS: These findings suggest that direct measurement of HIV levels in semen is necessary to assess the potential for sexual transmission, that most cell-free HIV in seminal plasma arises distal to the vas deferens, and that vasectomy may have minimal impact on the infectiousness of HIV-seropositive men to their sexual partners.

Field-Programmable Custom Computing Machines | 2007

Sparse Matrix-Vector Multiplication Design on FPGAs

Junqing Sun; Gregory D. Peterson; Olaf O Storaasli

Creating a high-throughput sparse matrix-vector multiplication (SpMxV) implementation depends on a balanced system design. In this paper, we introduce an innovative SpMxV solver designed for FPGAs (SSF). Besides high computational throughput, system performance is optimized by reducing initialization time and overheads, minimizing and overlapping I/O operations, and increasing scalability. SSF accepts any matrix size and can be easily adapted to different data formats. SSF minimizes control logic by taking advantage of the data flow via an innovative accumulation circuit that uses pipelined floating-point adders. Compared to optimized software on a Pentium 4 microprocessor, our design achieves up to a 20x speedup.
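
To make the accumulation problem concrete, here is a small Python sketch of SpMxV over the common compressed sparse row (CSR) format. CSR is our assumption; the abstract says only that SSF adapts to different data formats. On an FPGA, a pipelined floating-point adder cannot add a new product to a running sum it has not finished computing, so this sketch spreads each row's products over several interleaved partial sums and reduces them at the end of the row. That is a generic latency-hiding technique and a hypothetical stand-in, not the SSF accumulation circuit itself; `lanes` plays the role of the adder's pipeline depth.

```python
def csr_from_dense(rows):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x, lanes=4):
    """Compute y = A x for A stored in CSR form.

    Products of one row are added round-robin into `lanes` independent
    partial sums (as a pipelined adder with `lanes` stages could accept
    one operand per cycle); the partial sums are reduced when the row ends.
    """
    y = []
    for i in range(len(row_ptr) - 1):
        partial = [0.0] * lanes
        for n, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            partial[n % lanes] += values[k] * x[col_idx[k]]
        y.append(sum(partial))
    return y


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
vals, cols, ptrs = csr_from_dense(A)
y = spmv_csr(vals, cols, ptrs, [1.0, 2.0, 3.0])
print(y)
```

Only nonzeros are stored and multiplied, so the work per row is proportional to that row's nonzero count rather than the matrix width.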

National Aerospace and Electronics Conference | 1996

Developing the next generation cockpit display system

Britton C. Read; D. Barker; R.G. Bishop; L.M. Concha; J.M. Emmert; R.L. Ewing; G.L. Fecher; P. Jarusiewic; Gregory D. Peterson; M. Rubeiz; A.M. Sayson

The goal of advanced cockpit display systems is to present large amounts of information quickly and in an understandable format, enabling the aviator to improve mission performance. Wright Laboratory is developing a program to dramatically improve current display systems. Current front-line cockpit display systems use low-resolution analog video to present two-dimensional (2-D) images on many separate displays. The future cockpit will be capable of integrating large-picture digital video with three-dimensional (3-D) and 2-D color images. This system will be capable of rendering icons, maps, and world views. It will be compatible with head-mounted displays and multiple large displays to improve war planning and combat aviator situational awareness. We are developing a massively parallel 3-D renderer capable of updating 500,000 3-D triangles per second with shading, lighting, transparency, texture mapping, and hidden-surface removal. The renderer design, based on the University of North Carolina Pixel-Planes design, employs a massively parallel architecture. The rendering system will be small enough to fit on one board, extensible to a dual-seat configuration, and capable of up to eight windows per display channel.
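
The core idea behind Pixel-Planes-style rendering is that every pixel evaluates the same linear expression A*x + B*y + C at once; three such expressions, one per triangle edge, decide whether a pixel is inside a triangle. The Python sketch below emulates that with a loop over pixel centers. It is a simplified illustration of the general principle, not the renderer described above; the function names are hypothetical, and shading, depth, and texturing are omitted.

```python
def edge_coeffs(p0, p1):
    """(A, B, C) such that A*x + B*y + C > 0 for points left of edge p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 - y1, x1 - x0, x0 * y1 - x1 * y0


def rasterize(tri, width, height):
    """Return the set of pixels whose centers lie inside triangle `tri`.

    In a Pixel-Planes-style machine every pixel processor evaluates the
    linear expressions simultaneously; this loop stands in for them.
    """
    edges = [edge_coeffs(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    covered = set()
    for py in range(height):
        for px in range(width):
            x, y = px + 0.5, py + 0.5  # sample at the pixel center
            values = [a * x + b * y + c for a, b, c in edges]
            # Inside if on the same side of all three edges (either winding).
            if all(v >= 0 for v in values) or all(v <= 0 for v in values):
                covered.add((px, py))
    return covered


pixels = rasterize([(0, 0), (4, 0), (0, 4)], width=8, height=8)
print(sorted(pixels))
```

Because each pixel's test depends only on its own coordinates and the broadcast coefficients, the work parallelizes trivially across pixels, which is what makes a massively parallel hardware design attractive.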

IEEE Transactions on Computers | 2008

High-Performance Mixed-Precision Linear Solver for FPGAs

Junqing Sun; Gregory D. Peterson; Olaf O Storaasli

Compared to higher-precision data formats, lower-precision data formats yield higher performance for computationally intensive applications on FPGAs because of their lower resource cost, reduced memory bandwidth requirements, and higher circuit frequency. On the other hand, scientific computations usually demand highly accurate solutions. This paper seeks to use lower-precision data formats whenever possible for higher performance, without losing the accuracy of higher-precision data formats, by using mixed-precision algorithms and architectures. First, we analyze the floating-point performance of different data formats on FPGAs. Second, we introduce mixed-precision iterative refinement algorithms for linear solvers and provide an error analysis. Finally, we propose an innovative architecture for a mixed-precision direct solver for reconfigurable computing. Our results show that the mixed-precision algorithm and architecture significantly improve the performance of linear solvers on FPGAs.
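
Mixed-precision iterative refinement can be sketched in a few lines of pure Python: factor the matrix once in low precision (the expensive O(n^3) step), then repeatedly compute the residual in high precision and solve for a correction with the cheap low-precision factors. Here binary32 arithmetic is emulated by rounding every intermediate result through `struct`, and LU with partial pivoting is our assumed direct solver; the function names and the 3x3 test system are hypothetical, and this is a textbook version of the technique rather than the FPGA architecture from the paper.

```python
import struct


def f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def lu_low(A):
    """LU with partial pivoting; every intermediate is rounded to binary32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def solve_low(LU, perm, b):
    """Forward/back substitution with the low-precision factors."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # unit lower-triangular solve
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # upper-triangular solve
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def mixed_precision_solve(A, b, iters=5):
    LU, perm = lu_low(A)          # O(n^3) work, done once in low precision
    x = solve_low(LU, perm, b)    # initial low-precision solution
    n = len(b)
    for _ in range(iters):
        # Residual in high precision (binary64).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = solve_low(LU, perm, r)            # O(n^2) correction
        x = [xi + di for xi, di in zip(x, d)]  # update in high precision
    return x


A = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
x_true = [0.1, 0.2, 0.3]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
x = mixed_precision_solve(A, b)
print(x)
```

For a well-conditioned matrix each refinement step shrinks the error by roughly the condition number times the binary32 unit roundoff, so a handful of cheap iterations recovers binary64 accuracy.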

IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Power Aware Computing on GPUs

Kiran Kasichayanula; Dan Terpstra; Piotr Luszczek; S. Tomov; Shirley Moore; Gregory D. Peterson

Energy and power density concerns in modern processors have led to significant computer architecture research efforts in power-aware and temperature-aware computing. With power dissipation becoming an increasingly vexing problem, power analysis of Graphics Processing Units (GPUs) and their components has become crucial for hardware and software system design. Here, we describe a coordinated measurement approach that combines real total power measurement with per-component power estimation. To identify power consumption accurately, we introduce the Activity-based Model for GPUs (AMG), from which we identify activity factors and power for GPU microarchitectures; these help analyze the power tradeoffs of one component versus another using microbenchmarks. The key challenge addressed in this work is real-time power consumption, which can be accurately estimated using NVIDIA's Management Library (NVML). We validated our model using a Kill-A-Watt power meter, and the results are accurate to within 10%. This work also analyzes the energy consumption of MAGMA (Matrix Algebra on GPU and Multicore Architectures) BLAS2, BLAS3, and Hessenberg kernels.
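
Turning sampled power readings into kernel energy is a small but easy-to-botch step. The Python sketch below integrates a list of (time, power) samples with the trapezoidal rule to get joules and derives average power. The samples are hypothetical placeholders; in practice they might come from NVML's `nvmlDeviceGetPowerUsage`, which reports milliwatts and would need converting to watts. This is generic numerical integration, not the AMG model itself.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    Samples must be sorted by time; the result is energy in joules.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power(samples):
    """Average power in watts over the sampled interval."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) / duration


# Hypothetical readings: a kernel drawing a steady 100 W for 2 s.
steady = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
print(energy_joules(steady), average_power(steady))
```

The trapezoidal rule assumes power varies linearly between samples, so the sampling interval has to be short relative to the kernel's runtime for the estimate to be meaningful.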

Field-Programmable Custom Computing Machines | 2009

An FPGA Implementation for Solving Least Square Problem

Depeng Yang; Gregory D. Peterson; Husheng Li; Junqing Sun

This paper proposes a high-performance least-squares solver on FPGAs using the Cholesky decomposition method. Our design can be realized by iteratively applying a single triangular linear equation solver for the modified Cholesky decomposition and the forward/backward substitutions. Good performance is achieved by optimizing the Cholesky factorization algorithms and by reordering the computation to alleviate data dependencies. A dedicated hardware architecture for solving triangular linear equations is designed and implemented for different precision requirements. Compared to software on a Pentium 4, our design achieves a significant speedup.
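
The overall flow, a Cholesky factorization followed by forward and backward triangular solves, can be sketched in pure Python for the normal equations A^T A x = A^T b. We use the standard Cholesky factorization here; the paper's "modified" variant and its hardware scheduling are not reproduced, and the function names and the line-fit data are hypothetical.

```python
import math


def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(L)
    for i in range(len(L)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def backward_sub_transpose(L, y):
    """Solve L^T x = y, reading L^T directly from L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def least_squares(A, b):
    """Minimize ||A x - b|| via the normal equations and Cholesky."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    L = cholesky(AtA)
    return backward_sub_transpose(L, forward_sub(L, Atb))


# Fit y = c0 + c1 * t to points lying exactly on y = 1 + 2t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
coef = least_squares(A, b)
print(coef)
```

Both substitutions are triangular solves, which is why a single triangular-solver unit can be reused for every phase, as the abstract describes.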

Parallel Computing | 2008

FPGA acceleration of a quantum Monte Carlo application

Akila Gothandaraman; Gregory D. Peterson; G. L. Warren; Robert J. Hinde; Robert J. Harrison

Quantum Monte Carlo methods enable us to determine the ground-state properties of atomic or molecular clusters. Here, we present a reconfigurable computing architecture using Field Programmable Gate Arrays (FPGAs) to accelerate two computationally intensive kernels of a Quantum Monte Carlo (QMC) application applied to N-body systems: the potential energy and wave function calculations. We compare the performance of our application on two reconfigurable platforms. First, we use a dual-processor 2.4 GHz Intel Xeon augmented with two reconfigurable development boards containing Xilinx Virtex-II Pro FPGAs. On this platform, we achieve a speedup of 3x over a software-only implementation. Next, the chemistry application is ported to the Cray XD1 supercomputer equipped with Xilinx Virtex-II Pro and Virtex-4 FPGAs. The hardware-accelerated application on one node of this system, using a single Virtex-4 FPGA, yields a speedup of approximately 25x over the serial reference code running on one node of the dual-processor, dual-core 2.2 GHz AMD Opteron. This speedup is mainly attributed to pipelining, the use of fixed-point arithmetic for all calculations, and fine-grained parallelism on the FPGAs. Performance can be further enhanced by operating multiple instances of our design in parallel.
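
Fixed-point arithmetic replaces floating-point units with plain integer adders and multipliers, which are much cheaper on an FPGA. The Python sketch below uses a hypothetical Q-format with 16 fractional bits to compute squared pair distances r^2 for every pair in an N-body configuration, the kind of quantity a pair-potential kernel consumes. The bit width, helper names, and sample coordinates are assumptions for illustration; the paper's actual number formats and potential evaluation are not shown.

```python
FRAC_BITS = 16          # hypothetical Q-format: 16 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(v):
    """Quantize a real number to the fixed-point grid."""
    return int(round(v * SCALE))


def to_float(q):
    return q / SCALE


def fx_mul(a, b):
    """Fixed-point multiply: the raw product carries 2*FRAC_BITS fractional
    bits, so add half an LSB and shift right to round back to FRAC_BITS."""
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def fx_dist2(p, q):
    """Squared distance between two fixed-point points (integer ops only)."""
    acc = 0
    for a, b in zip(p, q):
        d = a - b
        acc += fx_mul(d, d)
    return acc


def pair_distances2(points):
    """Fixed-point r^2 for every pair i < j of an N-body configuration."""
    fixed = [[to_fixed(c) for c in p] for p in points]
    return [fx_dist2(fixed[i], fixed[j])
            for i in range(len(fixed)) for j in range(i + 1, len(fixed))]


cluster = [(1.5, -2.25, 0.75), (0.5, 1.0, -1.25), (0.0, 0.0, 0.0)]
r2 = pair_distances2(cluster)
print([to_float(q) for q in r2])
```

The trade-off is a fixed absolute resolution of 2^-16 here, so the format has to be chosen from the known range of the coordinates and intermediate values.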

Performance Evaluation | 2005

Parallel application performance on shared high performance reconfigurable computing resources

Melissa C. Smith; Gregory D. Peterson

The use of a network of shared, heterogeneous workstations, each harboring a reconfigurable computing (RC) system, offers high-performance users an inexpensive platform for a wide range of computationally demanding problems. However, exploiting the full potential of these systems can be challenging without knowledge of the systems' performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far accounts for the addition of RC systems. Our analytic performance model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message-passing communication, and processor heterogeneity. The methodology proves accurate in characterizing these effects for applications running on shared, homogeneous, and heterogeneous high-performance reconfigurable computing (HPRC) resources. The model error in all cases was less than 5% for application runtimes greater than 30 s and less than 15% for runtimes less than 30 s.
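
To show how the listed effects can combine in an analytic model, here is a deliberately simplified, hypothetical toy model in Python. It is NOT the paper's model. It assumes that background load slows only the CPU portion of each node's work, that the RC device is dedicated and accelerates a fixed fraction of the work, that the slowest node determines completion time (load imbalance), and that communication adds a fixed cost. Every parameter name is illustrative.

```python
def predicted_runtime(work, cpu_speed, bg_load, rc_fraction, rc_speedup, comm_time):
    """Hypothetical toy runtime model for shared RC-equipped workstations.

    work[i]      -- units of work assigned to node i
    cpu_speed[i] -- units per second on an idle CPU (heterogeneity)
    bg_load[i]   -- fraction of the CPU taken by other users, in [0, 1)
    rc_fraction  -- fraction of each node's work offloaded to its RC device
    rc_speedup   -- speedup of the RC device on the offloaded work
    comm_time    -- total message-passing time in seconds
    """
    node_times = []
    for w, s, bg in zip(work, cpu_speed, bg_load):
        if not 0.0 <= bg < 1.0:
            raise ValueError("background load must be in [0, 1)")
        cpu_part = w * (1.0 - rc_fraction) / (s * (1.0 - bg))  # slowed by other users
        rc_part = w * rc_fraction / (s * rc_speedup)           # dedicated RC device
        node_times.append(cpu_part + rc_part)
    return max(node_times) + comm_time  # slowest node sets the pace


def relative_error(predicted, measured):
    return abs(predicted - measured) / measured


t_even = predicted_runtime([100, 100], [10, 10], [0.0, 0.0], 0.5, 5.0, 0.5)
t_loaded = predicted_runtime([100, 100], [10, 10], [0.0, 0.5], 0.5, 5.0, 0.5)
print(t_even, t_loaded)
```

Halving one node's available CPU nearly doubles the predicted runtime even though the other node is idle, which illustrates why background load and load imbalance have to be modeled together.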
