Jeffrey Hammes
Colorado State University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jeffrey Hammes.
IEEE Computer | 2003
Walid A. Najjar; Willem A. P. Bohm; Bruce A. Draper; Jeffrey Hammes; Robert G. Rinker; J.R. Beveridge; Monica Chawathe; Charlie Ross
RC systems typically consist of an array of configurable computing elements. The computational granularity of these elements ranges from simple gates - as abstracted by FPGA lookup tables - to complete arithmetic-logic units with or without registers. A rich programmable interconnect completes the array. RC system developer manually partitions an application into two segments: a hardware component in a hardware description language such as VHDL or Verilog that will execute as a circuit on the FPGA and a software component that will execute as a program on the host. Single-assignment C is a C language variant designed to create an automated compilation path from an algorithmic programming language to an FPGA-based reconfigurable computing system.
The Journal of Supercomputing | 2002
A. P. Wim Böhm; Jeffrey Hammes; Bruce A. Draper; Monica Chawathe; Charlie Ross; Robert Rinker; Walid A. Najjar
This paper presents the high level, machine independent, algorithmic, single-assignment programming language SA-C and its optimizing compiler targeting reconfigurable systems. SA-C is intended for Image Processing applications. Language features are introduced and discussed. The intermediate forms DDCF, DFG and AHA, used in the optimization and code-generation phases, are described. Conventional and reconfigurable system specific optimizations are introduced. The code generation process is described. The performance for these systems is analyzed, using a range of applications from simple Image Processing Library functions to more comprehensive applications, such as the ARAGTAP target acquisition prescreener.
IEEE Transactions on Very Large Scale Integration Systems | 2001
Robert Rinker; Margaret Carter; Amitkumar Patel; Monica Chawathe; Charlie Ross; Jeffrey Hammes; Walid A. Najjar; Wim Bohm
We describe a system, developed as part of the Cameron project, which compiles programs written in a single-assignment subset of C called SA-C into dataflow graphs and then into VHDL. The primary application domain is image processing. The system consists of an optimizing compiler which produces dataflow graphs and a dataflow graph to VHDL translator. The method used for the translation is described here, along with some results on an application. The objective is not to produce yet another design entry tool, but rather to shift the programming paradigm from HDLs to an algorithmic level, thereby extending the realm of hardware design to the application programmer.
ACM Transactions in Embedded Computing Systems | 2003
Girish Venkataramani; Walid A. Najjar; Fadi J. Kurdahi; Nader Bagherzadeh; A. P. Wim Böhm; Jeffrey Hammes
The rapid growth of device densities on silicon has made it feasible to deploy reconfigurable hardware as a highly parallel computing platform. However, one of the obstacles to the wider acceptance of this technology is its programmability. The application needs to be programmed in hardware description languages or an assembly equivalent, whereas most application programmers are used to the algorithmic programming paradigm. SA-C has been proposed as an expression-oriented language designed to implicitly express data parallel operations. The Morphosys project proposes an SoC architecture consisting of reconfigurable hardware that supports a data-parallel, SIMD computational model. This paper describes a compiler framework to analyze SA-C programs, perform optimizations, and automatically map the application onto the Morphosys architecture. The mapping process is static and it involves operation scheduling, processor allocation and binding, and register allocation in the context of the Morphosys architecture. The compiler also handles issues concerning data streaming and caching in order to minimize data transfer overhead. We have compiled some important image-processing kernels, and the generated schedules reflect an average speedup in execution times of up to 6× compared to the execution on 800 MHz Pentium III machines.
international conference on computer vision systems | 1999
Jeffrey Hammes; Bruce A. Draper; A. P. Willem Böhm
This paper presents Sassy, a single-assignment variant of the C programming language developed in concert with Khoral Inc. and designed to exploit both coarse-grain and fine-grain parallelism in image processing applications. Sassy programs are written in the Khoros software development environment, and can be manipulated inside Cantata (the Khoros GUI). The Sassy language supports image processing with true multidimensional arrays, sophisticated array access and windowing mechanisms, and built-in reduction operators (e.g. histogram). At the same time, Sassy restricts C so as to enable compiler optimizations for parallel execution environments, with the goal of reducing data traffic, code size and execution time. In particular, the Sassy language and its optimizing compiler target re-configurable systems, which are fine-grain parallel processors. Reconfigurable systems consist of field-programmable gate arrays (FPGAs), memories and interconnection hardware, and can be used as inexpensive co-processors with conventional workstations or PCs. The compiler optimizations needed to generate highly optimal host, FPGA, and communication code, are discussed. The massive parallelism and high throughput of reconfigurable systems makes them well-suited to image processing tasks, but they have not previously been used in this context because they are typically programmed in hardware description languages such as VHDL. Sassy was developed as part of the Cameron project, with the goal of elevating the programming level for reconfigurable systems from hardware circuits to programming language.
international parallel and distributed processing symposium | 2001
Jeffrey Hammes; A.P.W. Bohm; Charlie Ross; Monica Chawathe; Bruce A. Draper; Robert G. Rinker; Walid A. Najjar
This paper describes a system for compiling codes written in a conventional high-level language to recon gurable FPGA-based hardware. SA-C is a singleassignment language, and its compiler performs a variety of optimizations, some conventional and some specialized, before generating data ow graphs. The data ow graphs are then compiled to VHDL. Three novel compiler optimizations are described here, all based on loops with windowing behavior. Loop fusion with windows requires a unique approach to producer-consumer fusion. Temporal Common Subexpression Elimination identi es expressions that recompute values that were computed in previous iterations, and replaces them with registers. Window narrowing reduces window sizes after other optimizations have been applied. Finally, the performance e ects of these optimizations on a four-loop sequence
Journal of Functional Programming | 1995
Jeffrey Hammes; Olaf M. Lubeck; A. P. Wim Böhm
In this paper we present functional Id and Haskell versions of a large Monte Carlo radiation transport code, and compare the two languages with respect to their expressiveness. Monte Carlo transport simulation exercises such abilities as parsing, input/output, recur-sive data structures and traditional number crunching, which makes it a good test problem for languages and compilers. Using some code examples, we compare the programming styles encouraged by the two languages. In particular, we discuss the eeect of laziness on programming style. We point out that resource management problems currently prevent running realistically large problem sizes in the functional versions of the code. The Monte Carlo technique has a long history. Its importance has grown in tandem with the availability of cheap computing power. The authors outline the functionality of a large Monte Carlo simulation program, and demonstrate that a simpliied kernel version can be cleanly coded in a functional style. They illustrate some eeects of functional language implementation on programming style. It is characteristic of the Monte Carlo method that code validation and debugging depend on high-statistics results. The authors frankly describe the problems encountered in obtaining such results from the functional codes. Their experiences highlight the need for future research to address speciic implementation problems. Chief among these needs are eeective debugging tools for inspecting partial results and eecient yet unobtrusive methods of memory management. Reports of this kind provide important empirical data on the practice of functional programming that can help guide both application development and language support research.
application-specific systems, architectures, and processors | 2000
Robert Rinker; Jeffrey Hammes; Walid A. Najjar; A. P. Wim Böhm; Bruce A. Draper
This paper describes the compilation of high-level language programs written in a single-assignment language called SA-C into the binary codes used for programming reconfigurable hardware. The primary application domain is image processing. The paper describes the SA-C language, the compiler and the optimizations it performs, the process of converting the intermediate form called dataflow graphs into VHDL, and the generation of hardware configuration codes. Performance data on a typical image processing program, written in SA-C and executed on a reconfigurable computing system, is presented and compared to a hand-written VHDL version and a C version running on conventional processors.
Journal of Functional Programming | 1997
Jeffrey Hammes; Sumit Sur; A. P. Wim Böhm
In this paper we investigate the effectiveness of functional language features when writing scientific codes. Our programs are written in the purely functional subset of Id and executed on a one node Motorola Monsoon machine, and in Haskell and executed on a Sparc 2. In the application we study – the NAS FT benchmark, a three-dimensional heat equation solver – it is necessary to target and select one-dimensional sub-arrays in three-dimensional arrays. Furthermore, it is important to be able to share computation in array definitions. We compare first order and higher order implementations of this benchmark. The higher order version uses functions to select one-dimensional sub-arrays, or slices, from a three-dimensional object, whereas the first order version creates copies to achieve the same result. We compare various representations of a three-dimensional object, and study the effect of strictness in Haskell. We also study the performance of our codes when employing recursive and iterative implementations of the one-dimensional FFT, which forms the kernel of this benchmark. It turns out that these languages still have quite inefficient implementations, with respect to both space and time. For the largest problem we could run (323), Haskell is 15 times slower than Fortran and uses three times more space than is absolutely necessary, whereas Id on Monsoon uses nine times more cycles than Fortran on the MIPS R3000, and uses five times more space than is absolutely necessary. This code, and others like it, should inspire compiler writers to improve the performance of functional language implementations.
field-programmable custom computing machines | 2001
A.P.W. Bohm; Bruce A. Draper; Walid A. Najjar; Jeffrey Hammes; Robert G. Rinker; Monica Chawathe; Charlie Ross
This paper describes a system for one-step compilation of image processing (IP) codes, written in the machine-independent, algorithmic, high-level single assignment language SA-C, to FPGA-based hardware. The SA - C compiler performs a variety of optimizations, some conventional and some specialized, before generating dataflow graphs and host code. The dataflow graphs are then compiled, via VHDL, to FPGA configuration codes. This paper introduces SA - C and describes the optimization and code generation stages in the compiler. The performance of a target acquisition prescreener (ARAGTAP), the Intel Image Processing Library, and an iterative tri-diagonal solver running on a reconfigurable system are compared to their performance on a Ppentium PC with MMX.