Ron Sass
Clemson University
Publications
Featured research published by Ron Sass.
Foundations of Computer Science | 2001
Keith D. Underwood; Ron Sass; Walter B. Ligon
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. What Beowulf clusters lack in technology, they more than make up for with their significant cost advantage over traditional supercomputers. We propose an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters. This enhances both the network and processor capabilities of the cluster. Furthermore, for some applications, the proposed extension partially compensates for weaknesses in the PC memory subsystem. We discuss two applications, the 2D Fast Fourier Transform (FFT) and integer sorting, which benefit from the resulting architecture.
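A minimal software sketch of the row-column decomposition behind a parallel 2D-FFT, the first of the two applications discussed above. This is illustrative only; the papers target FPGA-augmented network hardware, not Python. On a cluster, the transpose between the two passes is the all-to-all communication step the enhanced network interface accelerates.

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; input length must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + twiddled[k] for k in range(n // 2)] + \
           [even[k] - twiddled[k] for k in range(n // 2)]

def fft2d(grid):
    # Row-column decomposition: 1D FFT over every row, transpose,
    # then 1D FFT over every former column. In the cluster setting,
    # the transpose is the expensive all-to-all exchange.
    rows = [fft(row) for row in grid]
    transposed = [list(col) for col in zip(*rows)]
    return [fft(row) for row in transposed]
```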
Concurrency and Computation: Practice and Experience | 2003
Keith D. Underwood; Walter B. Ligon; Ron Sass
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. Many users find that what Beowulf clusters lack in technology, they more than make up for with their significant cost advantage. In this paper, an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters is proposed. The proposed extension, called an intelligent network interface card (or INIC), enhances both the network and processor capabilities of the cluster, which has a significant impact on the performance of a crucial class of applications. Furthermore, for some applications, the proposed extension partially compensates for weaknesses in the PC memory subsystem. A prototype of the proposed architecture was constructed and analyzed. In addition, two applications, the 2D Fast Fourier Transform (2D-FFT) and integer sorting, which benefit from the resulting architecture, are discussed and analyzed on the proposed architecture. Specifically, results indicate that the 2D-FFT is performed 15–50% faster when using the prototype INIC rather than the comparable Gigabit Ethernet. Early simulation results also indicate that integer sorting will receive a 21–32% performance boost from the prototype. The significant improvements seen with a relatively limited prototype lead to the conclusion that cluster network interfaces enhanced with reconfigurable computing could significantly improve the Beowulf architecture.
Conference on High Performance Computing (Supercomputing) | 2001
Keith D. Underwood; Ron Sass; Walter B. Ligon
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. What Beowulf clusters lack in technology, they more than make up for with their significant cost advantage over traditional supercomputers. This paper presents the cost implications of an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters. A quantitative idea of cost-effectiveness is formulated to evaluate computing technologies. Here, cost-effectiveness is considered in the context of two applications: the 2D Fast Fourier transform (2D-FFT) and integer sorting.
Field-Programmable Custom Computing Machines | 2001
Keith D. Underwood; Ron Sass; Walter B. Ligon
Despite a decade of research into their use for computing applications, FPGA-based custom computing machines are still only used to accelerate a limited range of applications. Recognizing that recent advances in network technology provide an opportunity for a more general-purpose application of custom computing machines, we develop the idea of an intelligent network adapter for cluster-based parallel computing, calling the resulting architecture an Adaptable Computing Cluster. Results presented suggest that placing the FPGAs in the data path to the network dramatically improves the performance and scalability of target applications. This is especially noteworthy because the target applications have historically not performed well on either technology. This paper discusses how FPGAs can be used to provide network functionality while increasing compute power. The focus is on a specific application, the 2D Fast Fourier Transform, with additional insights into the implications for parallel computing on a cluster.
Conference on High Performance Computing (Supercomputing) | 2003
Ranjesh G. Jaganathan; Keith D. Underwood; Ron Sass
The high overhead of generic protocols like TCP/IP provides strong motivation for the development of a better protocol architecture for cluster-based parallel computers. Reconfigurable computing has a unique opportunity to contribute hardware level protocol acceleration while retaining the flexibility to adapt to changing needs. Specifically, applications on a cluster have various quality of service needs. In addition, these applications typically run for a long time relative to the reconfiguration time of an FPGA. Thus, it is possible to provide application-specific protocol processing to improve performance and reduce space utilization. Reducing space utilization permits the use of a greater portion of the FPGA for other application-specific processing. This paper focuses on work to create a set of parameterizable components that can be put together as needed to obtain a customized protocol for each application. To study the feasibility of such an architecture, hardware components were built that can be stitched together as needed to provide the required functionality. Feasibility is demonstrated using four different protocol configurations, namely: (1) unreliable packet transfer; (2) reliable, unordered message transfer without duplicate elimination; (3) reliable, unordered message transfer with duplicate elimination; and (4) reliable, ordered message transfer with duplicate elimination. The different configurations illustrate trade-offs between chip space and functionality.
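A software sketch of the third protocol configuration described above (reliable, unordered message transfer with duplicate elimination). The class and field names are illustrative, not the paper's hardware component names; the actual work composes parameterizable FPGA components rather than Python objects.

```python
class ReliableReceiver:
    """Receiver side of a reliable, unordered, duplicate-eliminating
    protocol: every packet carries a sequence number, every packet is
    acknowledged (so the sender stops retransmitting), and each payload
    is delivered at most once, in arrival order rather than sequence order.
    Illustrative names only; not the paper's component interfaces."""

    def __init__(self):
        self.seen = set()  # sequence numbers already delivered

    def on_packet(self, seq, payload, deliver, send_ack):
        send_ack(seq)          # ack duplicates too, or the sender retries forever
        if seq in self.seen:
            return             # duplicate elimination: drop silently
        self.seen.add(seq)
        deliver(payload)       # unordered delivery: no reorder buffer needed
```

Dropping the `seen` set yields configuration (2), and adding a reorder buffer keyed on the next expected sequence number yields configuration (4), which is the chip-space-versus-functionality trade-off the paper explores.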
Cluster Computing | 2004
Keith D. Underwood; Walter B. Ligon; Ron Sass
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. What Beowulf clusters lack in technology, they more than make up for with their significant cost advantage over traditional supercomputers. This paper presents the cost implications of an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters. This extension is called an intelligent network interface card (INIC). A quantitative description of cost-effectiveness is formulated to compare alternatives. Cost-effectiveness is considered in the context of three applications: the 2D Fast Fourier Transform (2D-FFT), integer sorting, and PNN image classification. It is shown that, for these three representative applications, there is a range of basic hardware costs and cluster sizes for which the INIC is more efficient than a purely serial solution or an ordinary cluster. Furthermore, the cost model has proven useful for designing the next generation INIC.
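One way to make the comparison above concrete is a performance-per-dollar metric. This is an illustrative formulation under assumed numbers, not the paper's actual cost model: an INIC-equipped cluster is more cost-effective whenever its runtime reduction outweighs the added per-node NIC cost.

```python
def cost_effectiveness(runtime_seconds, node_cost, nodes, nic_cost_per_node=0.0):
    # Illustrative metric: work per dollar-second, i.e. the reciprocal of
    # runtime multiplied by total hardware cost. Higher is better.
    total_cost = nodes * (node_cost + nic_cost_per_node)
    return 1.0 / (runtime_seconds * total_cost)

# Hypothetical numbers: a 30% runtime reduction versus a $300/node INIC premium.
plain = cost_effectiveness(runtime_seconds=100.0, node_cost=1000.0, nodes=16)
inic = cost_effectiveness(runtime_seconds=70.0, node_cost=1000.0, nodes=16,
                          nic_cost_per_node=300.0)
```

With these numbers the INIC cluster comes out ahead; shrinking the runtime gain or raising the NIC premium eventually flips the comparison, which is the "range of basic hardware costs and cluster sizes" the abstract refers to.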
Embedded and Ubiquitous Computing | 2004
Jeffrey Young; Ron Sass
As SoC technology use increases, the question arises of how to connect the on-chip components. Current solutions use familiar components (such as busses and direct links) but these have throughput concerns and unnecessarily complicate the system design. This paper introduces the full/empty register pipe (FERP) interface and a collection of IP cores to support it. Along with its dataflow computational model, this interface is extremely well-suited for stream processing — an emerging computational model that is gaining popularity from embedded systems to supercomputers. An example is presented that illustrates how existing IP cores can be easily incorporated and how the resulting IP cores can be combined to perform complex, general stream-based algorithms.
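A software analogue of the full/empty synchronization underlying the FERP interface described above. This models the semantics only (a write blocks while the register is full, a read blocks while it is empty, so connected stages self-synchronize on data availability); the actual interface is hardware IP, and the class below is an assumption-laden sketch, not the paper's design.

```python
import threading

class FullEmptyRegister:
    """One-word pipe with full/empty semantics: producers stall on a full
    register, consumers stall on an empty one. Chaining these between
    processing stages gives the dataflow, stream-style composition the
    FERP interface is built around. Illustrative software model only."""

    def __init__(self):
        self._cv = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        with self._cv:
            while self._full:          # producer stalls until the consumer drains
                self._cv.wait()
            self._value, self._full = value, True
            self._cv.notify_all()

    def read(self):
        with self._cv:
            while not self._full:      # consumer stalls until data arrives
                self._cv.wait()
            self._full = False
            self._cv.notify_all()
            return self._value
```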
IEEE Aerospace Conference | 1999
Kim M. Hazelwood; Walter B. Ligon; Greg Monn; Natasha Pothen; Ron Sass; Dan C. Stanzione; Keith D. Underwood
As FPGA density increases, so does the potential for reconfigurable computing machines. Unfortunately, applications which take advantage of the higher densities require significant effort and involve prohibitively long design cycles when conventional methods are used. To combat this problem, we propose a design environment to manage this additional complexity. Our environment gives the end-user a mechanism for describing and managing algorithms at a higher level of abstraction than other extant methods such as writing VHDL or using schematic capture. This paper describes an experimental version of the environment. The core of our design tool is a general Algorithm Description Format (ADF) which represents an algorithm as an attributed graph, and a library of components, which are tailored to a particular FPGA device. In this paper we present a set of tools which operate on the ADF representation to provide a number of functions to aid in the creation of applications for configurable computing platforms. These tools include front-end tools for specifying a design, a suite of analysis tools to aid the user in refining and constraining the design, and a set of back-end tools which select components from the library based on the design constraints and place and route these components on the configurable computing platform. In this paper, particular emphasis is placed on the component model and the analysis tools.
Configurable Computing: Technology and Applications Conference | 1998
Brian Boysen; Nathan DeBardeleben; Kim M. Hazelwood; Walter B. Ligon; Ron Sass; Dan C. Stanzione; Keith D. Underwood
As FPGA density increases, so does the potential for configurable computing machines. Unfortunately, the larger designs which take advantage of the higher densities require much more effort and longer design cycles, making the technology even less likely to appeal to users outside the field of configurable computing. To combat this problem, we present the Reconfigurable Computing Application Development Environment (RCADE). The goals of RCADE are to produce high performance applications, to make FPGA design more accessible to those who are not hardware engineers, to shorten the design lifecycle, and to ease the process of migration from one platform to another. Here, we discuss the environment architecture, the current set of agents, and other agents to be developed.
Field-Programmable Gate Arrays | 2004
Brian Hargrove Leonard; Jeffrey Young; Ron Sass
The requirements for placing modules in an automatic run-time reconfigurable (RTR) system differ from those of ASIC and other static environments. The most notable difference is the continual addition and removal of modules from the FPGAs. We examine the effectiveness of a collection of two-dimensional placement decision algorithms in an RTR environment. New algorithms are proposed in addition to several which have been adapted from their one-dimensional counterparts. All of the algorithms have been tested on a set of benchmark applications. The six test programs include examples from encryption, image processing, and matrix manipulation, as well as arithmetic, assignment, and looping benchmarks. These applications are multiplexed and simulated to run on our RTR system. A simple last-accessed removal scheme with no compaction is currently implemented. The merit of each algorithm is determined by a set of factors that include fragmentation, chip utilization, decision time, and program run-time benefit.
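To make the placement-decision problem above concrete, here is a sketch of the simplest two-dimensional strategy, first-fit over a cell grid. It is one baseline of the kind the paper compares, not its proposed algorithms; the grid representation and function names are assumptions for illustration.

```python
def first_fit(free_grid, w, h):
    """Scan the FPGA area row-major and return the first (x, y) where a
    w-by-h module fits entirely on free cells, or None if nothing fits.
    free_grid[y][x] is True when cell (x, y) is unoccupied."""
    rows, cols = len(free_grid), len(free_grid[0])
    for y in range(rows - h + 1):
        for x in range(cols - w + 1):
            if all(free_grid[y + dy][x + dx]
                   for dy in range(h) for dx in range(w)):
                return x, y
    return None

def place(free_grid, w, h):
    # Commit a first-fit placement by marking its cells occupied.
    # A None result is where an RTR system would evict a module
    # (e.g. the last-accessed removal scheme) or stall the request.
    pos = first_fit(free_grid, w, h)
    if pos is not None:
        x, y = pos
        for dy in range(h):
            for dx in range(w):
                free_grid[y + dy][x + dx] = False
    return pos
```

Fragmentation and chip utilization, two of the merit factors listed above, fall out directly from this representation: utilization is the fraction of False cells, and fragmentation measures how scattered the remaining True cells are.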