George B. Adams
Purdue University
Publication
Featured research published by George B. Adams.
Computing in Science and Engineering | 2008
Gerhard Klimeck; Michael McLennan; Sean Brophy; George B. Adams; Mark Lundstrom
In 2002, the National Science Foundation established the Network for Computational Nanotechnology (NCN), a network of universities supporting the National Nanotechnology Initiative by bringing computational tools online, making the tools easy to use, and supporting the tools with educational materials. Along the way, NCN created a unique cyberinfrastructure to support its Web site, nanoHUB.org, where researchers, educators, and professionals collaborate, share resources, and solve real nanotechnology problems. In 2007, nanoHUB.org served more than 56,000 users from 172 countries. In this article, the authors share their experiences in developing this cyberinfrastructure and using it, particularly in an educational context.
IEEE Transactions on Computers | 1994
Pradeep Dubey; George B. Adams; Michael J. Flynn
Detecting independent operations is a prime objective for computers that are capable of issuing and executing multiple operations simultaneously. The number of instructions that are simultaneously examined for detecting those that are independent is the scope of concurrency detection. The authors present an analytical model for predicting the performance impact of varying the scope of concurrency detection as a function of available resources, such as number of pipelines in a superscalar architecture. The model developed can show where a performance bottleneck might be: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative execution is examined via a set of probability distributions that characterize the inherent parallelism in the instruction stream. These results were derived using traces from the Multiflow TRACE SCHEDULING compacting FORTRAN 77 and C compilers. The experiments provide misprediction delay estimates for 11 common application-level benchmarks under scope constraints, assuming speculative, out-of-order execution and run-time scheduling. The throughput prediction of the analytical model is shown to be close to the measured static throughput of the compiler output.
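The notion of a "scope of concurrency detection" can be illustrated with a toy sketch. The function and trace encoding below are hypothetical, not from the paper: each instruction is a (destination register, source registers) pair, and within each window of `scope` instructions we greedily count how many can issue together without a register dependence on an instruction already chosen.

```python
def independent_within_scope(instructions, scope):
    """Toy sketch: within each window of `scope` instructions, count how many
    can issue together (no RAW/WAW dependence on an instruction already chosen
    in that window). Instructions are (dest, srcs) tuples; WAR is ignored."""
    issued = []
    for start in range(0, len(instructions), scope):
        window = instructions[start:start + scope]
        group, written = [], set()
        for dest, srcs in window:
            # Skip if a source was written, or the destination reused, this cycle
            if written.isdisjoint(srcs) and dest not in written:
                group.append((dest, srcs))
                written.add(dest)
        issued.append(len(group))
    return issued

# r1 = ...; r2 = ...; r3 = r1 + r2 (dependent) -- a hypothetical trace
trace = [("r1", set()), ("r2", set()), ("r3", {"r1", "r2"})]
print(independent_within_scope(trace, scope=3))  # -> [2]
```

With scope 3 the first two instructions are found independent while the third must wait; a scope of 1 would find no concurrency at all, which is the bottleneck the paper's model quantifies.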
IEEE Transactions on Image Processing | 1999
Ji-Sang Yoo; Kelvin L. Fong; Jr-Jen Huang; Edward J. Coyle; George B. Adams
Stack filters are a class of nonlinear filters with excellent properties for signal restoration. Unfortunately, present algorithms for designing stack filters can only be used for small window sizes because of either their computational overhead or their serial nature. This paper presents a new adaptive algorithm for determining a stack filter that minimizes the mean absolute error criterion. The new algorithm retains the iterative nature of many current adaptive stack filtering algorithms, but significantly reduces the number of iterations required to converge to an optimal filter. This algorithm is faster than all currently available stack filter design algorithms, is simple to implement, and is shown in this paper to always converge to an optimal stack filter. Extensive comparisons between this new algorithm and all existing algorithms are provided. The comparisons are based both on the performance of the resulting filters and upon the time and space complexity of the algorithms. They demonstrate that the new algorithm has three advantages: it is faster than all other available algorithms; it can be used on standard workstations (SPARC 5 with 48 MB) to design filters with windows containing 20 or more points; and, its highly parallel structure allows very fast implementations on parallel machines. This new algorithm allows cascades of stack filters to be designed; stack filters with windows containing 72 points have been designed in a matter of minutes under this new approach.
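For readers unfamiliar with the filter class being designed here, a stack filter's output can be defined by threshold decomposition: threshold the window at every signal level, apply a positive Boolean function to each binary window, and sum the binary outputs. The sketch below illustrates that definition only; it is not the paper's adaptive design algorithm, and the function names are our own.

```python
def stack_filter_output(window, boolean_fn, max_level):
    """Evaluate a stack filter on one window via threshold decomposition:
    threshold the samples at each level, apply the positive Boolean function
    to the binary window, and sum the binary outputs."""
    out = 0
    for t in range(1, max_level + 1):
        binary = [1 if x >= t else 0 for x in window]
        out += boolean_fn(binary)
    return out

# Majority is a positive Boolean function; it yields the median stack filter
majority = lambda b: 1 if sum(b) >= (len(b) // 2 + 1) else 0
print(stack_filter_output([2, 0, 3], majority, max_level=3))  # -> 2 (the median)
```

The design problem the paper addresses is choosing the positive Boolean function (here, majority) that minimizes mean absolute error, which is hard precisely because the number of candidate functions explodes with window size.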
workshop on computer architecture education | 1997
Yinong Zhang; George B. Adams
We have built an interactive, visual pipeline simulator, called dlxview, for the DLX instruction set and pipeline described in Computer Architecture: A Quantitative Approach by Hennessy and Patterson [1]. This software provides animated versions of key figures and tables from the text and allows the user to readily follow details of pipeline activity as code is simulated, to vary the pipeline implementation, and to compare performance across different pipeline designs. The software package requires a system running Unix and X11, with Tcl/Tk installed; using the GNU gcc compiler is recommended. A 256-color display at 1024x768 pixels is recommended, due to the detailed diagrams of the DLX pipeline. The software has been designed to run on a variety of platforms and has been tested on Solaris 2.3, SunOS 4.1.1, HP-UX 9.0, DEC OSF/1 4.0, and Linux kernel 1.2.1. DLXview is available at http://yara.ecn.purdue.edu/~teamaaa/dlxview/
Signal Processing | 1994
George B. Adams; Edward J. Coyle; Liangchien Lin; Lori E. Lucke; Keshab K. Parhi
Rank-order-based filters include rank-order filters, stack filters, and weighted order statistic filters. The output of a rank-order-based filter is always one of the sample points in its input window; which one is chosen depends only upon the ranks and positions of the samples within the window. This paper introduces new architectures for rank-order-based filters. They all achieve fast, efficient operation by exploiting an algorithm called input compression. Under this algorithm, the sample points in the input window are first mapped to their relative ranks — the sample points in a window of size N + 1 would thus be mapped to the integers 0 through N. The rank-order-based filter to be implemented is then applied directly to this compressed input, and the rank chosen is then mapped back to the sample of that rank in the original data to obtain the final output. This approach has been used to implement rank-order filters, in which case the same rank is always chosen from the compressed data. In this paper, which rank is chosen also depends on the positions of the ranks in the compressed data. Implementations employing input compression have several advantages. They are computationally efficient like running order sorters, yet can be pipelined to a fine degree like sorting networks. In stack filter implementations, the threshold decomposition circuitry can be eliminated when input compression is combined with unary encoding of the ranks. Weighted order statistic filter implementations based on input compression can support programmable, noninteger weights.
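The input-compression idea described above (compress samples to relative ranks, select a rank, map back to the original sample) can be sketched in a few lines. This is a minimal software illustration of the transform, assuming distinct samples; the function names are ours, and the paper's contribution is hardware architectures, not this code.

```python
def compress_window(window):
    """Map each sample to its relative rank in the window (0..N for N+1 samples)."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def rank_order_filter(window, target_rank):
    """Select the sample holding `target_rank` via input compression:
    compress to ranks, find the position holding the target rank,
    then map back to the original sample value at that position."""
    ranks = compress_window(window)
    position = ranks.index(target_rank)
    return window[position]

# Median (rank 2) of a 5-point window
print(rank_order_filter([7, 3, 9, 1, 5], target_rank=2))  # -> 5
```

For a plain rank-order filter the chosen rank is fixed, as here; the architectures in the paper also let the choice depend on where each rank sits in the compressed window.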
Performance Evaluation | 1994
Ray A. Kamin; George B. Adams; Pradeep Dubey
The advent of superscalar processors introduces additional architectural design tradeoffs. Identifying the potential performance impact of these tradeoffs early is critical in achieving a high performance/cost ratio. To study performance, a method of analyzing dynamic instruction traces to characterize program parallelism is introduced. This technique allows performance evaluation of many architectural variations with a single execution pass through an application or benchmark of interest. A new parameter, the β/α ratio, is used to quantify the available parallelism within programs versus the scope of concurrency detection. Performance is evaluated within a framework of multi-instruction issue, speculative execution, dynamic scheduling, and finite scope of concurrency detection. A trace-driven superscalar simulator was developed to extend the validation of a previously developed analytic model to dynamic trace analysis. Using actual execution traces from four science/engineering application benchmarks, the throughput prediction of the model is shown to be close to the throughput measured via the simulator.
International Journal of High Speed Computing | 1989
Ray A. Kamin; George B. Adams
The Fast Fourier Transform is a mainstay of certain numerical techniques for solving fluid dynamics problems. The Connection Machine CM-2 is the target for an investigation into the design of multidimensional SIMD parallel FFT algorithms for high performance. Critical algorithm design issues are discussed, necessary machine performance measurements are identified and made, and the performance of the developed FFT programs is measured. Our FFT programs are compared to the currently best Cray-2 FFT library program, CFFT2.
international conference on computer design | 1994
Ray A. Kamin III; George B. Adams; Pradeep Dubey
When the instruction level parallelism exceeds the available machine parallelism, a decision must be made as to which instructions get priority. This paper investigates the performance potential of five dynamic scheduling algorithms to prioritize instructions beyond basic blocks, thereby increasing processor utilization and performance. Trace-driven simulations for six benchmarks are used to analyze scheduling performance. A unique pipelining approach is introduced to address implementation limitations.
signal processing systems | 1993
Robert L. Stevenson; George B. Adams; Leah H. Jamieson; Edward J. Delp
Many low-level image processing algorithms which are posed as variational problems can be numerically solved using local and iterative relaxation algorithms. Because of the structure of these algorithms, processing time will decrease nearly linearly with the addition of processing nodes working in parallel on the problem. In this article, we discuss the implementation of a particular application from this class of algorithms on the 8×8 processing array of the AT&T Pixel system. In particular, a case study for an image interpolation algorithm is presented. The performance of the implementation is evaluated in terms of the absolute processing time. We show that near linear speedup is achieved for such iterative image processing algorithms when the processing array is relatively small.
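The local, iterative relaxation structure mentioned above can be illustrated with a generic Jacobi-style sketch (not the paper's specific variational formulation; function and variable names are hypothetical): unknown pixels are repeatedly replaced by the average of their neighbors, and because each update touches only local data, rows of the image can be distributed across processors.

```python
def relax_interpolate(image, known, iterations=200):
    """Jacobi-style local relaxation: each unknown pixel is repeatedly replaced
    by the average of its in-bounds 4-neighbors; pixels marked in `known` stay
    fixed. `image` is a list of rows of floats. Each sweep reads only the
    previous iterate, so updates are independent and parallelize naturally."""
    h, w = len(image), len(image[0])
    for _ in range(iterations):
        new = [row[:] for row in image]
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                nbrs = [image[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]
                new[y][x] = sum(nbrs) / len(nbrs)
        image = new
    return image

# A 1x3 row with the middle pixel unknown converges to the mean of its neighbors
result = relax_interpolate([[0.0, 0.0, 4.0]], [[True, False, True]], iterations=50)
print(result[0][1])  # -> 2.0
```

The near-linear speedup observed in the paper follows from this structure: each processor sweeps its own block of pixels, exchanging only boundary rows between iterations.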
international symposium on computer architecture | 1998
Yinong Zhang; George B. Adams
DS (Decoupled-Superscalar) is a new microarchitecture that combines decoupled and superscalar techniques to exploit instruction level parallelism. Issue bandwidth is increased while circuit complexity growth is controlled with little negative impact on performance. Programs for DS are compiled into two instruction substreams: the dominant substream navigates the control flow, and the rest of the computational task is shared between the dominant and subsidiary substreams. Each substream is processed by a separate superscalar core realizable with current VLSI technology. DS machines are binary compatible with superscalar machines having the same instruction set, and a family of DS machines is binary compatible without recompilation. DS run-time behavior is examined with an analytical model. A novel technique for controlling slip between substreams is introduced. Code partitioning issues of instruction count balancing and residence time balancing, important to any split-stream scheme, are discussed. Simulation shows DS achieves performance comparable to an aggressive superscalar, but with potentially less complex hardware and faster clock rate.