Rick A. Rand
IBM
Publications
Featured research published by Rick A. Rand.
IBM Journal of Research and Development | 2005
Paul W. Coteus; H. R. Bickford; T. M. Cipolla; Paul G. Crumley; Alan Gara; Shawn A. Hall; Gerard V. Kopcsay; Alphonso P. Lanzetta; L. S. Mok; Rick A. Rand; R. Swetz; Todd E. Takken; P. La Rocca; C. Marroquin; P. R. Germann; M. J. Jeanson
As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture: the use of relatively small but powerful building blocks combined in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations per second) is the Blue Gene®/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.
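As a rough back-of-the-envelope sketch of what this rack-level packaging scales to, the arithmetic below uses the 1,024 compute ASICs per rack from the paper; the 64-rack full configuration, the clock frequency, and the flops-per-cycle figure are assumptions based on commonly cited Blue Gene/L numbers, not values taken from this paper.

```python
# Back-of-the-envelope scaling arithmetic for the Blue Gene/L packaging
# described above. The 64-rack configuration, clock frequency, and
# flops/cycle are assumed (commonly cited BG/L figures), not from the paper.

NODES_PER_RACK = 1024        # dual-processor compute ASICs per rack (from the paper)
CORES_PER_NODE = 2           # PowerPC cores per compute ASIC
RACKS = 64                   # assumed full-size configuration
CLOCK_HZ = 700e6             # assumed core clock
FLOPS_PER_CYCLE = 4          # assumed: dual-pipeline FPU with fused multiply-add

nodes = NODES_PER_RACK * RACKS
cores = nodes * CORES_PER_NODE
peak_flops = cores * CLOCK_HZ * FLOPS_PER_CYCLE

print(f"{nodes} nodes, {cores} cores, peak ≈ {peak_flops / 1e12:.0f} TFLOP/s")
# -> 65536 nodes, 131072 cores, peak ≈ 367 TFLOP/s
```

Under these assumptions, the 64-rack system delivers on the order of a third of the announced one-petaflop target, consistent with its description as a first step on that road.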
International Journal of Parallel Programming | 2007
José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy
The Blue Gene/L system at the Department of Energy Lawrence Livermore National Laboratory in Livermore, California, is the world’s most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks and real scientific applications. In that process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.
Computing Frontiers | 2005
Valentina Salapura; Randy Bickford; Matthias A. Blumrich; Arthur A. Bright; Dong Chen; Paul W. Coteus; Alan Gara; Mark E. Giampapa; Michael Karl Gschwind; Manish Gupta; Shawn A. Hall; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Martin Ohmacht; Rick A. Rand; Todd E. Takken; Pavlos M. Vranas
The BlueGene/L supercomputer has been designed with a focus on power/performance efficiency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread-level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65,536 compute nodes, each consisting of a single compute ASIC integrating all system functions with two industry-standard PowerPC microprocessor cores in a chip-multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high-performance SIMD floating-point unit. To support good application scaling on such a massive system, special emphasis was put on efficient communication primitives by including five highly optimized communication networks. After an initial introduction of the BlueGene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system performance (as exemplified by peak performance numbers). To understand application scaling behavior, and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We find that even for strong-scaling problems, BlueGene/L systems can deliver superior performance scaling and significant power/performance efficiency. Application benchmark power/performance scaling for the voltage-invariant energy × delay² power/performance metric demonstrates that choosing a power-efficient 700-MHz embedded PowerPC processor core and relying on application parallelism was the right decision to build a powerful and power/performance-efficient system.
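For context, the sketch below is the standard first-order CMOS scaling argument for why the energy × delay² (ED²) metric is treated as voltage-invariant; it is added here as background, not a derivation taken from the paper.

```latex
% Standard first-order CMOS scaling argument (background assumption, not
% from the paper): switching energy E and gate delay D versus supply
% voltage V_dd, with load capacitance C.
E \propto C\,V_{dd}^{2}, \qquad
D \propto \frac{1}{V_{dd}} \;\;\text{(first order)}, \qquad
E\,D^{2} \propto C\,V_{dd}^{2}\cdot\frac{1}{V_{dd}^{2}} = C .
```

Because the voltage dependence cancels to first order, ED² comparisons between a lower-voltage, lower-frequency design point (such as the 700-MHz embedded core) and a higher-voltage, higher-frequency one are not simply artifacts of the chosen supply voltage.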
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2004
Mary Yvonne Lanzerotti; Giovanni Fiorenza; Rick A. Rand
Computer hardware components have changed significantly since the 1960s, 1970s, 1980s, and even since the early 1990s. Work concerning Rent's memos prior to the present paper has been based on a 1971 interpretation of two unpublished memoranda written in 1960 by E. F. Rent while working at IBM, even though today's computer components are significantly different from those in 1960 and 1971. However, because of the significant changes in the design and implementation of computer hardware components since 1960 and 1971, a new interpretation of Rent's memos is needed for today's components. We have obtained copies of Rent's two memos. In these memos, Rent describes the method that he used to obtain an empirical relationship between properties of the computer hardware components of the IBM 1401 and the IBM 1410 computers. We have studied these memos carefully in order to understand Rent's original intent. Based on our careful reading of these two memos, the personal knowledge one of us has of the 1401 and 1410 computers, and our experience designing ultralarge-scale integrated (ULSI) circuits for high-performance microprocessors, we have derived a historically equivalent interpretation of Rent's memos suitable for today's computer components. The purpose of this paper is to present a new interpretation of the memos and to present an application to wirelength distributions of real ULSI circuitry. In this paper, we will: 1) describe the contents of the memos and Rent's method; 2) provide a historically equivalent interpretation of Rent's memos for today's computer components; and 3) apply this new interpretation to real ULSI control logic circuitry in the 1.3-GHz IBM POWER4 microprocessor. In this paper, we will show that this new interpretation of the two memos provides improved wirelength distribution models with better qualitative agreement with measurements and more accurate estimates of wirelength distributions and wirelength requirements for real ULSI designs compared with prior methods.
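For reference, the empirical relationship commonly attributed to these memos is usually stated in the literature in the form below; it is given here as context in its standard textbook form, not as a quotation from the memos themselves.

```latex
% Rent's rule in its standard literature form (context, not a quotation
% from the 1960 memos): T external terminals for a block containing N gates.
T = t\,N^{p}, \qquad 0 < p < 1 ,
```

where t is the average number of terminals per gate and p is the Rent exponent. Wirelength-distribution models take (t, p) as inputs, which is why the interpretation of the memos directly affects wirelength estimates for modern designs.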
IBM Journal of Research and Development | 1998
Evan G. Colgan; Paul Matthew Alt; Robert L. Wisnieff; Peter M. Fryer; Eileen A. Galligan; William S. Graham; Paul F. Greier; Raymond Robert Horton; Harold Ifill; Leslie Charles Jenkins; Richard A. John; Richard I. Kaufman; Yue Kuo; Alphonso P. Lanzetta; Kenneth F. Latzko; Frank R. Libsch; Shui-Chih Alan Lien; Steven Edward Millman; Robert Wayne Nywening; Robert J. Polastre; Carl G. Powell; Rick A. Rand; John J. Ritsko; Mary Beth Rothwell; John L. Staples; Kevin W. Warren; J. Wilson; Steven L. Wright
A 157-dot-per-inch, 262K-color, 10.5-in.-diagonal, 1280 × 1024 (SXGA) display has been fabricated using a six-mask process with Cu or Al-alloy thin-film gates. The combination of high resolution and gray-scale accuracy has been shown to render color images and text with paperlike legibility. The low-resistivity gate metallization and trilayer-type TFTs with a channel length of 6-8 µm were fabricated with a six-mask process which is extendible to larger, higher-resolution displays. A combination of double-sided driving and active line repair was used so that open gate lines or data lines did not result in visible line defects. A flexible drive-electronics system was developed to address the display and characterize its performance under different drive conditions.
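The quoted panel figures can be checked against each other with simple geometry; the short sketch below is just that consistency check, not data or a method from the paper.

```python
# Quick consistency check of the display figures quoted above
# (simple geometry and arithmetic, not taken from the paper).
import math

h_px, v_px = 1280, 1024      # SXGA pixel format
dpi = 157                    # dots per inch

diagonal_px = math.hypot(h_px, v_px)
print(f"diagonal ≈ {diagonal_px / dpi:.2f} in")   # ≈ 10.44 in, matching the 10.5-in. panel

# 262K colors corresponds to 18 bits/pixel, i.e. 6 bits per RGB channel.
print(2 ** 18)                                    # 262144
```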
System-Level Interconnect Prediction | 2005
Mary Yvonne Lanzerotti; Giovanni Fiorenza; Rick A. Rand
This paper presents a comprehensive assessment of interconnect requirements in ULSI control logic circuitry and quantifies the agreement observed (1) between estimates and measurements of average wire-length in individual designs in real chips, and (2) between wire-length distributions provided by the models and wire-length distributions obtained from measurements. In this study, actual interconnect data is measured in ASIC-like control logic designs in the six functional units of the 1.3-GHz POWER4. This paper compares interconnect measurements with estimates for control logic in individual designs, in functional units, and in the entire POWER4 core. The results presented in this paper show that the estimates are typically lower than the actual wire-length measurements. The results also show that the estimates of the total wire-length for all of the control logic in the POWER4 agree to within 31% of the total measured wire-length.
IBM Journal of Research and Development | 2013
Paul W. Coteus; Shawn A. Hall; Todd E. Takken; Rick A. Rand; Shurong Tian; Gerard V. Kopcsay; Randy Bickford; Francis P. Giordano; Christopher Marroquin; Mark J. Jeanson
The IBM Blue Gene®/Q supercomputer is designed for highly efficient computing for problems dominated by floating-point computation. Its target mean time between failures for a 96-rack, 98,304-node system is three days, allowing tasks requiring computation for many days to run at scale, with little time wasted on checkpoint-restart operations. This paper describes various elements of the compute application-specific integrated circuit and the system package, and how they contribute to low power consumption and high reliability.
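As a rough sketch of what that system-level target implies per node, the arithmetic below assumes independent node failures with exponential lifetimes; the simplifying assumption and the calculation are added here for context and are not claims from the paper.

```python
# Rough implication of the system-level MTBF target quoted above,
# assuming independent node failures with exponential lifetimes
# (a simplifying assumption added here, not a claim from the paper).

nodes = 98_304
system_mtbf_days = 3

# With independent exponential failures, the system failure rate is the
# sum of the node failure rates, so node MTBF ≈ N * system MTBF.
node_mtbf_days = nodes * system_mtbf_days
print(f"required per-node MTBF ≈ {node_mtbf_days:,} days "
      f"≈ {node_mtbf_days / 365:.0f} years")
# -> required per-node MTBF ≈ 294,912 days ≈ 808 years
```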
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2004
Mary Yvonne Lanzerotti; Giovanni Fiorenza; Rick A. Rand
Ultralarge-scale integrated (ULSI) chip design data is needed for an assessment of existing on-chip wirelength distribution models. Data extracted from modern chips such as high-performance microprocessors provide information about actual wirelength requirements in ULSI chip designs. These requirements are compared with wirelength estimates obtained by evaluating existing models as functions of Rent's parameters that are extracted from the designs. This brief assesses the extent to which existing models estimate wirelength requirements in 100 ASIC-like control logic designs in the 1.3-GHz POWER4 microprocessor. For each design, physical design characteristics and wirelength requirements are measured and compared with model estimates. Lack of agreement between the data and models is observed for most designs, and possible reasons for the lack of agreement are discussed.
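A minimal sketch of the kind of parameter extraction such an assessment relies on is shown below: fitting Rent's parameters (t, p) by a log-log least-squares fit of terminal counts against block sizes. The data points and procedure here are illustrative assumptions, not the measurements or methodology of the paper.

```python
# Minimal sketch of extracting Rent's parameters (t, p) from a design by a
# log-log least-squares fit of terminals vs. gates over nested partitions.
# The data points and procedure are illustrative assumptions, not the
# paper's measurements or method.
import numpy as np

# (gates, terminals) pairs, e.g. gathered by recursively partitioning a netlist
blocks = np.array([
    (16, 18), (64, 40), (256, 95), (1024, 220), (4096, 510),
], dtype=float)

gates, terms = blocks[:, 0], blocks[:, 1]
# Fit log T = log t + p * log N
p, log_t = np.polyfit(np.log(gates), np.log(terms), 1)
t = np.exp(log_t)
print(f"Rent exponent p ≈ {p:.2f}, terminals per gate t ≈ {t:.1f}")
```

The fitted (t, p) pair is then fed into a wirelength-distribution model, and the model's output is compared against the wirelengths measured directly from the physical design.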
System-Level Interconnect Prediction | 2007
Mary Yvonne Lanzerotti; Giovanni Fiorenza; Rick A. Rand
This paper presents models and a methodology to evaluate tradeoffs between technology and design to obtain the highest frequency in ULSI design projects and quantifies the performance improvement that can be expected. With respect to the standard chip design process, it is well known in the academic community that circuits and chips are required to satisfy specific constraints, most notably the requirement that all signals must have zero slack when the transistors and wires are manufactured at some pre-specified technology node. To amortize the cost of the design process, which is time-consuming and complex, there is a need to migrate the designs to future technology nodes with minimal redesign. However, this problem and the associated implications of design migration are less well known, and at present there are no existing models to help designers evaluate whether migrated designs will operate successfully in a future technology or whether migrated designs will fail and thus cause chip failure. Thus, there is a need for research to evaluate the impact of design changes on chip performance. This paper presents a methodology to evaluate and quantify the performance impact of design changes, where we express the impact on performance as an effective change in dielectric constant in the wire environment. In this study, as in a previous study [1], performance estimates obtained from the model are compared with values obtained for interconnections in 18 ASIC-like control logic designs in the Instruction Fetch Unit (IFU) of the 1.3-GHz POWER4 microprocessor.
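A first-order sketch of why a change in wire delay can be expressed as an effective change in dielectric constant is given below; it uses the standard RC wire-delay scaling relation and is included as context, not as the paper's specific model.

```latex
% First-order RC wire-delay scaling (standard approximation, not the
% paper's model): capacitance per unit length c scales with the effective
% dielectric constant of the wire environment; r is resistance per unit
% length and L the wire length.
c \propto \varepsilon_{\text{eff}}, \qquad
\tau_{\text{wire}} \propto r\,c\,L^{2} \propto \varepsilon_{\text{eff}}, \qquad
\frac{\Delta\tau_{\text{wire}}}{\tau_{\text{wire}}}
  \approx \frac{\Delta\varepsilon_{\text{eff}}}{\varepsilon_{\text{eff}}} .
```

Under this approximation, a design or migration change that slows (or speeds up) the wires by some fraction can be reported as an equivalent fractional change in the effective dielectric constant.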
IBM Systems Journal | 2001
Frances E. Allen; George S. Almasi; Wanda Andreoni; D. Beece; B. J. Berne; Arthur A. Bright; José R. Brunheroto; Călin Caşcaval; José G. Castaños; Paul W. Coteus; Paul G. Crumley; Alessandro Curioni; Monty M. Denneau; Wilm E. Donath; Maria Eleftheriou; Blake G. Fitch; B. Fleischer; C. J. Georgiou; Robert S. Germain; Mark E. Giampapa; Donna L. Gresh; Manish Gupta; Ruud A. Haring; H. Ho; Peter H. Hochschild; Susan Flynn Hummel; T. Jonas; Derek Lieber; G. Martyna; K. Maturu