
Publication


Featured research published by Paul G. Crumley.


IBM Journal of Research and Development | 2005

Blue Gene/L programming and operating environment

José E. Moreira; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Bergner; José R. Brunheroto; Michael Brutman; José G. Castaños; Paul G. Crumley; Manish Gupta; Todd Inglett; Derek Lieber; David Limpert; Patrick McCarthy; Mark Megerian; Mark P. Mendell; Michael Mundy; Don Reed; Ramendra K. Sahoo; Alda Sanomiya; Richard Shok; Brian E. Smith; Greg Stewart

With up to 65,536 compute nodes and a peak performance of more than 360 teraflops, the Blue Gene®/L (BG/L) supercomputer represents a new level of massively parallel systems. The system software stack for BG/L creates a programming and operating environment that harnesses the raw power of this architecture with great effectiveness. The design and implementation of this environment followed three major principles: simplicity, performance, and familiarity. By specializing the services provided by each component of the system architecture, we were able to keep each one simple and leverage the BG/L hardware features to deliver high performance to applications. We also implemented standard programming interfaces and programming languages that greatly simplified the job of porting applications to BG/L. The effectiveness of our approach has been demonstrated by the operational success of several prototype and production machines, which have already been scaled to 16,384 nodes.
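
The abstract does not name the specific interfaces, but message passing through MPI was the standard programming interface for systems of this class; the minimal C program below is a hypothetical sketch of the kind of code that ports to BG/L unchanged, not an excerpt from the paper.

```c
/* Minimal MPI program of the kind the standard interfaces on BG/L support.
 * Illustrative only; not taken from the paper. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```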


IBM Journal of Research and Development | 2005

Packaging the Blue Gene/L supercomputer

Paul W. Coteus; H. R. Bickford; T. M. Cipolla; Paul G. Crumley; Alan Gara; Shawn A. Hall; Gerard V. Kopcsay; Alphonso P. Lanzetta; L. S. Mok; Rick A. Rand; R. Swetz; Todd E. Takken; P. La Rocca; C. Marroquin; P. R. Germann; M. J. Jeanson

As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture--the use of relatively small but powerful building blocks used together in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations in a second) is the Blue Gene®/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.
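
A rough sanity check of the packaging numbers, combining the 1,024 dual-processor compute ASICs per rack stated here with the 65,536-node system size and 360-teraflop peak cited in the companion abstracts (the per-node figure is an inferred estimate, not a number from the paper):

```latex
\text{racks} = \frac{65{,}536\ \text{nodes}}{1{,}024\ \text{nodes/rack}} = 64,
\qquad
\text{processors} = 65{,}536 \times 2 = 131{,}072,
\qquad
\frac{360\ \text{TFLOPS}}{65{,}536\ \text{nodes}} \approx 5.5\ \text{GFLOPS/node}.
```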


International Journal of Parallel Programming | 2007

The Blue Gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy Lawrence Livermore National Laboratory in Livermore, California is the world’s most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks and real scientific applications. In the process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.


European Conference on Parallel Processing | 2003

An Overview of the Blue Gene/L System Software Organization

George S. Almasi; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Luis Ceze; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Xavier Martorell; José E. Moreira; Alda Sanomiya; Karin Strauss

The Blue Gene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture. With 65,536 compute nodes, Blue Gene/L represents a new level of complexity for parallel system software, with specific challenges in the areas of scalability, maintenance and usability. In this paper we present our vision of a software architecture that faces up to these challenges, and the simulation framework that we have used for our experiments.


Parallel Processing Letters | 2003

An Overview of the BlueGene/L System Software Organization

George S. Almasi; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Paul G. Crumley; C. Christopher Erway; Derek Lieber; Xavier Martorell; José E. Moreira; Ramendra K. Sahoo; Alda Sanomiya; Luis Ceze; Karin Strauss

BlueGene/L is a 65,536-compute node massively parallel supercomputer, built using system-on-a-chip integration and a cellular architecture. BlueGene/L represents a major challenge for parallel system software, particularly in the areas of scalability, maintainability, and usability. In this paper, we present the organization of the BlueGene/L system software, with emphasis on the features that address those challenges. The system software was developed in parallel with the hardware, relying on an architecturally accurate simulator of the machine. We validated this approach by demonstrating a working system software stack and high performance on real parallel applications just a few weeks after first hardware availability.


IBM Journal of Research and Development | 2005

Blue Gene/L compute chip: control, test, and bring-up infrastructure

Ruud A. Haring; Ralph Bellofatto; Arthur A. Bright; Paul G. Crumley; Marc Boris Dombrowa; Steve M. Douskey; Matthew R. Ellavsky; Balaji Gopalsamy; Dirk Hoenicke; Thomas A. Liebsch; James A. Marcella; Martin Ohmacht

The Blue Gene®/L compute (BLC) and Blue Gene/L link (BLL) chips have extensive facilities for control, bring-up, self-test, debug, and nonintrusive performance monitoring built on a serial interface compliant with IEEE Standard 1149.1. Both the BLL and the BLC chips contain a standard eServer™ chip JTAG controller called the access macro. For BLC, the capabilities of the access macro were extended 1) to accommodate the secondary JTAG controllers built into embedded PowerPC® cores; 2) to provide direct access to memory for initial boot code load and for messaging between the service node and the BLC chip; 3) to provide nonintrusive access to device control registers; and 4) to provide a suite of chip configuration and control registers. The BLC clock tree structure is described. It accommodates both functional requirements and requirements for enabling multiple built-in self-test domains, differentiated both by frequency and functionality. The chip features a debug port that allows observation of critical chip signals at full speed.
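
For readers unfamiliar with IEEE 1149.1, the sketch below shows the generic data-register shift that any JTAG access, including the access macro's extensions, ultimately reduces to. The pin-level helpers are assumptions standing in for whatever transport the service node actually uses, and the clock-edge timing is simplified; this is not code from the BLC bring-up infrastructure.

```c
/* Conceptual sketch of an IEEE 1149.1 (JTAG) data-register shift.
 * set_tms/set_tdi/get_tdo/tck_pulse are hypothetical pin-level helpers. */
#include <stdint.h>

void set_tms(int level);   /* drive the TMS pin  (assumed helper) */
void set_tdi(int level);   /* drive the TDI pin  (assumed helper) */
int  get_tdo(void);        /* sample the TDO pin (assumed helper) */
void tck_pulse(void);      /* one TCK clock cycle (assumed helper) */

/* Shift 'nbits' through the currently selected data register.
 * Assumes the TAP starts in Run-Test/Idle and returns it there. */
uint64_t jtag_shift_dr(uint64_t out_bits, int nbits)
{
    uint64_t in_bits = 0;

    set_tms(1); tck_pulse();              /* Run-Test/Idle -> Select-DR-Scan */
    set_tms(0); tck_pulse();              /* -> Capture-DR */
    set_tms(0); tck_pulse();              /* -> Shift-DR */

    for (int i = 0; i < nbits; i++) {
        in_bits |= (uint64_t)get_tdo() << i;  /* bit currently presented on TDO */
        set_tdi((out_bits >> i) & 1);         /* next bit to scan in, LSB first */
        if (i == nbits - 1)
            set_tms(1);                       /* leave Shift-DR after the last bit */
        tck_pulse();                          /* shift one position */
    }

    set_tms(1); tck_pulse();              /* Exit1-DR -> Update-DR */
    set_tms(0); tck_pulse();              /* Update-DR -> Run-Test/Idle */

    return in_bits;
}
```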


International Parallel and Distributed Processing Symposium | 2003

System management in the BlueGene/L supercomputer

George S. Almasi; Leonardo R. Bachega; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Pedro Mindlin; José E. Moreira; Ramendra K. Sahoo; Alda Sanomiya; Eugen Schenfeld; Richard A. Swetz; Myung M. Bae; Gregory D. Laib; Kavitha Ranganathan; Yariv Aridor; Tamar Domany; Y. Gal; Oleg Goldshmidt; Edi Shmueli

The BlueGene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture to deliver 360 teraflops of peak computing power. With 65536 compute nodes, BlueGene/L represents a new level of scalability for parallel systems. As such, it is natural for many scalability challenges to arise. In this paper, we discuss system management and control, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65536 compute nodes are organized in 1024 clusters of 64 compute nodes each, called processing sets. Each processing set is under control of a 65th node, called an I/O node. The 1024 processing sets can then be managed to a great extent as a regular Linux cluster, of which there are several successful examples. Regular cluster management is complemented by BlueGene/L specific services, performed by a service node over a separate control network. Our software development and experiments have been conducted so far using an architecturally accurate simulator of BlueGene/L, and we are gearing up to test real prototypes in 2003.
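
The hierarchy described above is simple to express in code. The sketch below maps a compute-node index to its processing set (and hence its managing I/O node) using the 1,024-sets-of-64 layout from the abstract; the function and field names are illustrative, not taken from the BlueGene/L software.

```c
/* Hypothetical sketch of the hierarchical organization: 65,536 compute nodes
 * grouped into 1,024 processing sets of 64 nodes, each under one I/O node. */
#include <stdio.h>

#define NODES_PER_PSET 64
#define NUM_PSETS      1024   /* 1024 * 64 = 65,536 compute nodes */

struct pset_addr {
    int pset;          /* which processing set (0..1023), i.e. which I/O node */
    int local_rank;    /* position within that processing set (0..63) */
};

static struct pset_addr pset_of(int compute_node)
{
    struct pset_addr a;
    a.pset       = compute_node / NODES_PER_PSET;
    a.local_rank = compute_node % NODES_PER_PSET;
    return a;
}

int main(void)
{
    struct pset_addr a = pset_of(40000);
    printf("compute node 40000 -> pset %d, local rank %d\n",
           a.pset, a.local_rank);   /* pset 625, local rank 0 */
    return 0;
}
```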


International Parallel and Distributed Processing Symposium | 2008

Agent multiplication: An economical large-scale testing environment for system management solutions

Kyung Dong Ryu; David Daly; Mary R. Seminara; Sukhyun Song; Paul G. Crumley

System management solutions are designed to scale to thousands or more machines and networked devices. However, it is challenging to test and verify the proper operation and scalability of management software given the limited resources of a testing lab. We have developed a method called agent multiplication, in which one physical testing machine is used to represent hundreds of client machines. This provides the necessary client load to test the performance and scalability of the management software and server within limited resources. In addition, our approach guarantees that the test environment remains consistent between test runs, ensuring that test results can be meaningfully compared. We used agent multiplication to test and verify the operation of a server managing 4,000 systems. We exercised the server functions with only 8 test machines. Applying this test environment to an early version of a real enterprise system management solution we were able to uncover critical bugs, resolve race conditions, and examine and adjust thread prioritization levels for improved performance.
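
As a rough illustration of agent multiplication, the sketch below runs many logical agents as threads on one test machine, so that 8 machines can present themselves to the server as 4,000 managed systems (500 each). The agent_run body and identity scheme are hypothetical; the paper's agents belong to a real enterprise management product.

```c
/* Sketch of agent multiplication: one physical test machine hosts many
 * logical management agents, each acting as a distinct client machine. */
#include <pthread.h>
#include <stdio.h>

#define AGENTS_PER_MACHINE 500   /* 8 machines x 500 agents = 4,000 managed systems */

static void *agent_run(void *arg)
{
    int id = *(int *)arg;
    /* In a real test, each thread would open its own connection to the
     * management server and answer inventory/status requests under its
     * own identity. */
    printf("agent %d: registering with management server\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[AGENTS_PER_MACHINE];
    int ids[AGENTS_PER_MACHINE];

    for (int i = 0; i < AGENTS_PER_MACHINE; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, agent_run, &ids[i]);
    }
    for (int i = 0; i < AGENTS_PER_MACHINE; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```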


International Conference on Cluster Computing | 2002

Blue Gene/L, a system-on-a-chip

George S. Almasi; G.S. Almasi; D. Beece; Ralph Bellofatto; G. Bhanot; R. Bickford; M. Blumrich; Arthur A. Bright; José R. Brunheroto; Cǎlin Caşcaval; José G. Castaños; Luis Ceze; R. Coteus; S. Chatterjee; D. Chen; G. Chiu; T.M. Cipolla; Paul G. Crumley; A. Deutsch; M.B. Dombrowa; W. Donath; M. Eleftheriou; B. Fitch; Joseph Gagliano; Alan Gara; R. Germain; M.E. Giampapa; Manish Gupta; F. Gustavson; S. Hall

Summary form only given. Large powerful networks coupled to state-of-the-art processors have traditionally dominated supercomputing. As technology advances, this approach is likely to be challenged by a more cost-effective system-on-a-chip approach, with higher levels of system integration. The scalability of applications to architectures with tens to hundreds of thousands of processors is critical to the success of this approach. Significant progress has been made in mapping numerous compute-intensive applications, many of them grand challenges, to parallel architectures. Applications hoping to efficiently execute on future supercomputers of any architecture must be coded in a manner consistent with an enormous degree of parallelism. The BG/L program is developing a supercomputer with a nominal peak of 180 TFLOPS (360 TFLOPS for some applications) to serve a broad range of science applications. BG/L generalizes QCDOC, the first system-on-a-chip supercomputer, which is expected in 2003. BG/L consists of 65,536 nodes and contains five integrated networks: a 3D torus, a combining tree, a Gb Ethernet network, a barrier/global interrupt network, and JTAG.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2018

Towards a Composable Computer System

I-Hsin Chung; Bulent Abali; Paul G. Crumley

Recent advances in both software and hardware technology enable us to revisit the concept of composable architecture in system design. A composable system design provides the flexibility to serve a variety of workloads. The system offers a dynamic co-design platform that allows experiments and measurements in a controlled environment. This speeds up system design and software evolution, and it decouples the lifecycles of components. The design considerations include adopting available technology informed by an understanding of application characteristics. With this flexibility, we show that the design has the potential to serve as the infrastructure for both cloud computing and HPC architectures supporting a variety of workloads.
