
Publication


Featured research published by Ralph Bellofatto.


Field-Programmable Gate Arrays | 2012

A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation

Sameh W. Asaad; Ralph Bellofatto; Bernard Brezzo; Chuck Haymes; Mohit Kapur; Benjamin D. Parker; Thomas Roewer; Proshanta Saha; Todd E. Takken; Jose A. Tierno

Software-based simulation tools are not keeping up with the demands of increasing chip and system design complexity. In this paper, we describe a cycle-accurate and cycle-reproducible large-scale FPGA platform designed from the ground up to accelerate logic verification of the Blue Gene/Q compute node ASIC, a multiprocessor SoC implemented in IBM's 45 nm SOI CMOS technology. The paper discusses the challenges of constructing such large-scale FPGA platforms, including design partitioning, clocking and synchronization, and debugging support, as well as our approach to addressing these challenges without sacrificing cycle accuracy and cycle reproducibility. The resulting full-chip simulation of the Blue Gene/Q compute node ASIC runs at a simulated processor clock speed of 4 MHz, over 100,000 times faster than logic-level software simulation of the same design. This vast increase in simulation speed provides a new capability in the design cycle that proved instrumental in logic verification as well as in early software development and performance validation for Blue Gene/Q.
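As a quick sanity check on the figures above (a derived illustration: only the 4 MHz rate and the 100,000x factor come from the abstract, and the 1.6 GHz target clock is an assumption), the claimed speedup implies the software simulator advances the design at roughly 40 cycles per second:

```python
# Back-of-the-envelope check of the reported speedup. Only the 4 MHz rate and
# the 100,000x factor come from the abstract; everything else is derived.
fpga_sim_hz = 4e6        # simulated processor clock on the multi-FPGA platform
speedup = 100_000        # reported speedup over logic-level software simulation

software_sim_hz = fpga_sim_hz / speedup
print(f"Implied software simulation rate: {software_sim_hz:.0f} cycles/s")  # ~40

# Time to simulate one second of target operation, assuming a 1.6 GHz chip
# clock (an assumption for illustration, not a figure from the paper).
target_hz = 1.6e9
print(f"FPGA platform: {target_hz / fpga_sim_hz:.0f} s per simulated second")                  # 400 s
print(f"Software sim:  {target_hz / software_sim_hz / 86_400:.0f} days per simulated second")  # ~463 days
```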


IBM Journal of Research and Development | 2005

Blue Gene/L programming and operating environment

José E. Moreira; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Bergner; José R. Brunheroto; Michael Brutman; José G. Castaños; Paul G. Crumley; Manish Gupta; Todd Inglett; Derek Lieber; David Limpert; Patrick McCarthy; Mark Megerian; Mark P. Mendell; Michael Mundy; Don Reed; Ramendra K. Sahoo; Alda Sanomiya; Richard Shok; Brian E. Smith; Greg Stewart

With up to 65,536 compute nodes and a peak performance of more than 360 teraflops, the Blue Gene®/L (BG/L) supercomputer represents a new level of massively parallel systems. The system software stack for BG/L creates a programming and operating environment that harnesses the raw power of this architecture with great effectiveness. The design and implementation of this environment followed three major principles: simplicity, performance, and familiarity. By specializing the services provided by each component of the system architecture, we were able to keep each one simple and leverage the BG/L hardware features to deliver high performance to applications. We also implemented standard programming interfaces and programming languages that greatly simplified the job of porting applications to BG/L. The effectiveness of our approach has been demonstrated by the operational success of several prototype and production machines, which have already been scaled to 16,384 nodes.
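The headline numbers above are internally consistent. A minimal arithmetic sketch, taking the published BG/L node design (two 700 MHz PowerPC 440 cores, each with a dual FPU executing fused multiply-adds) as an assumption:

```python
# Sanity check of the "more than 360 teraflops" peak figure quoted above.
# Per-node parameters are from the published BG/L design, assumed here.
nodes = 65_536
cores_per_node = 2           # two PowerPC 440 cores per compute node
clock_hz = 700e6             # 700 MHz core clock
flops_per_cycle = 4          # dual FPU, each unit issuing a fused multiply-add

peak_flops = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"Peak: {peak_flops / 1e12:.0f} teraflops")   # ~367, i.e. "more than 360"
```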


IBM Journal of Research and Development | 2005

Blue Gene/L advanced diagnostics environment

Mark E. Giampapa; Ralph Bellofatto; Matthias A. Blumrich; Dong Chen; Marc Boris Dombrowa; Alan Gara; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Ben J. Nathanson; Burkhard Steinmacher-Burow; Martin Ohmacht; Valentina Salapura; Pavlos M. Vranas

This paper describes the Blue Gene®/L advanced diagnostics environment (ADE) used throughout all aspects of the Blue Gene/L project, including design, logic verification, bring-up, diagnostics, and manufacturing test. The Blue Gene/L ADE consists of a lightweight multithreaded coherence-managed kernel, runtime libraries, device drivers, system programming interfaces, compilers, and host-based development tools. It provides complete and flexible access to all features of the Blue Gene/L hardware. Prior to the existence of hardware, ADE was used on Very high-speed integrated circuit Hardware Description Language (VHDL) models, not only for logic verification, but also for performance measurements, code-path analysis, and evaluation of architectural tradeoffs. During early hardware bring-up, the ability to run in a cycle-reproducible manner on both hardware and VHDL proved invaluable in fault isolation and analysis. However, ADE is also capable of supporting high-performance applications and parallel test cases, thereby permitting us to stress the hardware to the limits of its capabilities. This paper also provides insights into system-level and device-level programming of Blue Gene/L to help developers of high-performance applications more fully exploit the performance of the machine.


International Journal of Parallel Programming | 2007

The Blue Gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy Lawrence Livermore National Laboratory in Livermore, California is the world's most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks and real scientific applications. In the process, it has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Real-time scalable cortical computing at 46 giga-synaptic OPS/watt with ~100× speedup in time-to-solution and ~100,000× reduction in energy-to-solution

Andrew S. Cassidy; Rodrigo Alvarez-Icaza; Filipp Akopyan; Jun Sawada; John V. Arthur; Paul A. Merolla; Pallab Datta; Marc Gonzalez Tallada; Brian Taba; Alexander Andreopoulos; Arnon Amir; Steven K. Esser; Jeff Kusnitz; Rathinakumar Appuswamy; Chuck Haymes; Bernard Brezzo; Roger Moussalli; Ralph Bellofatto; Christian W. Baks; Michael Mastro; Kai Schleupen; Charles Edwin Cox; Ken Inoue; Steven Edward Millman; Nabil Imam; Emmett McQuinn; Yutaka Nakamura; Ivan Vo; Chen Guok; Don Nguyen

Drawing on neuroscience, we have developed a parallel, event-driven kernel for neurosynaptic computation that is efficient with respect to computation, memory, and communication. Building on the previously demonstrated, highly optimized software expression of the kernel, here we demonstrate TrueNorth, a co-designed silicon expression of the kernel. TrueNorth achieves a five-orders-of-magnitude reduction in energy-to-solution and a two-orders-of-magnitude speedup in time-to-solution when running computer vision applications and complex recurrent neural network simulations. Breaking with the von Neumann architecture, TrueNorth is a 4,096-core, 1-million-neuron, 256-million-synapse brain-inspired neurosynaptic processor that consumes 65 mW of power running in real time and delivers a performance of 46 giga-synaptic OPS/watt. We demonstrate seamless tiling of TrueNorth chips into arrays, forming a foundation for cortex-like scalability. TrueNorth's unprecedented time-to-solution, energy-to-solution, size, scalability, and performance, combined with the underlying flexibility of the kernel, enable a broad range of cognitive applications.
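The headline TrueNorth figures can be cross-checked with simple arithmetic; the per-core crossbar dimensions below (256 neurons and a 256x256 synapse crossbar per core) are assumptions drawn from the chip's published organization, not from this abstract:

```python
# Consistency check of the quoted TrueNorth numbers (illustrative only).
cores = 4096
neurons_per_core = 256            # assumed per-core count: 4096 * 256 ~ 1M neurons
synapses_per_core = 256 * 256     # assumed crossbar: 4096 * 65,536 ~ 256M synapses
power_w = 0.065                   # 65 mW at real-time operation
sops_per_watt = 46e9              # 46 giga-synaptic OPS per watt

print(f"Total neurons:  {cores * neurons_per_core:,}")    # 1,048,576
print(f"Total synapses: {cores * synapses_per_core:,}")   # 268,435,456 (a binary 256M)
print(f"Synaptic ops/s at 65 mW: {sops_per_watt * power_w / 1e9:.1f} giga")  # ~3.0
```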


European Conference on Parallel Processing | 2003

An Overview of the Blue Gene/L System Software Organization

George S. Almasi; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Luis Ceze; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Xavier Martorell; José E. Moreira; Alda Sanomiya; Karin Strauss

The Blue Gene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture. With 65,536 compute nodes, Blue Gene/L represents a new level of complexity for parallel system software, with specific challenges in the areas of scalability, maintenance, and usability. In this paper we present our vision of a software architecture that addresses these challenges, and the simulation framework that we have used for our experiments.


Parallel Processing Letters | 2003

An Overview of the BlueGene/L System Software Organization

George S. Almasi; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Paul G. Crumley; C. Christopher Erway; Derek Lieber; Xavier Martorell; José E. Moreira; Ramendra K. Sahoo; Alda Sanomiya; Luis Ceze; Karin Strauss

BlueGene/L is a 65,536-node massively parallel supercomputer, built using system-on-a-chip integration and a cellular architecture. BlueGene/L represents a major challenge for parallel system software, particularly in the areas of scalability, maintainability, and usability. In this paper, we present the organization of the BlueGene/L system software, with emphasis on the features that address those challenges. The system software was developed in parallel with the hardware, relying on an architecturally accurate simulator of the machine. We validated this approach by demonstrating a working system software stack and high performance on real parallel applications just a few weeks after first hardware availability.


IBM Journal of Research and Development | 2005

Blue Gene/L compute chip: control, test, and bring-up infrastructure

Ruud A. Haring; Ralph Bellofatto; Arthur A. Bright; Paul G. Crumley; Marc Boris Dombrowa; Steve M. Douskey; Matthew R. Ellavsky; Balaji Gopalsamy; Dirk Hoenicke; Thomas A. Liebsch; James A. Marcella; Martin Ohmacht

The Blue Gene®/L compute (BLC) and Blue Gene/L link (BLL) chips have extensive facilities for control, bring-up, self-test, debug, and nonintrusive performance monitoring built on a serial interface compliant with IEEE Standard 1149.1. Both the BLL and the BLC chips contain a standard eServer™ chip JTAG controller called the access macro. For BLC, the capabilities of the access macro were extended 1) to accommodate the secondary JTAG controllers built into embedded PowerPC® cores; 2) to provide direct access to memory for initial boot code load and for messaging between the service node and the BLC chip; 3) to provide nonintrusive access to device control registers; and 4) to provide a suite of chip configuration and control registers. The BLC clock tree structure is described. It accommodates both functional requirements and requirements for enabling multiple built-in self-test domains, differentiated both by frequency and functionality. The chip features a debug port that allows observation of critical chip signals at full speed.
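The serial control interface that the access macro builds on is the standard IEEE 1149.1 test access port. As background for readers unfamiliar with it, here is a minimal sketch of the standard 16-state TAP controller; this is generic JTAG, not anything BG/L-specific:

```python
# IEEE 1149.1 TAP controller state machine (the standard FSM, sketched for
# illustration). Each entry maps a state to (next state if TMS=0, if TMS=1).
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_DR":        ("CAPTURE_DR", "SELECT_IR"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR"),
    "SELECT_IR":        ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR"),
}

def step(state: str, tms: int) -> str:
    """Advance the TAP controller by one TCK edge with the given TMS value."""
    return TAP_NEXT[state][tms]

# Walk from reset into the data-register shift state: TMS sequence 0,1,0,0.
state = "TEST_LOGIC_RESET"
for tms in (0, 1, 0, 0):
    state = step(state, tms)
print(state)  # SHIFT_DR
```

In a JTAG-based design like this, control operations ultimately reduce to walking this state machine while shifting bits through the instruction and data registers.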


International Parallel and Distributed Processing Symposium | 2003

System management in the BlueGene/L supercomputer

George S. Almasi; Leonardo R. Bachega; Ralph Bellofatto; José R. Brunheroto; Calin Cascaval; José G. Castaños; Paul G. Crumley; C. Christopher Erway; Joseph Gagliano; Derek Lieber; Pedro Mindlin; José E. Moreira; Ramendra K. Sahoo; Alda Sanomiya; Eugen Schenfeld; Richard A. Swetz; Myung M. Bae; Gregory D. Laib; Kavitha Ranganathan; Yariv Aridor; Tamar Domany; Y. Gal; Oleg Goldshmidt; Edi Shmueli

The BlueGene/L supercomputer will use system-on-a-chip integration and a highly scalable cellular architecture to deliver 360 teraflops of peak computing power. With 65,536 compute nodes, BlueGene/L represents a new level of scalability for parallel systems, and many scalability challenges naturally arise. In this paper, we discuss system management and control, including machine booting, software installation, user account management, system monitoring, and job execution. We address the issue of scalability by organizing the system hierarchically. The 65,536 compute nodes are organized into 1,024 clusters of 64 compute nodes each, called processing sets. Each processing set is under the control of a 65th node, called an I/O node. The 1,024 processing sets can then be managed to a great extent as a regular Linux cluster, of which there are several successful examples. Regular cluster management is complemented by BlueGene/L-specific services, performed by a service node over a separate control network. Our software development and experiments have so far been conducted using an architecturally accurate simulator of BlueGene/L, and we are gearing up to test real prototypes in 2003.
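The hierarchical organization described above maps to simple index arithmetic. A minimal sketch, assuming a flat node-numbering scheme (the paper does not specify how node IDs are assigned, so the numbering and the I/O-node naming below are hypothetical):

```python
# Minimal sketch of the BlueGene/L processing-set hierarchy described above.
# The flat node numbering and I/O-node names are assumptions for illustration.
COMPUTE_NODES = 65_536
PSET_SIZE = 64                            # compute nodes per processing set
NUM_PSETS = COMPUTE_NODES // PSET_SIZE    # 1,024 processing sets

def pset_of(node_id: int) -> int:
    """Processing set that owns a given compute node."""
    assert 0 <= node_id < COMPUTE_NODES
    return node_id // PSET_SIZE

def io_node_of(node_id: int) -> str:
    """Each pset is managed by a dedicated 65th node, its I/O node."""
    return f"ionode-{pset_of(node_id)}"

print(pset_of(12345), io_node_of(12345))  # 192 ionode-192
```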


IBM Journal of Research and Development | 2005

Verification strategy for the Blue Gene/L chip

Michael E. Wazlowski; Narasimha R. Adiga; Daniel K. Beece; Ralph Bellofatto; Matthias A. Blumrich; Dong Chen; Marc Boris Dombrowa; Alan Gara; Mark E. Giampapa; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Ben J. Nathanson; Martin Ohmacht; R. Sharrar; Sarabjeet Singh; Burkhard Steinmacher-Burow; Robert B. Tremaine; Mickey Tsao; A. R. Umamaheshwaran; Pavlos M. Vranas

The Blue Gene®/L compute chip contains two PowerPC® 440 processor cores, private L2 prefetch caches, a shared L3 cache and double-data-rate synchronous dynamic random access memory (DDR SDRAM) memory controller, a collective network interface, a torus network interface, a physical network interface, an interrupt controller, and a bridge interface to slower devices. System-on-a-chip verification problems require a multilevel verification strategy in which the strengths of each layer offset the weaknesses of another layer. The verification strategy we adopted relies on the combined strengths of random simulation, directed simulation, and code-driven simulation at the unit and system levels. The strengths and weaknesses of the various techniques and our reasons for choosing them are discussed. The verification platform is based on event simulation and cycle simulation running on a farm of Intel-processor-based machines, several PowerPC-processor-based machines, and the internally developed hardware accelerator Awan. The cost/performance tradeoffs of the different platforms are analyzed. The success of the first Blue Gene/L nodes, which worked within days of receiving them and had only a small number of undetected bugs (none fatal), reflects both careful design and a comprehensive verification strategy.
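To make the "combined strengths" point concrete, here is a toy sketch of mixing directed and constrained-random stimulus. The apply_and_check interface is hypothetical and stands in for driving a real HDL simulator; this illustrates the general technique, not the authors' verification infrastructure:

```python
# Toy illustration of combining directed and constrained-random stimulus,
# in the spirit of the multi-level strategy described above. The DUT
# interface (apply_and_check) is a hypothetical stand-in.
import random

def apply_and_check(addr: int, data: int) -> bool:
    """Stand-in: drive the design under test, compare against a golden model."""
    return True  # placeholder; a real checker would diff DUT vs. reference output

# Directed tests: hand-written corner cases (e.g., address boundaries).
directed = [(0x0000, 0x00000000), (0xFFFF, 0xFFFFFFFF)]

# Constrained-random tests: biased toward boundary values, where bugs cluster.
def random_case(rng: random.Random) -> tuple[int, int]:
    addr = rng.choice([0, 0xFFFF, rng.randrange(0x10000)])
    data = rng.choice([0, 0xFFFFFFFF, rng.randrange(1 << 32)])
    return addr, data

rng = random.Random(42)  # fixed seed so any failure is cycle-reproducible
for addr, data in directed + [random_case(rng) for _ in range(1000)]:
    assert apply_and_check(addr, data), f"mismatch at addr={addr:#x}"
```

The fixed seed mirrors a key theme of the Blue Gene/L work: a failing random test is only useful if the exact stimulus can be replayed.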
