
Publications


Featured research published by Alan Gara.


IBM Journal of Research and Development | 2005

Overview of the Blue Gene/L system architecture

Alan Gara; Matthias A. Blumrich; Dong Chen; George Liang-Tai Chiu; Paul W. Coteus; Mark E. Giampapa; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Thomas A. Liebsch; Martin Ohmacht; Burkhard Steinmacher-Burow; Todd E. Takken; Pavlos M. Vranas

The Blue Gene®/L computer is a massively parallel supercomputer based on IBM system-on-a-chip technology. It is designed to scale to 65,536 dual-processor nodes, with a peak performance of 360 teraflops. This paper describes the project objectives and provides an overview of the system architecture that resulted. We discuss our application-based approach and rationale for a low-power, highly integrated design. The key architectural features of Blue Gene/L are introduced in this paper: the link chip component and five Blue Gene/L networks, the PowerPC® 440 core and floating-point enhancements, the on-chip and off-chip distributed memory system, the node- and system-level design for high reliability, and the comprehensive approach to fault isolation.
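
As a quick sanity check on the quoted peak, here is a back-of-the-envelope calculation (a sketch only; the 700 MHz clock and 4 flops per cycle per core via the dual floating-point unit are assumptions drawn from the broader BG/L literature, not from this abstract):

    # Rough peak-performance arithmetic for Blue Gene/L.
    # Assumed figures (not stated in the abstract above): 700 MHz clock,
    # 2 cores per node, 4 flops/cycle/core from the dual FPU.
    nodes = 65_536
    cores_per_node = 2
    clock_hz = 700e6
    flops_per_cycle = 4          # two fused multiply-adds per cycle
    peak_tflops = nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12
    print(f"peak ~ {peak_tflops:.0f} Tflop/s")   # ~367, quoted above as 360 teraflops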


IBM Journal of Research and Development | 2005

Blue Gene/L torus interconnection network

Narasimha R. Adiga; Matthias A. Blumrich; Dong Chen; Paul W. Coteus; Alan Gara; Mark E. Giampapa; Philip Heidelberger; Sarabjeet Singh; Burkhard Steinmacher-Burow; Todd E. Takken; Mickey Tsao; Pavlos M. Vranas

The main interconnect of the massively parallel Blue Gene®/L is a three-dimensional torus network with dynamic virtual cut-through routing. This paper describes both the architecture and the microarchitecture of the torus and a network performance simulator. Both simulation results and hardware measurements are presented.
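
To make the wraparound routing concrete, here is a minimal sketch (mine, not the paper's simulator) of the shortest hop count between two nodes on a 3D torus:

    # Minimal sketch: shortest hop count between two nodes on a 3D torus.
    # Each dimension wraps, so per axis the distance is the smaller of
    # going "forward" or "backward" around the ring.
    def torus_hops(a, b, dims):
        """a, b: (x, y, z) node coordinates; dims: torus size per dimension."""
        return sum(min((ai - bi) % d, (bi - ai) % d)
                   for ai, bi, d in zip(a, b, dims))

    # On an 8x8x8 torus, opposite corners are only 3 hops apart
    # thanks to the wraparound links.
    print(torus_hops((0, 0, 0), (7, 7, 7), (8, 8, 8)))   # -> 3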


International Symposium on Microarchitecture | 2012

The IBM Blue Gene/Q Compute Chip

Ruud A. Haring; Martin Ohmacht; Thomas W. Fox; Michael Karl Gschwind; David L. Satterfield; Krishnan Sugavanam; Paul W. Coteus; Philip Heidelberger; Matthias A. Blumrich; Robert W. Wisniewski; Alan Gara; George Liang-Tai Chiu; Peter A. Boyle; Norman H. Christ; Changhoan Kim

Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space-efficient systems. Focusing on reliability during design helps with scaling to large systems and lowers the total cost of ownership. This article examines the architecture and design of the Compute chip, which combines processors, memory, and communication functions on a single chip.
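
For scale, the per-chip peak can be sketched from figures in the BG/Q literature (assumptions, not stated in this abstract: 16 user cores at 1.6 GHz, each issuing a 4-wide fused multiply-add, i.e. 8 flops per cycle):

    # Rough per-chip peak for Blue Gene/Q (assumed figures, see above).
    cores, clock_hz, flops_per_cycle = 16, 1.6e9, 8
    peak_gflops = cores * clock_hz * flops_per_cycle / 1e9
    print(f"peak ~ {peak_gflops:.1f} Gflop/s per chip")   # ~204.8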


IBM Journal of Research and Development | 2005

Optimizing task layout on the Blue Gene/L supercomputer

Gyan Bhanot; Alan Gara; Philip Heidelberger; Eoin M. Lawless; James C. Sexton; Robert Walkup

A general method for optimizing problem layout on the Blue Gene®/L (BG/L) supercomputer is described. The method takes as input the communication matrix of an arbitrary problem as an array with entries C(i, j), representing the data communicated from domain i to domain j. Given C(i, j), we implement a heuristic map that attempts to sequentially map a domain and its communication neighbors either to the same BG/L node or to near-neighbor nodes on the BG/L torus, while keeping the number of domains mapped to a BG/L node constant. We then generate a Markov chain of maps using Monte Carlo simulation with free energy F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the BG/L torus between domain i and domain j. For two large parallel applications, SAGE and UMT2000, the method was tested against the default Message Passing Interface rank-order layout on up to 2,048 BG/L nodes. It produced maps that improved communication efficiency by up to 45%.
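
The optimization loop the abstract describes can be sketched as Metropolis Monte Carlo over candidate maps (a simplified illustration, not the authors' implementation; the helper names, the fixed temperature, and the one-domain-per-node simplification are all mine):

    # Simplified sketch: minimize F = sum_{i,j} C(i,j) * H(i,j) by proposing
    # swaps of two domains' torus placements and accepting with the
    # Metropolis rule. One domain per node is assumed for brevity.
    import math, random

    def hops(p, q, dims=(8, 8, 8)):
        # Smallest hop count between two torus coordinates (H in the paper).
        return sum(min((a - b) % d, (b - a) % d) for a, b, d in zip(p, q, dims))

    def free_energy(C, placement):
        # placement[i] = torus coordinates assigned to domain i.
        n = len(placement)
        return sum(C[i][j] * hops(placement[i], placement[j])
                   for i in range(n) for j in range(n))

    def optimize(C, placement, temperature=1.0, steps=10_000):
        F = free_energy(C, placement)
        for _ in range(steps):
            i, j = random.sample(range(len(placement)), 2)
            placement[i], placement[j] = placement[j], placement[i]
            F_new = free_energy(C, placement)
            if F_new <= F or random.random() < math.exp((F - F_new) / temperature):
                F = F_new                                     # accept the swap
            else:
                placement[i], placement[j] = placement[j], placement[i]  # revert
        return placement, F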


IBM Journal of Research and Development | 2005

Overview of the QCDSP and QCDOC computers

Peter A. Boyle; Dong Chen; Norman H. Christ; Michael Clark; Saul D. Cohen; C. Cristian; Zhihua Dong; Alan Gara; Balint Joo; Chulwoo Jung; Changhoan Kim; L. Levkova; X. Liao; G. Liu; Robert D. Mawhinney; Shigemi Ohta; Konstantin Petrov; Tilo Wettig; A. Yamaguchi

The QCDSP and QCDOC computers are two generations of multithousand-node, multidimensional mesh-based computers designed to study quantum chromodynamics (QCD), the theory of the strong nuclear force. QCDSP (QCD on digital signal processors), a four-dimensional mesh machine, was completed in 1998; in that year, it won the Gordon Bell Prize in the price/performance category. Two large installations, of 8,192 and 12,288 nodes with a combined peak speed of one teraflops, have been in operation since. QCD-on-a-chip (QCDOC) utilizes a six-dimensional mesh and compute nodes fabricated with IBM system-on-a-chip technology. It offers a tenfold improvement in price/performance. Currently, 100-node versions are operating, and there are plans to build three 12,288-node, 10-teraflops machines. In this paper, we describe the architecture of both the QCDSP and QCDOC machines, the operating systems employed, the user software environment, and the performance of our application, lattice QCD.


IBM Journal of Research and Development | 2005

Blue Gene/L advanced diagnostics environment

Mark E. Giampapa; Ralph Bellofatto; Matthias A. Blumrich; Dong Chen; Marc Boris Dombrowa; Alan Gara; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Ben J. Nathanson; Burkhard Steinmacher-Burow; Martin Ohmacht; Valentina Salapura; Pavlos M. Vranas

This paper describes the Blue Gene®/L advanced diagnostics environment (ADE) used throughout all aspects of the Blue Gene/L project, including design, logic verification, bring-up, diagnostics, and manufacturing test. The Blue Gene/L ADE consists of a lightweight multithreaded coherence-managed kernel, runtime libraries, device drivers, system programming interfaces, compilers, and host-based development tools. It provides complete and flexible access to all features of the Blue Gene/L hardware. Prior to the existence of hardware, ADE was used on Very high-speed integrated circuit Hardware Description Language (VHDL) models, not only for logic verification, but also for performance measurements, code-path analysis, and evaluation of architectural tradeoffs. During early hardware bring-up, the ability to run in a cycle-reproducible manner on both hardware and VHDL proved invaluable in fault isolation and analysis. However, ADE is also capable of supporting high-performance applications and parallel test cases, thereby permitting us to stress the hardware to the limits of its capabilities. This paper also provides insights into system-level and device-level programming of Blue Gene/L to assist developers of high-performance applications in more fully exploiting the performance of the machine.


IBM Journal of Research and Development | 2005

Packaging the Blue Gene/L supercomputer

Paul W. Coteus; H. R. Bickford; T. M. Cipolla; Paul G. Crumley; Alan Gara; Shawn A. Hall; Gerard V. Kopcsay; Alphonso P. Lanzetta; L. S. Mok; Rick A. Rand; R. Swetz; Todd E. Takken; P. La Rocca; C. Marroquin; P. R. Germann; M. J. Jeanson

As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture: relatively small but powerful building blocks combined in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations per second) is the Blue Gene®/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.


High-Performance Computer Architecture | 2008

Design and implementation of the Blue Gene/P snoop filter

Valentina Salapura; Matthias A. Blumrich; Alan Gara

As multi-core processors evolve, coherence traffic between cores is becoming problematic, both in terms of performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings. This paper describes the design of the Blue Gene/P snoop filters, and presents hardware measurements to demonstrate their effectiveness. The Blue Gene/P snoop filters combine stream registers and snoop caches to capture both the locality of snoop addresses and their streaming behavior. Simulations of SPLASH-2 benchmarks illustrate tradeoffs and strengths of these two techniques. Their combination is shown to be most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines. This translates into an average performance improvement of almost 20% for the NAS benchmarks running on an actual Blue Gene/P system.
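
A minimal model of the stream-register idea (my simplification, not the BG/P hardware): each register holds a base address plus a mask of address bits still known to discriminate; recording a cache fill widens a register, and a snoop whose address matches no register provably cannot hit the cache, so it is squashed.

    # Toy stream-register snoop filter (simplified model, not BG/P hardware).
    # A register (base, mask) covers every line address agreeing with `base`
    # on all bits set in `mask`. Coverage only ever grows, so the filter is
    # conservative: it never squashes a snoop that could hit the cache.
    class StreamRegisterFilter:
        def __init__(self, num_registers=4, line_bits=5):
            self.regs = [None] * num_registers   # each entry: (base, mask) or None
            self.victim = 0                      # round-robin register to widen
            self.line_bits = line_bits           # drop offset-within-line bits

        def record_fill(self, addr):
            line = addr >> self.line_bits
            for reg in self.regs:
                if reg and (line & reg[1]) == (reg[0] & reg[1]):
                    return                       # already covered
            reg = self.regs[self.victim]
            if reg is None:
                self.regs[self.victim] = (line, ~0)
            else:
                base, mask = reg
                # Keep only the bits on which the old base and the new line
                # agree; the register now covers both (and possibly more).
                self.regs[self.victim] = (base, mask & ~(base ^ line))
            self.victim = (self.victim + 1) % len(self.regs)

        def must_snoop(self, addr):
            line = addr >> self.line_bits
            return any(reg and (line & reg[1]) == (reg[0] & reg[1])
                       for reg in self.regs)

    f = StreamRegisterFilter()
    f.record_fill(0x1000)
    print(f.must_snoop(0x1000), f.must_snoop(0x2000))   # True False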


Conference on High Performance Computing (Supercomputing) | 2004

Unlocking the Performance of the BlueGene/L Supercomputer

George S. Almasi; Siddhartha Chatterjee; Alan Gara; John A. Gunnels; Manish Gupta; Amy Henning; José E. Moreira; Bob Walkup

The BlueGene/L supercomputer is expected to deliver new levels of application performance by providing a combination of good single-node computational performance and high scalability. To achieve good single-node performance, the BlueGene/L design includes a special dual floating-point unit on each processor and the ability to use two processors per node. BlueGene/L also includes both a torus and a tree network to achieve high scalability. We demonstrate how benchmarks and applications can take advantage of these architectural features to get the most out of BlueGene/L.


International Journal of Parallel Programming | 2007

The Blue Gene/L supercomputer: a hardware and software story

José E. Moreira; Valentina Salapura; George S. Almasi; Charles J. Archer; Ralph Bellofatto; Peter Edward Bergner; Randy Bickford; Matthias A. Blumrich; José R. Brunheroto; Arthur A. Bright; Michael Brian Brutman; José G. Castaños; Dong Chen; Paul W. Coteus; Paul G. Crumley; Sam Ellis; Thomas Eugene Engelsiepen; Alan Gara; Mark E. Giampapa; Tom Gooding; Shawn A. Hall; Ruud A. Haring; Roger L. Haskin; Philip Heidelberger; Dirk Hoenicke; Todd A. Inglett; Gerard V. Kopcsay; Derek Lieber; David Roy Limpert; Patrick Joseph McCarthy

The Blue Gene/L system at the Department of Energy's Lawrence Livermore National Laboratory in Livermore, California is the world's most powerful supercomputer. It has achieved groundbreaking performance in both standard benchmarks and real scientific applications, and in the process has enabled new science that simply could not be done before. Blue Gene/L was developed by a relatively small team of dedicated scientists and engineers. This article is both a description of the Blue Gene/L supercomputer and an account of how that system was designed, developed, and delivered. It reports on the technical characteristics of the system that made it possible to build such a powerful supercomputer. It also reports on how teams across the world worked around the clock to accomplish this milestone of high-performance computing.
