Krishnan Sugavanam | Researchain

Publication

Featured research published by Krishnan Sugavanam.

international symposium on microarchitecture | 2012

The IBM Blue Gene/Q Compute Chip

Ruud A. Haring; Martin Ohmacht; Thomas W. Fox; Michael Karl Gschwind; David L. Satterfield; Krishnan Sugavanam; Paul W. Coteus; Philip Heidelberger; Matthias A. Blumrich; Robert W. Wisniewski; Alan Gara; George Liang-Tai Chiu; Peter A. Boyle; Norman H. Christ; Changhoan Kim

Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space-efficient systems. Focusing on reliability during design helps with scaling to large systems and lowers the total cost of ownership. This article examines the architecture and design of the Compute chip, which combines processors, memory, and communication functions on a single chip.

IBM Journal of Research and Development | 2015

Active Memory Cube: A processing-in-memory architecture for exascale systems

Ravi Nair; Samuel F. Antao; Carlo Bertolli; Pradip Bose; José R. Brunheroto; Tong Chen; Chen-Yong Cher; Carlos H. Andrade Costa; J. Doi; Constantinos Evangelinos; Bruce M. Fleischer; Thomas W. Fox; Diego S. Gallo; Leopold Grinberg; John A. Gunnels; Arpith C. Jacob; P. Jacob; Hans M. Jacobson; Tejas Karkhanis; Choon Young Kim; Jaime H. Moreno; John Kevin Patrick O'Brien; Martin Ohmacht; Yoonho Park; Daniel A. Prener; Bryan S. Rosenburg; Kyung Dong Ryu; Olivier Sallenave; Mauricio J. Serrano; Patrick Siegl

Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing $10^{18}$ floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.

international test conference | 2014

Soft error resiliency characterization and improvement on IBM BlueGene/Q processor using accelerated proton irradiation

Chen-Yong Cher; K. Paul Muller; Ruud A. Haring; David L. Satterfield; Thomas E. Musta; Thomas M. Gooding; Kristan D. Davis; Marc Boris Dombrowa; Gerard V. Kopcsay; Robert M. Senger; Yutaka Sugawara; Krishnan Sugavanam

Fault injection through accelerated irradiation is an effective way to evaluate the overall soft error resiliency of microprocessors. In this work, we report on irradiation experiments on a Blue Gene/Q (BG/Q) compute processor chip running selected applications. Blue Gene/Q is the third generation of IBM's massively parallel, energy efficient Blue Gene series of supercomputers. In the experiments, we found 69 code fails. Out of these, 26 code fails are relevant for the calculation of the mean-time-between-failures (MTBF) for a 20 PetaFLOP, 96 rack system running a comparable workload mix. The expected MTBF for check-stops due to cosmic radiation and alpha particles from chip packaging materials is calculated to be 51 days for sea-level at New York City running the application mix studied. If the most vulnerable application is run exclusively, the projected MTBF is 35 days. These are outstanding results for a machine of this magnitude. The beaming experiment and projected MTBF validate the necessity to include autonomous hardware detection and recovery at the cost of design effort, silicon area and power.

IBM Journal of Research and Development | 2013

Design for low power and power management in IBM Blue Gene/Q

Krishnan Sugavanam; Chen-Yong Cher; John A. Gunnels; Ruud A. Haring; Philip Heidelberger; Hans M. Jacobson; Moyra K. McManus; D. P. Paulsen; David L. Satterfield; Yutaka Sugawara; Robert Walkup

In this paper, we explain the techniques used in IBM Blue Gene®/Q Compute chips to achieve high energy efficiency. Architectural techniques include the choice of a power-efficient, throughput-oriented processor core with a SIMD (single-instruction, multiple-data) floating-point unit, as well as multiple frequency domains for moving data. Design techniques include clock gating and the use of multiple threshold voltage devices. From a systems perspective, power is reduced by using a speed binning technique that characterizes the manufacturing variability of chips during wafer test, permitting similar chips to be packaged on the same board and run at the lowest voltage possible. We describe the techniques used to monitor and manage the power and performance of the various subunits of the Blue Gene/Q chip. Details include the functioning of the environmental monitor and the performance counters. Using these facilities, we describe the framework to understand how the chip's subunits contribute to the total active and leakage power consumed. A power characterization technique for the development of application-dependent power projection models is presented. Differences between estimated power before chip tape-out versus measured power are discussed.

asia and south pacific design automation conference | 2014

Soft Error Resiliency Characterization on IBM BlueGene/Q Processor

Soft Error Resiliency (SER) is a major concern for Petascale high performance computing (HPC) systems. In designing Blue Gene/Q (BG/Q) [8], many mechanisms were deployed to target SER including extensive use of Silicon-On-Insulator (SOI), radiation-hardened latches [7,13], detection and correction in on-chip arrays, and very low radiation packaging materials. On the other hand, it is well known that application behavior has major impacts on the masking (or “derating” factor) in system SER calculations. The principal goal of this project is to understand the interaction between BG/Q hardware and high-performance applications when it comes to SER by performing and evaluating a chip irradiation experiment.

Archive | 2004

Evaluation of Large L3 Caches Using TPC-H Trace Samples

Jaeheon Jeong; Ramendra K. Sahoo; Krishnan Sugavanam; Ashwini K. Nanda; Michel Dubois

In this chapter we evaluate the miss rates of four L3 cache architectures for small-scale multiprocessors. Eight processors are partitioned into 1, 2, 4, or 8 clusters with 8, 4, 2, or 1 processors, respectively. Each cluster has a large L3 cache, and the aggregate amount of L3 cache in each of the four architectures varies between 64 MB and 1 GB. The target of our evaluations is decision support systems. We use bus trace samples obtained during the execution of a 100 GB TPC-H on an 8-way multiprocessor. These 12 time samples were taken at one hour intervals during the first day of execution of TPC-H. Each sample contains 64 M bus references.

Archive | 2011

Multi-petascale highly efficient parallel supercomputer

Sameh W. Asaad; Ralph E. Bellofatto; Michael A. Blocksome; Matthias A. Blumrich; Peter A. Boyle; Jose R. Brunheroto; Dong Chen; Chen Yong Cher; George L. Chiu; Norman H. Christ; Paul W. Coteus; Kristan D. Davis; Gabor J. Dozsa; Alexandre E. Eichenberger; Noel A. Eisley; Matthew R. Ellavsky; Kahn C. Evans; Bruce M. Fleischer; Thomas W. Fox; Alan Gara; Mark E. Giampapa; Thomas M. Gooding; Michael K. Gschwind; John A. Gunnels; Shawn A. Hall; Rudolf A. Haring; Philip Heidelberger; Todd A. Inglett; Brant L. Knudson; Gerard V. Kopcsay

Archive | 2007

ULTRASCALABLE PETAFLOP PARALLEL SUPERCOMPUTER

Matthias A. Blumrich; Dong Chen; George Liang-Tai Chiu; Thomas Mario Cipolla; Paul W. Coteus; Alan Gara; Mark E. Giampapa; Shawn A. Hall; Rudolf A. Haring; Philip Heidelberger; Gerard V. Kopcsay; Martin Ohmacht; Valentina Salapura; Krishnan Sugavanam; Todd E. Takken

Archive | 2010