Publication


Featured research published by Balaram Sinharoy.


IBM Journal of Research and Development | 2005

POWER5 System microarchitecture

Balaram Sinharoy; Ronald Nick Kalla; Joel M. Tendler; Richard J. Eickemeyer; Jody B. Joyner

The IBM POWER4 is a new microprocessor organized in a system structure that includes new technology to form systems. The name POWER4 as used in this context refers not only to a chip, but also to the structure used to interconnect chips to form systems. In this paper we describe the processor microarchitecture as well as the interconnection architecture employed to form systems up to a 32-way symmetric multiprocessor.


IEEE Micro | 2004

IBM Power5 chip: a dual-core multithreaded processor

Ronald Nick Kalla; Balaram Sinharoy; Joel M. Tendler

IBM introduced Power4-based systems in 2001. The Power4 design integrates two processor cores on a single chip, a shared second-level cache, a directory for an off-chip third-level cache, and the necessary circuitry to connect it to other Power4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level. The Power5 is the next-generation chip in this line. One of our key goals in designing the Power5 was to maintain both binary and structural compatibility with existing Power4 systems to ensure that binaries continue executing properly and all application optimizations carry forward to newer systems. With that base requirement, we specified increased performance and other functional enhancements of server virtualization, reliability, availability, and serviceability at both chip and system levels. We describe the approach we used to improve chip-level performance.


International Symposium on Microarchitecture | 2010

Power7: IBM's Next-Generation Server Processor

Ronald Nick Kalla; Balaram Sinharoy; William J. Starke; Michael Stephen Floyd

Power Systems™ continue strong. 7th-generation Power chip: balanced multi-core design, eDRAM technology, SMT4. Greater than 4X performance in the same power envelope as the previous generation. Scales to a 32-socket, 1024-thread balanced system. Building block for the peta-scale PERCS project. POWER7 systems running in lab: AIX®, IBM i, and Linux® all operational.


IBM Journal of Research and Development | 2011

IBM POWER7 multicore server processor

Balaram Sinharoy; Ronald Nick Kalla; William J. Starke; Hung Q. Le; R. Cargnoni; J. A. Van Norstrand; B. J. Ronchetti; Jeffrey A. Stuecheli; Jens Leenstra; G. L. Guthrie; D. Q. Nguyen; Bart Blaner; C. F. Marino; E. Retter; Peter Williams

The IBM POWER® processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. In this paper, we describe the key features of the POWER7® processor chip. On the chip is an eight-core processor, with each core capable of four-way simultaneous multithreaded operation. Fabricated in IBM's 45-nm silicon-on-insulator (SOI) technology with 11 levels of metal, the chip contains more than one billion transistors. The processor core and caches are significantly enhanced to boost the performance of both single-threaded, response-time-oriented applications and multithreaded, throughput-oriented applications. The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache. A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.


International Solid-State Circuits Conference | 2010

The implementation of POWER7™: A highly parallel and scalable multi-core high-end server processor

Dieter Wendel; Ronald Nick Kalla; Robert Cargnoni; Joachim Clabes; Joshua Friedrich; Roland Frech; James Allan Kahle; Balaram Sinharoy; William J. Starke; Scott A. Taylor; Steve Weitzel; Sam Gat-Shang Chu; Saiful Islam; Victor Zyuban

The next processor of the POWER™ family, called POWER7™, is introduced. Eight quad-threaded cores are integrated together with two memory controllers and high-speed system links on a 567 mm² die, employing 1.2B transistors in 45 nm CMOS SOI technology [4]. High on-chip performance, and therefore bandwidth, is achieved using 11 layers of low-k copper wiring and devices with enhanced dual-stress liners. The technology features deep trench (DT) capacitors that are used to build the 32 MB embedded DRAM L3 based on a 0.067 µm² DRAM cell. DT capacitors are also used to reduce on-chip voltage-island supply noise. Focusing on speed, the dual-supply ripple-domino SRAM concept follows the schemes described elsewhere.


IBM Journal of Research and Development | 2015

IBM POWER8 processor core microarchitecture

Balaram Sinharoy; J. A. Van Norstrand; R. J. Eickemeyer; Hung Q. Le; Jens Leenstra; D. Q. Nguyen; B. Konigsburg; K. Ward; M. D. Brown; J. E. Moreira; D. Levitan; S. Tung; D. Hrusecky; J. W. Bishop; M. Gschwind; M. Boersma; M. Kroener; M. Kaltenbach; T. Karkhanis; K. M. Fernsler

The POWER8™ processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM. It is fabricated using the company's 22-nm Silicon on Insulator (SOI) technology with 15 layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7® processor. The rate of increase in processor frequency enabled by new silicon technology advancements has decreased dramatically in recent generations, as compared to the historic trend. This has caused many processor designs in the industry to show very little improvement in either single-thread or single-core performance, and, instead, larger numbers of cores are primarily pursued in each generation. Going against this industry trend, the POWER8 processor relies on a much improved core and nest microarchitecture to achieve approximately one-and-a-half times the single-thread performance and twice the single-core throughput of the POWER7 processor in several commercial applications. Combined with a 50% increase in the number of cores (from 8 in the POWER7 processor to 12 in the POWER8 processor), the result is a processor that leads the industry in performance for enterprise workloads. This paper describes the core microarchitecture innovations made in the POWER8 processor that resulted in these significant performance benefits.


IEEE Journal of Solid-State Circuits | 2011

POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor

Dieter Wendel; R Kalla; James D. Warnock; R. Cargnoni; S G Chu; J G Clabes; Daniel M. Dreps; D. Hrusecky; Joshua Friedrich; Saiful Islam; J Kahle; Jens Leenstra; Gaurav Mittal; Jose Angel Paredes; Jürgen Pille; Phillip J. Restle; Balaram Sinharoy; G Smith; W J Starke; S Taylor; J. A. Van Norstrand; Stephen Douglas Weitzel; P G Williams; Victor Zyuban

This paper gives an overview of the latest member of the POWER™ processor family, POWER7™. Eight quad-threaded cores, operating at frequencies up to 4.14 GHz, are integrated together with two memory controllers and high-speed system links on a 567 mm² die, employing 1.2B transistors in a 45 nm CMOS SOI technology with 11 layers of low-k copper wiring. The technology features deep trench capacitors, which are used to build a 32 MB embedded DRAM L3 based on a 0.067 µm² DRAM cell. The functionally equivalent chip transistor count would have been over 2.7B if the L3 had been implemented with a conventional six-transistor SRAM cell. (The eDRAM implementation is detailed in a separate paper in this journal.) Deep trench capacitors are also used to reduce on-chip voltage island supply noise. This paper describes the organization of the design and the features of the processor core, before moving on to discuss the circuits used for analog elements, clock generation and distribution, and I/O designs. The final section describes the details of the clocked storage elements, including special features for test, debug, and chip frequency tuning.


High-Performance Computer Architecture | 2005

Stretching the limits of clock-gating efficiency in server-class processors

Hans M. Jacobson; Pradip Bose; Zhigang Hu; Alper Buyuktosunoglu; Victor Zyuban; Richard James Eickemeyer; Lee Evan Eisen; John Barry Griswell; Doug Logan; Balaram Sinharoy; Joel M. Tendler

Clock-gating has been introduced as the primary means of dynamic power management in recent high-end commercial microprocessors. The temperature drop resulting from active power reduction can result in additional leakage power savings in future processors. In this paper we first examine the realistic benefits and limits of clock-gating in current generation high-performance processors (e.g. of the POWER4™ or POWER5™ class). We then look beyond classical clock-gating: we examine additional opportunities to avoid unnecessary clocking in real workload executions. In particular, we examine the power reduction benefits of a couple of newly invented schemes called transparent pipeline clock-gating and elastic pipeline clock-gating. Based on our experiences with current designs, we try to bound the practical limits of clock gating efficiency in future microprocessors.


International Symposium on Low Power Electronics and Design | 2002

A microarchitectural-level step-power analysis tool

Wael El-Essawy; David H. Albonesi; Balaram Sinharoy

Clock gating is an effective means for reducing average power consumption. However, clock gating can exacerbate maximum cycle-to-cycle current swings, or the step-power (L di/dt) problem. We present a microarchitecture-level step-power simulator and demonstrate its use in exploring how design alternatives impact relative step-power levels. We show how the tool can be used to identify major sources of high microprocessor step-power events. Our experiments indicate that branch mispredictions are a major cause of high step-power occurrences. We also show that high step-power events are infrequent, which suggests that architectural techniques may limit step-power at potentially low performance cost.


IBM Journal of Research and Development | 2011

IBM POWER7 performance modeling, verification, and evaluation

M. Srinivas; Balaram Sinharoy; Richard J. Eickemeyer; Ram Raghavan; Steven R. Kunkel; Tien Chi Chen; W. Maron; D. Flemming; A. Blanchard; P. Seshadri; Jeffrey W. Kellington; Alex E. Mericas; A. E. Petruski; V. R. Indukuru; S. Reyes

In this paper, we describe the key performance enhancements in IBM POWER7® microarchitecture and its memory hierarchy, including performance modeling and verification methodology. We also describe the performance characteristics of server applications, including Standard Performance Evaluation Corporation (SPEC) central processing unit, SAP Sales and Distribution, SPECjbb, online transaction processing workloads, and high-performance computing applications running on POWER7 processor-based systems compared with other systems.
