Publication


Featured research published by Ronald Nick Kalla.


IBM Journal of Research and Development | 2005

POWER5 System microarchitecture

Balaram Sinharoy; Ronald Nick Kalla; Joel M. Tendler; Richard J. Eickemeyer; Jody B. Joyner

The IBM POWER4 is a new microprocessor organized in a system structure that includes new technology to form systems. The name POWER4 as used in this context refers not only to a chip, but also to the structure used to interconnect chips to form systems. In this paper we describe the processor microarchitecture as well as the interconnection architecture employed to form systems up to a 32-way symmetric multiprocessor.


IEEE Micro | 2004

IBM Power5 chip: a dual-core multithreaded processor

Ronald Nick Kalla; Balaram Sinharoy; Joel M. Tendler

IBM introduced Power4-based systems in 2001. The Power4 design integrates two processor cores on a single chip, a shared second-level cache, a directory for an off-chip third-level cache, and the necessary circuitry to connect it to other Power4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level. The Power5 is the next-generation chip in this line. One of our key goals in designing the Power5 was to maintain both binary and structural compatibility with existing Power4 systems to ensure that binaries continue executing properly and all application optimizations carry forward to newer systems. With that base requirement, we specified increased performance and other functional enhancements of server virtualization, reliability, availability, and serviceability at both chip and system levels. We describe the approach we used to improve chip-level performance.


International Symposium on Microarchitecture | 2010

Power7: IBM's Next-Generation Server Processor

Ronald Nick Kalla; Balaram Sinharoy; William J. Starke; Michael Stephen Floyd

Power Systems™ continue strong. 7th-generation Power chip: balanced multi-core design, eDRAM technology, SMT4. Greater than 4X performance in the same power envelope as the previous generation. Scales to a 32-socket, 1024-thread balanced system. Building block for the peta-scale PERCS project. POWER7 systems running in the lab: AIX®, IBM i, and Linux® all operational.


IBM Journal of Research and Development | 2011

IBM POWER7 multicore server processor

Balaram Sinharoy; Ronald Nick Kalla; William J. Starke; Hung Q. Le; R. Cargnoni; J. A. Van Norstrand; B. J. Ronchetti; Jeffrey A. Stuecheli; Jens Leenstra; G. L. Guthrie; D. Q. Nguyen; Bart Blaner; C. F. Marino; E. Retter; Peter Williams

The IBM POWER® processor is the dominant reduced instruction set computing microprocessor in the world today, with a rich history of implementation and innovation over the last 20 years. In this paper, we describe the key features of the POWER7® processor chip. On the chip is an eight-core processor, with each core capable of four-way simultaneous multithreaded operation. Fabricated in IBM's 45-nm silicon-on-insulator (SOI) technology with 11 levels of metal, the chip contains more than one billion transistors. The processor core and caches are significantly enhanced to boost the performance of both single-threaded, response-time-oriented applications and multithreaded, throughput-oriented applications. The memory subsystem contains three levels of on-chip cache, with SOI embedded dynamic random access memory (DRAM) devices used as the last level of cache. A new memory interface using buffered double-data-rate-three DRAM and improvements in reliability, availability, and serviceability are discussed.


IBM Journal of Research and Development | 2008

Soft-error resilience of the IBM POWER6 processor

Pia N. Sanda; Jeffrey W. Kellington; Prabhakar Kudva; Ronald Nick Kalla; Ryan B. McBeth; Jerry D. Ackaret; Ryan Lockwood; John Schumann; Christopher R. Jones

The error detection and correction capability of the IBM POWER6™ processor enables high tolerance to single-event upsets. The soft-error resilience was tested with proton beam- and neutron beam-induced fault injection. Additionally, statistical fault injection was performed on a hardware-emulated POWER6 processor simulation model. The error resiliency is described in terms of the proportion of latch upset events that result in vanished errors, corrected errors, checkstops, and incorrect architected states.


International Solid-State Circuits Conference | 2010

The implementation of POWER7™: A highly parallel and scalable multi-core high-end server processor

Dieter Wendel; Ronald Nick Kalla; Robert Cargoni; Joachim Clables; Joshua Friedrich; Roland Frech; James Allan Kahle; Balaram Sinharoy; William J. Starke; Scott A. Taylor; Steve Weitzel; Sam Gat-Shang Chu; Saiful Islam; Victor Zyuban

The next processor of the POWER™ family, called POWER7™, is introduced. Eight quad-threaded cores are integrated together with two memory controllers and high-speed system links on a 567mm² die, employing 1.2B transistors in 45nm CMOS SOI technology [4]. High on-chip performance and therefore bandwidth is achieved using 11 layers of low-k copper wiring and devices with enhanced dual-stress liners. The technology features deep trench (DT) capacitors that are used to build the 32MB embedded DRAM L3 based on a 0.067µm² DRAM cell. DT capacitors are also used to reduce on-chip voltage-island supply noise. Focusing on speed, the dual-supply ripple-domino SRAM concept follows the schemes described elsewhere.


International Conference on IC Design and Technology | 2004

Design and implementation of the POWER5™ microprocessor

Joachim Gerhard Clabes; Joshua Friedrich; Mark Sweet; Jack DiLullo; Sam Gat-Shang Chu; Donald W. Plass; J. Dawson; P. Muench; L. Powell; M. Floyd; Balaram Sinharoy; M. Lee; M. Goulet; J. Wagoner; N. Schwartz; S. Runyon; G. Gorman; Phillip J. Restle; Ronald Nick Kalla; J. McGill; S. Dodson

POWER5™ is the next generation of IBM's POWER microprocessors. This design sets a new standard of server performance by incorporating simultaneous multithreading (SMT), an enhanced distributed switch and memory subsystem supporting 164w SMP, and extensive RAS support. First-pass hardware using IBM's 130nm silicon-on-insulator technology operates above 1.5GHz at 1.3V. POWER5's dual-threaded SMT creates up to two virtual processors per core, improving execution unit utilization and masking memory latency. Although a simplistic SMT implementation promised ~20% performance improvement, resizing critical microarchitectural resources almost doubles in many cases the SMT performance benefit at a 24% area. Implementing these microarchitectural enhancements posed challenges in meeting the chip's frequency, area, power, and thermal targets.


IBM Journal of Research and Development | 2013

IBM POWER7+ processor on-chip accelerators for cryptography and active memory expansion

Bart Blaner; Bulent Abali; Brian Mitchell Bass; Suresh Chari; Ronald Nick Kalla; Steven R. Kunkel; Kenneth A. Lauricella; Ross Boyd Leavens; John J. Reilly; Peter A. Sandon

With the heightened focus on computer security, IBM POWER® server workloads are spending an increasing number of cycles performing cryptographic functions. Active memory expansion (AME), a technology to dynamically increase the effective memory capacity of a system by compressing and decompressing memory pages, is also enjoying increasing deployment in POWER server systems. Together, cryptography and AME consume enough central processing unit (CPU) cycles in a typical installation to warrant adding dedicated hardware accelerators on the processor chip to offload the compute-intensive parts of these functions from the processor cores. IBM POWER7+™ is the first POWER server to include on-chip hardware accelerators for symmetric (shared key) and asymmetric (public key) cryptography and memory compression/decompression for AME. A true random number generator (RNG) is also integrated on-chip. This paper describes the hardware accelerator framework, including location relative to the cores and memory, accelerator invocation, data movement, and error handling. A description of each type of accelerator follows, including details of supported algorithms and the corresponding hardware data flows. Algorithms supported include the Advanced Encryption Standard, Secure Hash Algorithm, and Message Digest 5 algorithm as bulk cryptographic functions; asymmetric cryptographic functions in support of RSA and elliptic curve cryptography; and a novel dictionary-based compression algorithm with high throughput supporting AME. A presentation of accelerator performance is included.


IBM Journal of Research and Development | 2015

Debugging post-silicon fails in the IBM POWER8 bring-up lab

Manoj Dusanapudi; S. Fields; Michael Stephen Floyd; Guy Lynn Guthrie; Ronald Nick Kalla; Shakti Kapoor; Larry Scott Leitner; C. F. Marino; Joseph McGill; Amir Nahir; Kevin Franklin Reick; Hugh Shen; Kenneth L. Wright

Debugging post-silicon fails continues to be a difficult problem that is becoming even more challenging as chips integrate more functionality and implement increasingly complicated functions. Additionally, the complexity of hardware systems, coupled with the difficulty in observing the state of the system that led to the failure, make the debugging effort a unique challenge. In this paper, we review the techniques and mechanisms used to facilitate effective debugging in the POWER8™ processor post-silicon validation phase. We further describe several functional bugs and describe the debugging process that drove the identification of their root cause.


International Conference on IC Design and Technology | 2010

The POWER7™ processor SoC

Dieter Wendel; Ronald Nick Kalla; Joshua Friedrich; James Allan Kahle; Jens Leenstra; Cedric Lichtenau; Balaram Sinharoy; William J. Starke; Victor Zyuban

Introducing POWER7™, the latest member of the IBM POWER™ processor family. A 567mm² chip implemented in 45nm SOI technology, holding eight quad-threaded cores, a 32MB shared eDRAM L3, two memory controllers, and high-bandwidth SMP interfaces. The new out-of-order, shallow-pipeline core with 12 execution units, multiport L1 caches, and a private 256kB L2 offers the efficiency to support 4x the number of cores within the same power envelope as its predecessor. Supporting over 4GHz, the L1 data cache loop is kept to 2 cycles. Data from the L2 can be returned to the core at a rate of 32B per cycle.
