Premkishore Shivakumar

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Premkishore Shivakumar is active.

Explore More

Publication

Featured researches published by Premkishore Shivakumar.

dependable systems and networks | 2002

Modeling the effect of technology trends on the soft error rate of combinational logic

Premkishore Shivakumar; Michael Kistler; Stephen W. Keckler; Doug Burger; Lorenzo Alvisi

This paper examines the effect of technology scaling and microarchitectural trends on the rate of soft errors in CMOS memory and logic circuits. We describe and validate an end-to-end model that enables us to compute the soft error rates (SER) for existing and future microprocessor-style designs. The model captures the effects of two important masking phenomena, electrical masking and latching-window masking, which inhibit soft errors in combinational logic. We quantify the SER due to high-energy neutrons in SRAM cells, latches, and logic circuits for feature sizes from 600 nm to 50 nm and clock periods from 16 to 6 fan-out-of-4 inverter delays. Our model predicts that the SER per chip of logic circuits will increase nine orders of magnitude from 1992 to 2011 and at that point will be comparable to the SER per chip of unprotected memory elements. Our result emphasizes that computer system designers must address the risks of soft errors in logic circuits for future designs.

international symposium on computer architecture | 2002

The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

M. S. Hrishikesh; Doug Burger; Norman P. Jouppi; Stephen W. Keckler; Keith I. Farkas; Premkishore Shivakumar

Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has been provided, in equal measure, by smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, comprised of 6 FO4 of useful work and an overhead of about 2 FO4. The optimal clock period for floating-point benchmarks is 6 FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve performance of integer programs by a factor of 2 over current designs. At these high clock frequencies it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.

international conference on computer design | 2003

Exploiting microarchitectural redundancy for defect tolerance

Premkishore Shivakumar; Stephen W. Keckler; Charles R. Moore; Doug Burger

The continued increase in microprocessor clock frequency that has come from advancements in fabrication technology and reductions in feature size, creates challenges in maintaining both manufacturing yield rates and long-term reliability of devices. Methods based on defect detection and reduction may not offer a scalable solution due to cost of eliminating contaminants in the manufacturing process and increasing chip complexity. This paper proposes to use the inherent redundancy available in existing and future chip microarchitectures to improve yield and enable graceful performance degradation in fail-in-place systems. We introduce a new yield metric called performance averaged yield (Ypav) which accounts both for fully functional chips and those that exhibit some performance degradation. Our results indicate that at 250nm we are able to increase the Ypav of a uniprocessor with only redundant rows in its caches from a base value of 85% to 98% using microarchitectural redundancy. Given constant chip area, shrinking feature sizes increases fault susceptibility and reduces the base Ypav to 60% at 50nm, which exploiting microarchitectural redundancy then increases to 99.6%.

international symposium on microarchitecture | 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor

Karthikeyan Sankaralingam; Ramadass Nagarajan; Robert McDonald; Rajagopalan Desikan; S. Drolia; Madhu Saravana Sibi Govindan; P. Gratzf; Divya P. Gulati; Heather Hanson; Changkyu Kim; Haiming Liu; Nitya Ranganathan; Simha Sethumadhavan; S. Shariff; Premkishore Shivakumar; Stephen W. Keckler; Doug Burger

Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. Since large processor cores will require multiple clock cycles to traverse, control must be distributed, not centralized. This paper describes the control protocols in the TRIPS processor, a distributed, tiled microarchitecture that supports dynamic execution. It details each of the five types of reused tiles that compose the processor, the control and data networks that connect them, and the distributed microarchitectural protocols that implement instruction fetch, execution, flush, and commit. We also describe the physical design issues that arose when implementing the microarchitecture in a 170M transistor, 130nm ASIC prototype chip composed of two 16-wide issue distributed processor cores and a distributed 1MB non-uniform (NUCA) on-chip memory system

international symposium on microarchitecture | 2007

On-Chip Interconnection Networks of the TRIPS Chip

Paul V. Gratz; Changkyu Kim; Karthikeyan Sankaralingam; Heather Hanson; Premkishore Shivakumar; Stephen W. Keckler; Doug Burger

The TRIPS chip prototypes two networks on chip to demonstrate the viability of a routed interconnection fabric for memory and operand traffic. In a 170-million-transistor custom ASIC chip, these NoCs provide system performance within 28 percent of ideal noncontended networks at a cost of 20 percent of the die area. our experience shows that NoCs are area- and complexity-efficient means of providing high-bandwidth, low-latency on-chip communication.

networks on chips | 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network

Paul V. Gratz; Karthikeyan Sankaralingam; Heather Hanson; Premkishore Shivakumar; Robert McDonald; Stephen W. Keckler; Doug Burger

Microarchitecturally integrated on-chip networks, or micronets, are candidates to replace busses for processor component interconnect in future processor designs. For micronets, tight coupling between processor microarchitecture and network architecture is one of the keys to improving processor performance. This paper presents the design, implementation and evaluation of the TRIPS operand network (OPN). The TRIPS OPN is a 5times5, dynamically routed, 2D mesh micronet that is integrated into the TRIPS microprocessor core. The TRIPS OPN is used for operand passing, register file I/O, and primary memory system I/O. We discuss in detail the OPN design, including the unique features that arise from its integration with the processor core, such as its connection to the execution units wakeup pipeline and its in flight mis-speculated traffic removal. We then evaluate the performance of the network under synthetic and realistic loads. Finally, we assess the processor performance implications of OPN design decisions with respect to the end-to-end latency of OPN packets and the OPNs bandwidth

international solid-state circuits conference | 2003

A wire-delay scalable microprocessor architecture for high performance systems

Stephen W. Keckler; Doug Burger; Charles R. Moore; Ramadass Nagarajan; Karthikeyan Sankaralingam; Vikas Agarwal; M. S. Hrishikesh; Nitya Ranganathan; Premkishore Shivakumar

This scalable processor architecture consists of chained ALUs to minimize the physical distance between dependent instructions, thus mitigating the effect of long on-chip wire delays. Simulation studies demonstrate 1.3-15/spl times/ more instructions per clock than conventional superscalar architectures.

Archive | 2001