Vimal K. Reddy
Qualcomm
Publication
Featured research published by Vimal K. Reddy.
architectural support for programming languages and operating systems | 2006
Vimal K. Reddy; Eric Rotenberg; Sailashri Parthasarathy
Redundant threading architectures duplicate all instructions to detect and possibly recover from transient faults. Several lighter-weight Partial Redundant Threading (PRT) architectures have been proposed recently. (i) Opportunistic Fault Tolerance duplicates instructions only during periods of poor single-thread performance. (ii) ReStore does not explicitly duplicate instructions and instead exploits mispredictions among highly confident branch predictions as symptoms of faults. (iii) Slipstream creates a reduced alternate thread by replacing many instructions with highly confident predictions. We explore PRT as a possible direction for achieving the fault tolerance of full duplication with the performance of single-thread execution. Opportunistic and ReStore yield only partial coverage, since they are restricted to using only partial duplication or only confident predictions, respectively. Previous analysis of Slipstream fault tolerance was cursory and concluded that only duplicated instructions are covered. In this paper, we attempt to better understand Slipstream's fault tolerance, conjecturing that its mixture of partial duplication and confident predictions actually closely approximates the coverage of full duplication. A thorough dissection of prediction scenarios confirms that faults in nearly 100% of instructions are detectable. Fewer than 0.1% of faulty instructions are not detectable, due to coincident faults and mispredictions. Next, we show that the current recovery implementation fails to leverage this excellent detection capability, since recovery sometimes initiates belatedly, after a detected faulty instruction has already retired. We propose and evaluate a suite of simple microarchitectural alterations to recovery and checking. Using the best alterations, Slipstream can recover from faults in 99% of instructions, compared to only 78% of instructions without alterations.
Both results are much higher than predicted by past research, which claims coverage only for duplicated instructions (65% of instructions). On an 8-issue SMT processor, Slipstream performs within 1.3% of single-thread execution, whereas full duplication slows performance by 14%. A key byproduct of this paper is a novel analysis framework in which every dynamic instruction is considered hypothetically faulty, so no explicit fault injection is required. Fault coverage is measured as the fraction of candidate faulty instructions that are directly or indirectly detectable before.
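To make the "every dynamic instruction is hypothetically faulty" idea concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: each dynamic instruction is either duplicated (executed by both the reduced A-stream and the full R-stream) or removed from the A-stream and replaced by a confident prediction; a fault corrupts only the R-stream result by XOR with a nonzero `fault_mask`; and the A-stream copy and the prediction are fault-free. The names `DynInstr`, `detectable`, and `coverage` are hypothetical, and recovery timing is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DynInstr:
    """One dynamic instruction in a toy trace (hypothetical fields)."""
    correct_value: int                # fault-free result
    duplicated: bool                  # True if both A-stream and R-stream execute it
    prediction: Optional[int] = None  # confident prediction used in its place, if removed


def detectable(instr: DynInstr, fault_mask: int = 1) -> bool:
    """Treat instr as hypothetically faulty and report whether the fault is detectable.

    Assumes the fault corrupts only the R-stream result (XOR with a nonzero
    fault_mask) and that the A-stream copy and the prediction are unaffected.
    """
    faulty_value = instr.correct_value ^ fault_mask
    if instr.duplicated:
        # The fault-free A-stream result differs from the corrupted R-stream result.
        return faulty_value != instr.correct_value
    if instr.prediction is not None:
        # Removed instruction: the R-stream result is checked against the prediction.
        # Undetectable only if a misprediction happens to equal the faulty value.
        return faulty_value != instr.prediction
    # Neither duplicated nor predicted: nothing to compare against.
    return False


def coverage(trace: List[DynInstr], fault_mask: int = 1) -> float:
    """Fraction of candidate faulty instructions that are detectable (no injection runs)."""
    if not trace:
        return 0.0
    return sum(detectable(i, fault_mask) for i in trace) / len(trace)


trace = [
    DynInstr(correct_value=5, duplicated=True),
    DynInstr(correct_value=8, duplicated=False, prediction=8),  # correct prediction
    DynInstr(correct_value=4, duplicated=False, prediction=5),  # misprediction equal to 4 ^ 1
    DynInstr(correct_value=7, duplicated=False, prediction=9),  # misprediction, still caught
]
print(coverage(trace))  # 0.75
```

In this toy trace, only the third instruction escapes detection: its wrong prediction coincides with the faulty value, a simplified analogue of the coincident fault-and-misprediction case the abstract mentions.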
dependable systems and networks | 2008
Vimal K. Reddy; Eric Rotenberg
Conventional processor fault tolerance based on time/space redundancy is robust but prohibitively expensive for commodity processors. This paper explores an unconventional approach to designing a cost-effective fault-tolerant superscalar processor: engaging a regimen of microarchitecture-level fault checks. A few simple microarchitecture-level fault checks can detect many arbitrary faults in large units by observing microarchitecture-level behavior and anomalies in this behavior. Previously, we separately proposed checks for the fetch and decode stages, the rename stage, and the issue stage of a contemporary superscalar processor. While each piece hinted at the possibility of a complete regimen for an overall fault-tolerant superscalar processor, this totality was not explored. This paper provides the culmination by building a full regimen into a superscalar processor. We show for the first time that the regimen-based approach provides substantial coverage of an entire superscalar processor. Analysis reveals vulnerable areas that should be the focus of regimen additions.
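As an illustration of what "a simple microarchitecture-level fault check" can look like, the Python sketch below checks a rename-stage invariant in an R10000-style register renamer: every physical register should appear exactly once across the free list, the committed (architectural) map, and the destinations held by in-flight instructions. This particular invariant and the function name `rename_state_consistent` are hypothetical examples chosen for illustration; they are not claimed to be the checks proposed in these papers.

```python
from typing import Dict, Iterable


def rename_state_consistent(
    num_phys_regs: int,
    free_list: Iterable[int],
    arch_map: Dict[str, int],
    inflight_dests: Iterable[int],
) -> bool:
    """Hypothetical invariant check on rename state.

    Each physical register should appear exactly once across the free list,
    the committed (architectural) map, and the destinations allocated to
    in-flight instructions. A fault that corrupts a register name shows up
    as a duplicate or a missing register.
    """
    seen = list(free_list) + list(arch_map.values()) + list(inflight_dests)
    # Exactly once each: right count and every register 0..N-1 present.
    return len(seen) == num_phys_regs and set(seen) == set(range(num_phys_regs))


arch_map = {"r0": 0, "r1": 1, "r2": 2, "r3": 3}
inflight = [4]              # one in-flight instruction holds p4
good_free = [5, 6, 7]
faulty_free = [5, 6, 6]     # p7 flipped to p6 (low bit) by a transient fault

print(rename_state_consistent(8, good_free, arch_map, inflight))    # True
print(rename_state_consistent(8, faulty_free, arch_map, inflight))  # False
```

The appeal of this style of check is that it watches behavior (register bookkeeping) rather than duplicating the logic that produces it, so one cheap comparison can cover faults anywhere in the structures that manipulate register names.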
dependable systems and networks | 2007
Vimal K. Reddy; Eric Rotenberg
A new approach is proposed that exploits the repetition inherent in programs to provide low-overhead transient fault protection in a processor. Programs repeatedly execute the same instructions within short time periods. This can be viewed as time-redundant re-execution of a program, except that the inputs to these inherent time redundant (ITR) instructions vary. Nevertheless, certain microarchitectural events in the processor are independent of the inputs and depend only on the program instructions. Such events can be recorded and then confirmed when ITR instructions repeat. In this paper, we use ITR to detect transient faults in the fetch and decode units of a processor pipeline, avoiding costly approaches such as structural duplication or explicit time-redundant execution.
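The record-and-confirm idea can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the paper's hardware: the class name `ITRChecker`, the use of a CRC-32 signature over the decode outputs, and the per-PC dictionary are all hypothetical. A real design would presumably use a small finite table, so evicted entries would simply go unchecked.

```python
import zlib
from typing import Dict


class ITRChecker:
    """Hypothetical sketch of input-independent signature checking.

    For each static instruction (keyed by PC), record a signature of its
    fetch/decode outputs the first time it is seen; when it repeats, the
    signature should match because these outputs do not depend on operand values.
    """

    def __init__(self) -> None:
        self.signatures: Dict[int, int] = {}  # PC -> recorded CRC-32 signature

    def observe(self, pc: int, decode_signals: bytes) -> bool:
        """Return False on a signature mismatch (a possible transient fault)."""
        sig = zlib.crc32(decode_signals)
        recorded = self.signatures.get(pc)
        if recorded is None:
            self.signatures[pc] = sig  # first occurrence: record, nothing to confirm yet
            return True
        # A mismatch means either this or the recorded occurrence was corrupted.
        return recorded == sig


checker = ITRChecker()
ok_first = checker.observe(0x400, b"\x01\x02")   # True: recorded
ok_repeat = checker.observe(0x400, b"\x01\x02")  # True: confirmed
flipped = checker.observe(0x400, b"\x01\x03")    # False: one decode bit differs
```

Note that a mismatch only reveals that one of the two occurrences was faulty, not which one, so a recovery action would need to treat both as suspect.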
international symposium on computer architecture | 2007
Ahmed S. Al-Zawawi; Vimal K. Reddy; Eric Rotenberg; Haitham Akkary
Archive | 2009
Vimal K. Reddy; Mike W. Morrow
Archive | 2016
Vimal K. Reddy; Niket Kumar Choudhary; Michael Scott McIlvaine; Daren Eugene Streett; Robert Douglas Clancy; James Norris Dieffenderfer; Michael William Morrow
Archive | 2014
Michael William Morrow; James Norris Dieffenderfer; Thomas Andrew Sartorius; Michael Scott McIlvaine; Brian Michael Stempel; Daren Eugene Streett; Vimal K. Reddy
Archive | 2013
James Norris Dieffenderfer; Michael William Morrow; Michael Scott McIlvaine; Daren Eugene Streett; Vimal K. Reddy; Brian Michael Stempel
Archive | 2014
Vimal K. Reddy; Niket Kumar Choudhary; Michael William Morrow
Archive | 2007
Eric Rotenberg; Vimal K. Reddy