
Publications


Featured research published by Gurindar S. Sohi.


Programming Language Design and Implementation | 1994

Efficient detection of all pointer and array access errors

Todd M. Austin; Scott E. Breach; Gurindar S. Sohi

We present a pointer and array access checking technique that provides complete error coverage through a simple set of program transformations. Our technique, based on an extended safe pointer representation, has a number of novel aspects. Foremost, it is the first technique that detects all spatial and temporal access errors. Its use is not limited by the expressiveness of the language; that is, it can be applied successfully to compiled or interpreted languages with subscripted and mutable pointers, local references, and explicit and typeless dynamic storage management, e.g., C. Because it is a source-level transformation, it is amenable to both compile- and run-time optimization. Finally, its performance, even without compile-time optimization, is quite good. We implemented a prototype translator for the C language and analyzed the checking overheads of six non-trivial, pointer-intensive programs. Execution overheads range from 130% to 540%, with text and data size overheads typically below 100%.
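The core idea of an extended safe pointer can be illustrated with a small sketch (not the paper's C implementation; all names here are invented for illustration): each pointer carries a base, a size, and a capability tag, so every dereference can be checked for spatial (out-of-bounds) and temporal (use-after-free) errors.

```python
class AccessError(Exception):
    pass

# Live capabilities: freeing a block retires its capability, so stale
# pointers into reused memory are still caught (temporal checking).
_live_capabilities = set()
_next_capability = 0

def safe_malloc(size):
    global _next_capability
    _next_capability += 1
    _live_capabilities.add(_next_capability)
    # A "safe pointer": base, size, current offset, and capability tag.
    return {"base": 0, "size": size, "offset": 0, "cap": _next_capability}

def safe_free(ptr):
    _live_capabilities.discard(ptr["cap"])

def check_deref(ptr):
    # Temporal check: is the referent still live?
    if ptr["cap"] not in _live_capabilities:
        raise AccessError("temporal error: dangling pointer")
    # Spatial check: is the access inside the object's bounds?
    if not (0 <= ptr["offset"] < ptr["size"]):
        raise AccessError("spatial error: out of bounds")

p = safe_malloc(4)
p["offset"] = 3
check_deref(p)            # in bounds and live: passes
p["offset"] = 4
# check_deref(p) would now raise a spatial error
safe_free(p)
# any further check_deref(p) raises a temporal error
```

In the paper's scheme the equivalent of `check_deref` is inserted by source-to-source transformation at each pointer use, which is what makes the checks visible to compile-time optimization.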


IEEE Transactions on Computers | 1990

Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers

Gurindar S. Sohi

The problems of data dependency resolution and precise interrupt implementation in pipelined processors are combined. A design for a hardware mechanism that resolves dependencies dynamically and, at the same time, guarantees precise interrupts is presented. Simulation studies show that by resolving dependencies the proposed mechanism is able to obtain a significant speedup over a simple instruction issue mechanism as well as implement precise interrupts.
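The key to precise interrupts can be sketched in a few lines (an assumed simplification, not the paper's hardware design): instructions may complete out of order into a result buffer, but they update architectural state strictly in program order, so an interrupt taken between commits always sees a consistent state.

```python
buffer = {}        # seq -> result, filled out of order as units finish
committed = []     # architectural state updates, strictly in program order

def complete(seq, result):
    # An instruction finishes execution, possibly out of order.
    buffer[seq] = result

def commit(next_seq):
    # Retire in order: only while the oldest instruction has completed.
    while next_seq in buffer:
        committed.append((next_seq, buffer.pop(next_seq)))
        next_seq += 1
    return next_seq

complete(2, "r2")
complete(0, "r0")
head = commit(0)    # commits seq 0 only; seq 1 has not completed yet
complete(1, "r1")
head = commit(head) # now commits seq 1 and seq 2, in order
```

An interrupt arriving between `commit` calls can be serviced with the state of all instructions up to `head - 1` committed and none beyond, which is the definition of a precise interrupt.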


High-Performance Computer Architecture | 1998

Speculative versioning cache

Sridhar Gopal; T. N. Vijaykumar; James E. Smith; Gurindar S. Sohi

Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the address resolution buffer (ARB), uses a centralized buffer to support speculative versions. Our proposal, called the speculative versioning cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. A preliminary evaluation for the multiscalar architecture shows that hit latency is an important factor affecting performance, and private cache solutions trade off hit rate for hit latency.


International Symposium on Computer Architecture | 1997

Dynamic instruction reuse

Avinash Sodani; Gurindar S. Sohi

This paper introduces the concept of dynamic instruction reuse. Empirical observations suggest that many instructions, and groups of instructions, are executed dynamically with the same inputs. Such instructions do not have to be executed repeatedly: their results can be obtained from a buffer where they were saved previously. This paper presents three hardware schemes for exploiting the phenomenon of dynamic instruction reuse, and evaluates their effectiveness using execution-driven simulation. We find that in some cases over 50% of the instructions can be reused. The speedups so obtained, though less striking than the percentage of instructions reused, are still quite significant.
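The reuse buffer behaves like hardware memoization; a minimal software sketch (names invented, not one of the paper's three schemes) caches results by instruction identity and input values:

```python
reuse_buffer = {}                     # (pc, inputs) -> saved result
stats = {"executed": 0, "reused": 0}

def execute(pc, op, a, b):
    key = (pc, a, b)
    if key in reuse_buffer:           # same instruction, same inputs:
        stats["reused"] += 1          # skip execution entirely and
        return reuse_buffer[key]      # read the previously saved result
    stats["executed"] += 1
    result = op(a, b)
    reuse_buffer[key] = result
    return result

# A loop re-executes the same add with unchanged inputs:
for _ in range(10):
    execute(0x400, lambda x, y: x + y, 2, 3)
# only the first instance executes; the other nine hit the buffer
```

A real design must also invalidate buffered results when their inputs can change, which is where the three hardware schemes in the paper differ.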


International Symposium on Computer Architecture | 1997

Dynamic speculation and synchronization of data dependences

Andreas Moshovos; Scott E. Breach; T. N. Vijaykumar; Gurindar S. Sohi

Data dependence speculation is used in instruction-level parallel (ILP) processors to allow early execution of an instruction before a logically preceding instruction on which it may be data dependent. If the instruction is independent, data dependence speculation succeeds; if not, it fails, and the two instructions must be synchronized. The modern dynamically scheduled processors that use data dependence speculation do so blindly (i.e., every load instruction with unresolved dependences is speculated). In this paper, we demonstrate that as dynamic instruction windows get larger, significant performance benefits can result when intelligent decisions about data dependence speculation are made. We propose dynamic data dependence speculation techniques: (i) to predict if the execution of an instruction is likely to result in a data dependence mis-speculation, and (ii) to provide the synchronization needed to avoid a mis-speculation. Experimental results evaluating the effectiveness of the proposed techniques are presented within the context of a Multiscalar processor.
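The selective policy can be sketched as follows (an illustrative simplification, not the paper's predictor hardware): loads are speculated blindly until one mis-speculates; the offending store/load pair is then remembered, and future dynamic instances of that pair synchronize instead of speculating again.

```python
# Store/load PC pairs observed to conflict in the past.
predicted_pairs = set()

def should_speculate(store_pc, load_pc):
    # Speculate blindly unless this pair has mis-speculated before.
    return (store_pc, load_pc) not in predicted_pairs

def record_misspeculation(store_pc, load_pc):
    # Learn the dependence so later instances synchronize instead.
    predicted_pairs.add((store_pc, load_pc))

# First encounter: speculate, detect the conflict, learn the pair.
assert should_speculate(0x10, 0x20)
record_misspeculation(0x10, 0x20)
# Next dynamic instance: wait for the store rather than speculate.
assert not should_speculate(0x10, 0x20)
```

The benefit grows with window size because larger windows expose more loads whose dependences are still unresolved at issue time.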


Proceedings of the IEEE | 1995

The microarchitecture of superscalar processors

James E. Smith; Gurindar S. Sohi

Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar processors: converting an ostensibly sequential program into a more parallel one. The principles underlying this process, and the constraints that must be met, are discussed. The paper then provides a description of the specific implementation techniques used in the important phases of superscalar processing. The major phases include: (1) instruction fetching and conditional branch processing, (2) the determination of data dependences involving register values, (3) the initiation, or issuing, of instructions for parallel execution, (4) the communication of data values through memory via loads and stores, and (5) committing the process state in correct order so that precise interrupts can be supported. Examples of recent superscalar microprocessors (the MIPS R10000, the DEC 21164, and the AMD K5) are used to illustrate a variety of superscalar methods.


Architectural Support for Programming Languages and Operating Systems | 1998

Dependence based prefetching for linked data structures

Amir Roth; Andreas Moshovos; Gurindar S. Sohi

We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. Our technique exploits the dependence relationships that exist between loads that produce addresses and loads that consume these addresses. By identifying producer-consumer pairs, we construct a compact internal representation for the associated structure and its traversal. To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program. Dependence-based prefetching achieves speedups of up to 25% on a suite of pointer-intensive programs.
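For a linked list, the load that fetches `node.next` produces the address that the next iteration's loads consume. A minimal sketch of the idea (names and structure invented for illustration; the real engine operates on load PCs and addresses, not Python objects) shows the prefetch engine running ahead of the traversal along that producer-consumer chain:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def prefetch_ahead(node, depth, prefetched):
    # Speculatively follow the producer->consumer chain `depth` nodes
    # ahead of the program, recording what to prefetch.
    for _ in range(depth):
        if node is None:
            break
        prefetched.append(node)
        node = node.next
    return prefetched

# Build a 5-node list with values 0..4.
head = Node(0)
cur = head
for i in range(1, 5):
    cur.next = Node(i)
    cur = cur.next

# While the program is still at the head, prefetch 3 nodes ahead.
targets = prefetch_ahead(head.next, 3, [])
```

Because each address depends on the previous load, pointer chains defeat stride prefetchers; running the dependence chain itself ahead of the program is what hides the serial pointer-chasing latency.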


International Symposium on Computer Architecture | 2001

Execution-based prediction using speculative slices

Craig B. Zilles; Gurindar S. Sohi

A relatively small set of static instructions has significant leverage on program execution performance. These problem instructions contribute a disproportionate number of cache misses and branch mispredictions because their behavior cannot be accurately anticipated using existing prefetching or branch prediction mechanisms. The behavior of many problem instructions can be predicted by executing a small code fragment called a speculative slice. If a speculative slice is executed before the corresponding problem instructions are fetched, then the problem instructions can move smoothly through the pipeline because the slice has tolerated the latency of the memory hierarchy (for loads) or the pipeline (for branches). This technique results in speedups of up to 43 percent over an aggressive baseline machine. To benefit from branch predictions generated by speculative slices, the predictions must be bound to specific dynamic branch instances. We present a technique that invalidates predictions when it can be determined (by monitoring the program's execution path) that they will not be used. This enables the remaining predictions to be correctly correlated.


IEEE Transactions on Computers | 1996

ARB: a hardware mechanism for dynamic reordering of memory references

Manoj Franklin; Gurindar S. Sohi

To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: (1) dynamic memory disambiguation in a decentralized manner, (2) multiple memory references per cycle, (3) out-of-order execution of memory references, (4) unresolved loads and stores, (5) speculative loads and stores, and (6) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar processor.
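The disambiguation check at the heart of such a mechanism can be sketched in software (a loose analogy, not the ARB's banked hardware organization; names are invented): loads are allowed to execute before older stores, each access is logged with its program-order sequence number, and a later-arriving store to the same address flags any younger load that already read stale data.

```python
loads_done = []   # (seq, addr) of loads that executed out of order

def do_load(seq, addr):
    # Record that this load executed, possibly before older stores.
    loads_done.append((seq, addr))

def do_store(seq, addr):
    # Any younger load to this address that already executed read a
    # stale value and must be squashed and re-executed.
    return [l for l in loads_done if l[0] > seq and l[1] == addr]

do_load(5, 0x100)   # load (seq 5) runs before store (seq 3)
do_load(6, 0x200)
violations = do_store(3, 0x100)
# load 5 conflicts with the older store; load 6 (different address) does not
```

The ARB distributes this bookkeeping across interleaved banks indexed by address, which is what allows multiple disambiguations per cycle without a centralized associative search.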


International Symposium on Computer Architecture | 1992

The Expandable Split Window Paradigm for Exploiting Fine-grain Parallelism

Manoj Franklin; Gurindar S. Sohi

We propose a new processing paradigm, called the Expandable Split Window (ESW) paradigm, for exploiting fine-grain parallelism. This paradigm considers a window of instructions (possibly having dependencies) as a single unit, and exploits fine-grain parallelism by overlapping the execution of multiple windows. The basic idea is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. This processing paradigm shares a number of properties of the restricted dataflow machines, but was derived from the sequential von Neumann architecture. We also present an implementation of the Expandable Split Window execution model, and preliminary performance results.

Collaboration


Dive into Gurindar S. Sohi's collaborations.

Top Co-Authors

Amir Roth
University of Wisconsin-Madison

Gagan Gupta
Wisconsin Alumni Research Foundation

Scott E. Breach
University of Wisconsin-Madison

Srinath Sridharan
University of Wisconsin-Madison

Manoj Franklin
University of Wisconsin-Madison