Publications

Featured research published by Tejas Karkhanis.


international symposium on computer architecture | 2004

A First-Order Superscalar Processor Model

Tejas Karkhanis; James E. Smith

The proposed performance model for superscalar processors consists of two parts: 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating the transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch misprediction rates as inputs, the model arrives at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13% in the worst case. The model also provides insight into the inner workings of superscalar processors and into long-term microarchitecture trends such as pipeline depth and issue width.
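The shape of such a first-order model can be sketched in a few lines: a base term from ideal IPC plus additive penalties for each class of miss event. This is only an illustration of the structure the abstract describes, not the paper's actual equations; every parameter name and penalty value below is a hypothetical placeholder.

```python
# Illustrative first-order CPI model: total CPI = ideal CPI plus
# additive transient penalties from branch and cache miss events.
def estimate_cpi(ideal_ipc, branch_mpki, il1_mpki, dl1_mpki,
                 branch_penalty, il1_penalty, dl1_penalty):
    """Estimate cycles per instruction from ideal IPC and miss rates.

    mpki values are misses per 1000 instructions; the penalty values
    are per-event cycle costs (placeholders, not fitted data).
    """
    base_cpi = 1.0 / ideal_ipc
    penalty_cpi = (branch_mpki * branch_penalty
                   + il1_mpki * il1_penalty
                   + dl1_mpki * dl1_penalty) / 1000.0
    return base_cpi + penalty_cpi

# Example with made-up inputs: ideal IPC of 2.5, 5 branch mispredictions,
# 2 I-cache misses, and 10 D-cache misses per 1000 instructions.
cpi = estimate_cpi(ideal_ipc=2.5, branch_mpki=5.0, il1_mpki=2.0,
                   dl1_mpki=10.0, branch_penalty=12, il1_penalty=10,
                   dl1_penalty=100)  # 0.4 base + 1.08 penalty = 1.48
```

The key property of the real model is the same one visible here: miss-event penalties are computed independently and added to an ideal-conditions baseline, which is what makes the model fast enough to replace detailed simulation for early estimates.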


international symposium on computer architecture | 2003

Energy efficient co-adaptive instruction fetch and issue

Alper Buyuktosunoglu; Tejas Karkhanis; David H. Albonesi; Pradip Bose

Front-end instruction delivery accounts for a significant fraction of the energy consumed in a dynamic superscalar processor. The issue queue in these processors serves two crucial roles: it bridges the front and back ends of the processor, and it serves as the window of instructions for the out-of-order engine. A mismatch between the front-end producer rate and the back-end consumer rate, or between the instruction window supplied by the front end and the window required to exploit the application's level of parallelism, wastes front-end energy and increases issue queue utilization. While the former increases overall processor energy consumption, the latter aggravates the issue queue hot-spot problem. We propose a complementary combination of fetch gating and issue queue adaptation to address both of these issues. We introduce an issue-centric fetch gating scheme based on issue queue utilization and application parallelism characteristics. Our scheme attempts to provide an instruction window size that matches the current parallelism characteristics of the application while maintaining enough queue entries to avoid back-end starvation. Compared to a conventional fetch gating scheme based on flow-rate matching, we demonstrate 20% better overall energy-delay with a 44% additional reduction in issue queue energy. We identify I-cache energy savings as the largest contributor to the overall savings and quantify the sources of savings in this structure. We then couple this issue-driven fetch gating approach with an issue queue adaptation scheme based on queue utilization. While the fetch gating scheme provides a window of issue queue instructions appropriate to the level of program parallelism, the issue queue adaptation approach shuts down the remaining underutilized issue queue entries. Used in tandem, these complementary techniques yield 20% greater issue queue energy savings than the sum of the savings from each technique applied in isolation. The result of this combined approach is a 6% overall energy-delay savings coupled with a 54% reduction in issue queue energy.
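The core of an issue-centric gating policy of this kind is a per-cycle decision: stop fetching when the issue queue already holds a window matched to the application's parallelism, but never let the back end starve. The sketch below captures only that decision; the function name, thresholds, and the idea of a precomputed target window are assumptions for illustration, not the paper's actual policy, which also adapts to measured parallelism over time.

```python
# Hypothetical issue-centric fetch-gating decision. Gate the front end
# when the issue queue already supplies enough instruction window for
# the current level of parallelism; keep fetching when the queue is
# nearly empty so the back end is never starved.
def should_gate_fetch(queue_occupancy, queue_size, target_window,
                      min_entries=8):
    """Return True if instruction fetch should be gated this cycle.

    target_window: window size matched to measured application
    parallelism (assumed to be produced elsewhere in this sketch).
    min_entries: entries always kept in flight to avoid starvation.
    """
    if queue_occupancy < min_entries:
        return False  # queue nearly empty: keep fetching
    # Gate once the queue covers the needed window (capped at its size).
    return queue_occupancy >= min(target_window, queue_size)
```

A flow-rate-matching gater, by contrast, would compare fetch and commit rates directly; basing the decision on queue occupancy is what lets this style of policy also shrink the effective window when the application has little parallelism to exploit.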


IBM Journal of Research and Development | 2015

Active Memory Cube: A processing-in-memory architecture for exascale systems

Ravi Nair; Samuel F. Antao; Carlo Bertolli; Pradip Bose; José R. Brunheroto; Tong Chen; Chen-Yong Cher; Carlos H. Andrade Costa; J. Doi; Constantinos Evangelinos; Bruce M. Fleischer; Thomas W. Fox; Diego S. Gallo; Leopold Grinberg; John A. Gunnels; Arpith C. Jacob; P. Jacob; Hans M. Jacobson; Tejas Karkhanis; Choon Young Kim; Jaime H. Moreno; John Kevin Patrick O'Brien; Martin Ohmacht; Yoonho Park; Daniel A. Prener; Bryan S. Rosenburg; Kyung Dong Ryu; Olivier Sallenave; Mauricio J. Serrano; Patrick Siegl

Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10^18 floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.


international symposium on computer architecture | 2007

Automated design of application specific superscalar processors: an analytical approach

Tejas Karkhanis; James E. Smith

Analytical modeling is applied to the automated design of application-specific superscalar processors. Using an analytical method bridges the gap between the size of the design space and the time required for detailed cycle-accurate simulations. The proposed design framework takes as inputs the design targets (upper bounds on execution time, area, and energy), design alternatives, and one or more application programs. The output is the set of out-of-order superscalar processors that are Pareto-optimal with respect to performance-energy-area. The core of the new design framework is made up of analytical performance and energy activity models, and an analytical model-based design optimization process. For a set of benchmark programs and a design space of 2000 designs, the design framework arrives at all performance-energy-area Pareto-optimal design points within 16 minutes on a 2 GHz Pentium-4. In contrast, it is estimated that a naïve cycle-accurate simulation-based exhaustive search would require at least two months to arrive at the Pareto-optimal design points for the same design space.


international symposium on low power electronics and design | 2002

Saving energy with just in time instruction delivery

Tejas Karkhanis; James E. Smith; Pradip Bose

Just-in-time instruction delivery is a general method for saving energy in a microprocessor by dynamically limiting the number of in-flight instructions. The goal is to save energy by 1) fetching valid instructions no sooner than necessary, avoiding cycles stalled in the pipeline -- especially the issue queue, and 2) reducing the number of fetches and subsequent processing of mis-speculated instructions. A simple algorithm monitors performance and adjusts the maximum number of in-flight instructions at fairly long intervals, 100K instructions in this study. The proposed JIT instruction delivery scheme provides the combined benefits of more targeted schemes proposed previously. With only a 3% performance degradation, energy savings in the fetch, decode pipe, and issue queue are 10%, 12%, and 40%, respectively.


PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems | 2002

Early-stage definition of LPX: a low power issue-execute processor

Pradip Bose; David M. Brooks; Alper Buyuktosunoglu; Peter W. Cook; Koushik K. Das; Philip G. Emma; Michael Karl Gschwind; Hans M. Jacobson; Tejas Karkhanis; Prabhakar Kudva; Stanley E. Schuster; James E. Smith; Viji Srinivasan; Victor Zyuban; David H. Albonesi; Sandhya Dwarkadas

We present the high-level microarchitecture of LPX: a low-power issue-execute processor prototype that is being designed by a joint industry-academia research team. LPX implements a very small subset of a RISC architecture, with a primary focus on a vector (SIMD) multimedia extension. The objective of this project is to validate some key new ideas in power-aware microarchitecture techniques, supported by recent advances in circuit design and clocking.


architectural support for programming languages and operating systems | 2006

A performance counter architecture for computing accurate CPI components

Stijn Eyerman; Lieven Eeckhout; Tejas Karkhanis; James E. Smith


WMPI | 2002

A Day in the Life of a Data Cache Miss

Tejas Karkhanis; James E. Smith


Archive | 2006

Automated design of application-specific superscalar processors

James E. Smith; Tejas Karkhanis


Lecture Notes in Computer Science | 2003

Early-stage definition of LPX: A low power issue-execute processor

Pradip Bose; David M. Brooks; Alper Buyuktosunoglu; Peter W. Cook; Koushik K. Das; Philip G. Emma; Michael Karl Gschwind; Hans M. Jacobson; Tejas Karkhanis; Prabhakar Kudva; Stanley E. Schuster; James E. Smith; Viji Srinivasan; Victor Zyuban; David H. Albonesi; Sandhya Dwarkadas

Collaboration


An overview of Tejas Karkhanis's collaborations.

Top Co-Authors


James E. Smith

University of Wisconsin-Madison
