Srilatha Manne | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Srilatha Manne is active.

Explore More

Publication

Featured researches published by Srilatha Manne.

international symposium on computer architecture | 1998

Pipeline gating: speculation control for energy reduction

Srilatha Manne; Artur Klauser; Dirk Grunwald

Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is essential for increasing the instructions per cycle (IPC), it does come at a cost. A large amount of unnecessary work results from wrong-path instructions entering the pipeline due to branch misprediction. Results generated with the SimpleScalar tool set using a 4-way issue pipeline and various branch predictors show an instruction overhead of 16% to 105% for event instruction committed. The instruction overhead will increase in the future as processors use more aggressive speculation and wider issue widths. In this paper we present an innovative method for power reduction ,which, unlike previous work that sacrificed flexibility or performance reduces power in high-performance microprocessors without impacting performance. In particular we introduce a hardware mechanism called pipeline gating to control rampant speculation in the pipeline. We present inexpensive mechanisms for determining when a branch is likely to mispredict, and for stopping wrong-path instructions from entering the pipeline. Results show up to a 38% reduction in wrong-path instructions with a negligible performance loss (/spl ap/1%). Best of all, even in programs with a high branch prediction accuracy, performance does not noticeable degrade. Our analysis indicates that there is little risk in implementing this method in existing processors since it does not impact performance and can benefit energy reduction.

international symposium on computer architecture | 2001

Power and energy reduction via pipeline balancing

R. Iris Bahar; Srilatha Manne

Minimizing power dissipation is an important design requirement for both portable and non-portable systems. In this work, we propose an architectural solution to the power problem that retains performance while reducing power. The technique, known as Pipeline Balancing (PLB), dynamically tunes the resources of a general purpose processor to the needs of the program by monitoring performance within each program. We analyze metrics for triggering PLB, and detail instruction queue design and energy savings based on an extension of the Alpha 21264 processor. Using a detailed simulator, we present component and full chip power and energy savings for single and multi-threaded execution. Results show an issue queue and execution unit power reduction of up to 23% and 13%, respectively, with an average performance loss of 1% to 2%.

international symposium on computer architecture | 1998

Confidence estimation for speculation control

Dirk Grunwald; Artur Klauser; Srilatha Manne; Andrew R. Pleszkun

Modern processors improve instruction level parallelism by speculation. The outcome of data and control decisions is predicted, and the operations are speculatively executed and only committed if the original predictions were correct. There are a number of other ways that processor resources could be used, such as threading or eager execution. As the use of speculation increases, we believe more processors will need some form of speculation control to balance the benefits of speculation against other possible activities.Confidence estimation is one technique that can be exploited by architects for speculation control. In this paper, we introduce performance metrics to compare confidence estimation mechanisms, and argue that these metrics are appropriate for speculation control. We compare a number of confidence estimation mechanisms, focusing on mechanisms that have a small implementation cost and gain benefit by exploiting characteristics of branch predictors, such as clustering of mispredicted branches.We compare the performance of the different confidence estimation methods using detailed pipeline simulations. Using these simulations, we show how to improve some confidence estimators, providing better insight for future investigations comparing and applying confidence estimators.

high-performance computer architecture | 2002

Loose loops sink chips

Eric Borch; Eric Tune; Srilatha Manne; Joel S. Emer

This paper explores the concept of micro-architectural loops and discusses their impact on processor pipelines. In particular, we establish the relationship between loose loops and pipeline length and configuration, and show their impact on performance. We then evaluate the load resolution loop in detail and propose the distributed register algorithm (DRA) as a way of reducing this loop. It decreases the performance loss due to load mis-speculations by reducing the issue-to-execute latency in the pipeline. A new loose loop is introduced into the pipeline by the DRA, but the frequency of mis-speculations is very low. The reduction in latency from issue to execute, along with a low mis-speculation rate in the DRA result in up to a 4% to 15% improvement in performance using a detailed architectural simulator.

international symposium on low power electronics and design | 1998

Power and performance tradeoffs using various caching strategies

R. Iris Bahar; Gianluca Albera; Srilatha Manne

In this paper, we propose several different data and instruction cache configurations and analyze their power as well as performance implications on the processor. Unlike most existing work in low power microprocessor design, we explore a high performance processor with the latest innovations for performance. Using a detailed, architectural-level simulator, we evaluate full system performance using several different power/performance sensitive cache configurations such as increasing cache size or associatively and including buffers along side L1 caches. We then use the information obtained from the simulator to calculate the energy consumption of the memory hierarchy of the system. As an alternative to simply increasing cache associatively or size to reduce lower-level memory energy consumption (which may have a detrimental effect on on-chip energy consumption), we show that, by using buffers, energy consumption of the memory subsystem may be reduced by as much as 13% for certain data cache configurations and by as much as 23% for certain instruction cache configurations without adversely effecting processor performance or on-chip energy consumption.

design automation conference | 1996

New algorithms for gate sizing: a comparative study

Olivier Coudert; Ramsey W. Haddad; Srilatha Manne

Gate sizing consists of choosing for each node of a mapped network a gate implementation in the library so that some cost function is optimized under some constraints. It has a significant impact on the delay, power dissipation, and area, of the final circuit. This paper compares five gate sizing algorithms targeting discrete, non-linear, non-unimodal, constrained optimization. The goal is to overcome the non-linearity and non-unimodality of the delay and the power to achieve good quality results within a reasonable CPU time, e.g., handling a 10000 node network in two hours. We compare the five algorithms on constraint free delay optimization and delay constrained power optimization, and show that one method is superior to the others.

design automation conference | 1995

Computing the Maximum Power Cycles of a Sequential Circuit

Srilatha Manne; Abelardo Pardo; R. Iris Bahar; Gary D. Hachtel; Fabio Somenzi; Enrico Macii; Massimo Poncino

This paper studies the problem of estimating worst case power dissipation in a sequential circuit. We approach this problem by finding the maximum average weight cycles in a weighted directed graph. In order to handle practical sized examples, we use symbolic methods, based on Algebraic Decision Diagrams (ADDs), for computing the maximum average length cycles as well as the number of gate transitions in the circuit, which is necessary to construct the weighted directed graph.

International Journal of Parallel Programming | 2001

Selective branch inversion: confidence estimation for branch predictors

Artur Klauser; Srilatha Manne; Dirk Grunwald

This paper describes a family of branch predictors that use confidence estimation to improve the performance of an underlying branch predictor. This method, referred to as Selective Branch Inversion (SBI), uses a confidence estimator to determine when the branch direction prediction is likely to be incorrect; branch decisions for these low-confidence branches are inverted. SBI with an underlying Gshare branch predictor outperforms other equal sized predictors such as the best history length Gshare predictor, as well as equally complex McFarling and Bi-Mode predictors. Our analysis shows that SBI achieves its performance through conflict detection and correction, rather than through conflict avoidance as some of the previously proposed predictors such as Bi-Mode and Agree. We also show that SBI is applicable to other underlying predictors, such as the McFarling Combined predictor. Finally we show that Dynamic Inversion Monitoring (DIM) can be used as a safeguard to turn off SBI in cases where it degrades the overall performance.

design automation conference | 1997

Remembrance of things past: locality and memory in BDDs

Srilatha Manne; Dirk Grunwald; Fabio Somenzi

Binary Decision Diagrams (BDDs) are efficient at manipulating large sets in a compact manner. BDDs, however, are inefficient at utilizing the memory hierarchy ofthe computer. Recent work addresses this problem by manipulating the BDDsin breath-first manner (BFS). BFS processing is quite successful at reducing the number of page faults when the BDDs do not fit in the available physical memory. When pagingdoes not take place, it is much less clear which paradigmleads to the better performance. In this paper, we perform adetailed analysis of BFS and DFS packages using simulationand direct performance monitoring ofthe memory hierarchy.We show that there is very little difference in TLB and cachemiss rates for DFS and BFS paradigms. We also show thatdifferences in execution time between carefully tuned BFSand DFS implementations are primarily a function of thelossless computed table used in BFS implementations, andnot a function of memory locality. Furthermore, we presentimplementation changes to the the Cudd package that canimprove execution times by asmuch as 26% when the problem fits in main memory, and a factor of six when paging is involved.

international symposium on low power electronics and design | 1995

CMOS dynamic power estimation based on collapsible current source transistor modeling

Abelardo Pardo; R. Iris Bahar; Srilatha Manne; Peter Feldmann; Gary D. Hachtel; Fabio Somenzi

When estimating the dynamic power dissipated by a circuit di erent methods ranging from numeric analog simulation to event-driven logic simulation have been proposed. However, as the technology reaches the deep sub-micron range, additional e ects as the short circuit current, partial voltage swings, transient behavior between clock cycles and leak current are becoming more relevant to achieve an accurate estimation of the consumption. In this paper we present Meiga, an event-driven simulation-based power estimator that accounts for power dissipation due to short circuit current, partial swings and transient behavior. Experiments have shown that circuits in the order of several thousands of transistors are simulated in seconds of CPU per input vector.

Explore More