C. M. Krishna

University of Massachusetts Amherst

Publication

Featured research published by C. M. Krishna.

International Symposium on Low Power Electronics and Design | 2002

Towards energy-aware software-based fault tolerance in real-time systems

Osman S. Unsal; Israel Koren; C. M. Krishna

Many real-time systems employed in defense, space, and consumer applications have power constraints and high reliability requirements. In this paper, we focus on the relationship between fault tolerance techniques and energy consumption. In particular, we establish the energy efficiency of Application Level Fault Tolerance (ALFT) over other software-based fault tolerance methods. We then develop sensible energy-aware heuristics for ALFT schemes. The heuristics yield up to 40% energy savings.

Journal of Low Power Electronics | 2007

TILTS: A Fast Architectural-Level Transient Thermal Simulation Method

Yongkui Han; Israel Koren; C. M. Krishna

As power density of microprocessors is increasing rapidly and resulting in high temperatures, the reliability of chips is greatly affected, making thermal simulation a necessity for CPU designs. Current thermal simulation methods (for example, the HotSpot simulator) are very useful, but are still inefficient when performing thermal analysis for long simulation times. In this paper, we propose a novel transient thermal simulation method for CPU chips at the architecture level, TILTS (Time Invariant Linear Thermal System), which utilizes the fact that the input power trace is discretized over a fixed sampling interval to accelerate thermal simulations. TILTS allows us to calculate transient temperatures on a chip over long simulation times. Based on a linear system formulation, TILTS has the same accuracy as that of traditional thermal simulation tools and is orders of magnitude faster than previous algorithms. Compared to the HotSpot simulator, TILTS achieves speedups of 1300 for the processors in our experiments for an appropriate sampling interval of 100 s. With some additional memory space, the improved algorithm CONTILTS (Convolutional TILTS) is about 6000 times faster than the HotSpot simulator for the processors in our experiments.
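The benefit of a fixed sampling interval can be illustrated on a far simpler model than the one in the paper. The sketch below assumes a single thermal RC node rather than a full chip model; the function names and the parameters `r`, `c`, and `t_amb` are hypothetical and do not come from TILTS. Because the power is constant within each interval, the update coefficient is computed once, and each interval then costs one multiply-add instead of many small integration steps.

```python
import math


def simulate_exact(power_trace, r, c, dt, t_amb, t0):
    """Exact per-interval update for C*dT/dt = P - (T - t_amb)/R.

    The decay factor depends only on the fixed interval dt, so it is
    precomputed once and reused for every sample of the power trace.
    """
    a = math.exp(-dt / (r * c))  # precomputed, time-invariant coefficient
    temps = [t0]
    for p in power_trace:
        steady = t_amb + r * p  # temperature this power level would settle at
        temps.append(a * temps[-1] + (1.0 - a) * steady)
    return temps


def simulate_euler(power_trace, r, c, dt, t_amb, t0, substeps=1000):
    """Reference: fine-grained forward-Euler integration of the same ODE."""
    h = dt / substeps
    temp = t0
    temps = [t0]
    for p in power_trace:
        for _ in range(substeps):
            temp += h * ((t_amb - temp) / (r * c) + p / c)
        temps.append(temp)
    return temps
```

Both functions return the temperature at each sampling instant; the exact update does one step per sample where the Euler reference does `substeps`.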

IEEE Transactions on Parallel and Distributed Systems | 2011

Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors

Huaping Wang; Israel Koren; C. M. Krishna

Simultaneous multithreading (SMT) increases processor throughput by allowing parallel execution of several threads. However, fully sharing processor resources may cause resource monopolization by a single thread or other misallocations, resulting in overall performance degradation. Static resource partitioning techniques have been suggested, but are not as effective as dynamic ones since program behavior does change over the course of its execution. In this paper, we propose an Adaptive Resource Partitioning Algorithm (ARPA) that dynamically assigns resources to threads according to changes in thread behavior. ARPA analyzes the resource usage efficiency of each thread in a given time period and assigns more resources to threads which can use them more efficiently. Its purpose is to improve the efficiency of resource utilization, thereby improving overall instruction throughput. Our simulation results on a set of 42 multiprogramming workloads show that ARPA outperforms the traditional fetch policy ICOUNT by 55.8 percent with regard to overall instruction throughput and achieves a 33.8 percent improvement over Static Partitioning. It also outperforms the current best dynamic resource allocation technique, Hill-climbing, by 5.7 percent. Considering fairness accorded to each thread, ARPA attains 43.6, 18.5, and 9.2 percent improvements over ICOUNT, Static Partitioning, and Hill-climbing, respectively, using a common fairness metric. We also explore the energy efficiency of dynamically controlling the number of powered-on reorder buffer entries for ARPA. Compared with ARPA, our energy-aware resource partitioning algorithm achieves 10.6 percent energy savings, while the performance loss is negligible.
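The general idea of shifting resources toward threads that use them more efficiently can be sketched as follows. This is not the ARPA algorithm itself: the efficiency metric (committed instructions per allocated entry), the step size `delta`, and the function name `repartition` are all assumptions made for illustration.

```python
def repartition(shares, committed, delta=1, min_share=1):
    """Move `delta` entries from the least to the most efficient thread.

    shares[i]    -- entries currently allocated to thread i (must be > 0)
    committed[i] -- instructions thread i committed in the last epoch
    Returns a new list of shares; the total is always preserved.
    """
    efficiency = [c / s for c, s in zip(committed, shares)]
    gainer = max(range(len(shares)), key=lambda i: efficiency[i])
    # Only threads that are strictly less efficient and can spare entries
    # are allowed to give resources away.
    donors = [
        i for i in range(len(shares))
        if i != gainer
        and efficiency[i] < efficiency[gainer]
        and shares[i] - delta >= min_share
    ]
    new_shares = list(shares)
    if donors:
        loser = min(donors, key=lambda i: efficiency[i])
        new_shares[gainer] += delta
        new_shares[loser] -= delta
    return new_shares
```

Calling this once per epoch moves the partition gradually, which fits the abstract's point that program behavior changes during execution.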

Dependable Systems and Networks | 2003

A voltage scheduling heuristic for real-time task graphs

D. Roychowdhury; Israel Koren; C. M. Krishna

Energy-constrained complex real-time systems are becoming increasingly important in defense, space, and consumer applications. In this paper, we present a sensible heuristic to address the problem of energy-efficient voltage scheduling of a hard real-time task graph with precedence constraints for a multi-processor environment. We show that consideration of inter-relationships among the tasks in a holistic way can lead to an effective heuristic for reducing energy expenditure. We developed this algorithm for systems running with two voltage levels since this is currently supported by a majority of modern processors. We then extend the algorithm for processors that can support multiple voltage levels. The results show that substantial energy savings can be achieved by using our scheme. The algorithm is then compared with other relevant algorithms derived for hypothetical systems which can run on infinite voltage levels in a given range. Our two-voltage systems, using the task dependencies effectively, can provide a comparable performance with those algorithms in the cases where continuous voltage switching is not allowed.

Compilers, Architecture, and Synthesis for Embedded Systems | 2001

EDF scheduling using two-mode voltage-clock-scaling for hard real-time systems

Yann Hang Lee; Yoonmee Doh; C. M. Krishna

Scaling down power supply voltage yields a quadratic reduction in dynamic power dissipation and also requires a reduction in clock frequency. In order to meet task deadlines in hard real-time systems, the delay penalty in voltage scaling needs to be carefully considered to achieve low power consumption. In this paper, we focus on dynamic reclaiming of early released resources in Earliest Deadline First (EDF) scheduling using voltage scaling. In addition to a static voltage assignment, we propose a new dynamic-mode assignment, which has a flexible voltage mode setting at run-time enabling much larger energy savings. Using simulation results and exploiting the interplay between power supply voltage, frequency, and circuit delay in CMOS technology, we find the optimal two-level voltage settings that minimize energy consumption.
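The trade-off between a deadline and two voltage modes can be made concrete with a small calculation. The sketch below is not the paper's EDF-based scheme: it assumes a single job, energy per cycle proportional to the square of the supply voltage, and free mode switches. All function and parameter names are hypothetical.

```python
def two_mode_split(cycles, deadline, f_lo, f_hi):
    """Fewest high-speed cycles needed to finish `cycles` by `deadline`.

    Returns (hi_cycles, lo_cycles). Frequencies are in cycles per time unit.
    """
    if not 0 < f_lo < f_hi:
        raise ValueError("require 0 < f_lo < f_hi")
    if cycles / f_hi > deadline:
        raise ValueError("deadline infeasible even at the high frequency")
    if cycles / f_lo <= deadline:
        return 0.0, float(cycles)  # enough slack to stay in the low mode
    # Solve hi/f_hi + (cycles - hi)/f_lo == deadline for hi.
    hi = (cycles / f_lo - deadline) / (1.0 / f_lo - 1.0 / f_hi)
    return hi, cycles - hi


def dynamic_energy(hi_cycles, lo_cycles, v_lo, v_hi, k=1.0):
    """Dynamic energy under the assumed model: k * V^2 per cycle."""
    return k * (hi_cycles * v_hi ** 2 + lo_cycles * v_lo ** 2)
```

For example, 100 cycles with a deadline of 75 time units, `f_lo = 1`, and `f_hi = 2` needs 50 cycles in the high mode. With `v_lo = 1` and `v_hi = 2`, that costs 250 energy units under this model, against 400 when running entirely in the high mode.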

International Symposium on Performance Analysis of Systems and Software | 2006

Compiler-based adaptive fetch throttling for energy-efficiency

Huaping Wang; Yao Guo; Israel Koren; C. M. Krishna

Front-end instruction delivery accounts for a significant fraction of energy consumption in dynamically scheduled superscalar processors. Different front-end throttling techniques have been introduced to reduce the chip-wide energy consumption caused by redundant fetching. Hardware-based techniques, such as flow-based throttling, could reduce the energy consumption considerably, but with a high performance loss. On the other hand, compiler-based IPC-estimation-driven software fetch throttling (CFT) techniques result in relatively low performance degradation, which is desirable for high-performance processors. However, their energy savings are limited by the fact that they typically use a predefined fixed low IPC-threshold to control throttling. In this paper, we propose a compiler-based adaptive fetch throttling (CAFT) technique that allows changing the throttling threshold dynamically at runtime. Instead of using a fixed threshold, our technique uses the decode/issue difference (DID) to assist the fetch throttling decision based on the statically estimated IPC. Changing the threshold dynamically makes it possible to throttle at a higher estimated IPC, thus increasing the throttling opportunities and resulting in larger energy savings. We demonstrate that CAFT could increase the energy savings significantly compared to CFT, while preserving its benefit of low performance loss. Our simulation results show that the proposed technique doubles the energy-delay product (EDP) savings compared to the fixed threshold throttling and achieves a 6.1% average EDP saving.

IEEE Transactions on Parallel and Distributed Systems | 2007

Software-Based Failure Detection and Recovery in Programmable Network Interfaces

Yizheng Zhou; Vijay Lakamraju; Israel Koren; C. M. Krishna

Emerging network technologies have complex network interfaces that have renewed concerns about network reliability. In this paper, we present an effective low-overhead fault tolerance technique to recover from network interface failures. Failure detection is based on a software watchdog timer that detects network processor hangs and a self-testing scheme that detects interface failures other than processor hangs. The proposed self-testing scheme achieves failure detection by periodically directing the control flow to go through only active software modules in order to detect errors that affect instructions in the local memory of the network interface. Our failure recovery is achieved by restoring the state of the network interface using a small backup copy containing just the right amount of information required for complete recovery. The paper shows how this technique can be made to minimize the performance impact to the host system and be completely transparent to the user.
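A software watchdog of the kind mentioned above can be sketched in a few lines. This is an illustrative model only, not the paper's implementation on the network interface: it uses simulated timestamps instead of a real timer, and the class and function names are hypothetical.

```python
class SoftwareWatchdog:
    """Flags a hang when no heartbeat arrives within `timeout` time units."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_kick = now

    def kick(self, now):
        """Called by the monitored processor to signal it is still alive."""
        self.last_kick = now

    def expired(self, now):
        return now - self.last_kick > self.timeout


def first_detection(heartbeat_times, check_times, timeout):
    """Return the first check time at which a hang is detected, else None.

    Both time lists are assumed to be sorted in ascending order.
    """
    dog = SoftwareWatchdog(timeout)
    pending = list(heartbeat_times)
    for t in check_times:
        # Apply every heartbeat that arrived up to this check.
        while pending and pending[0] <= t:
            dog.kick(pending.pop(0))
        if dog.expired(t):
            return t
    return None
```

In a real system the detection would trigger recovery, which the abstract describes as restoring the interface state from a small backup copy.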

Embedded and Real-Time Computing Systems and Applications | 2003

Constrained energy allocation for mixed hard and soft real-time tasks

Yoonmee Doh; Daeyoung Kim; Yann Hang Lee; C. M. Krishna

Voltage-Clock Scaling (VCS) is an effective approach to reducing total energy consumption in low power microprocessor systems. To provide real-time guarantees, the delay penalty in VCS needs to be carefully considered in real-time scheduling. In addition to real-time requirements, the systems may contain non-real-time tasks whose response time should be minimized. Thus, a combination of optimization objectives should be addressed when we establish a scheduling policy under a power consumption constraint. In this paper, we propose a VCS approach which leads to proper allocations of energy budgets for mixed hard and soft real-time tasks. Based on the schedulability of VCS-EDF, we investigate the characteristics of energy demand of hard periodic and soft aperiodic tasks. Using simulation and subject to a given energy budget, proper voltage settings can be chosen to attain an improved performance for aperiodic tasks while meeting the deadline requirements of periodic tasks.

IEEE Transactions on Parallel and Distributed Systems | 2002

Filtering random graphs to synthesize interconnection networks with multiple objectives

Vijay Lakamraju; Israel Koren; C. M. Krishna

Synthesizing networks that satisfy multiple requirements, such as high reliability, low diameter, good embeddability, etc., is a difficult problem to which there has been no completely satisfactory solution. We present a simple, yet very effective, approach to this problem. The crux of our approach is a filtration process that takes as input a large set of randomly generated graphs and filters out those that do not meet the specified requirements. Our experimental results show that this approach is both practical and powerful. The use of random regular networks as the raw material for the filtration process was motivated by their surprisingly good performance with regard to almost all properties that characterize a good interconnection network. We provide results related to the generation of networks that have low diameter, high fault tolerance, and good embeddability. Through this, we show that the generated networks are serious competitors to several traditional well-known networks. We also explore how random networks can be used in a packaging hierarchy and comment on the scope of application of these networks.
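The filtration process can be sketched with a single requirement, low diameter. The code below is a minimal illustration, not the authors' tool: it assumes the pairing model with rejection for generating random regular graphs, and the function names and the choice of diameter as the only filter are assumptions.

```python
import random
from collections import deque


def random_regular_graph(n, d, rng):
    """Random simple d-regular graph on n nodes (pairing model with retries)."""
    if (n * d) % 2 or not 0 <= d < n:
        raise ValueError("need n*d even and 0 <= d < n")
    while True:
        stubs = [v for v in range(n) for _ in range(d)]
        rng.shuffle(stubs)
        adj = {v: set() for v in range(n)}
        simple = True
        for i in range(0, len(stubs), 2):
            u, w = stubs[i], stubs[i + 1]
            if u == w or w in adj[u]:  # self-loop or parallel edge: retry
                simple = False
                break
            adj[u].add(w)
            adj[w].add(u)
        if simple:
            return adj


def diameter(adj):
    """Graph diameter via BFS from every node; None if disconnected."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return None
        best = max(best, max(dist.values()))
    return best


def filter_graphs(candidates, max_diameter):
    """Keep only connected candidates whose diameter is within the bound."""
    kept = []
    for adj in candidates:
        diam = diameter(adj)
        if diam is not None and diam <= max_diameter:
            kept.append(adj)
    return kept
```

Further requirements such as fault tolerance or embeddability would be added as extra predicates in `filter_graphs`, applied to the same pool of random candidates.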

Journal of Parallel and Distributed Computing | 1991

A random, distributed algorithm to embed trees in partially faulty processor arrays

Dipak Sitaram; Israel Koren; C. M. Krishna

A random and distributed, yet simple, algorithm is presented to embed trees in rectangular processor arrays in the presence of faults and manufacturing defects. One major characteristic of the algorithm is its ability to obtain embeddings in the presence of defect clusters in the processor array. Another useful feature is that the algorithm can be executed on the processor array itself, requiring only limited interprocessor communication. The performance of random algorithms is very difficult to analyze. Therefore, extensive simulation experiments were conducted and are presented in this paper. We also suggest an approach to overlapping algorithm runs to reduce the time needed to obtain a good embedding. The proposed algorithm can easily be extended to embed trees in hexagonal as well as other types of processor arrays.