C. Mani Krishna
University of Massachusetts Amherst
Publications
Featured research published by C. Mani Krishna.
High-Performance Computer Architecture | 2002
Osman S. Unsal; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
This work is based on our philosophy of providing interlayer system-level power awareness in computing systems. Here, we couple this approach with our vision of multi-partitioned memory systems, in which memory accesses are separated based on their static predictability and memory footprint and managed with various compiler-controlled techniques. We show that media applications are mapped more efficiently when scalar memory accesses are redirected to a mini-cache. Our results indicate that a partitioned 8K cache with scalars mapped to a 512-byte mini-cache can be more efficient than a 16K monolithic cache in terms of both performance and energy for most applications. In extensive experiments, we report 30% to 60% energy-delay product savings over a range of system configurations and cache sizes.
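To make the reported trade-off concrete, here is a minimal back-of-the-envelope sketch of the energy comparison between a monolithic cache and the partitioned design. All per-access energies and the scalar-access fraction are invented for illustration; they are not taken from the paper.

```python
# Hypothetical illustration of the energy-delay product (EDP) comparison:
# a 16 KB monolithic cache versus an 8 KB cache plus a 512 B scalar
# mini-cache. All energy and latency numbers are assumptions.

def edp(energy_nj, delay_cycles):
    """Energy-delay product: lower is better."""
    return energy_nj * delay_cycles

# Assumed per-access energies (nJ): larger arrays cost more per access.
E_16K, E_8K, E_MINI = 0.60, 0.40, 0.08

accesses = 1_000_000
scalar_fraction = 0.3          # assumed share of scalar (mini-cache) accesses

# Monolithic: every access goes to the 16 KB cache.
mono_energy = accesses * E_16K

# Partitioned: scalars hit the 512 B mini-cache, the rest the 8 KB cache.
part_energy = accesses * (scalar_fraction * E_MINI +
                          (1 - scalar_fraction) * E_8K)

# Assume roughly equal delay for simplicity, so EDP tracks energy here.
delay = 1.0
print(f"monolithic  EDP: {edp(mono_energy, delay):.0f}")
print(f"partitioned EDP: {edp(part_energy, delay):.0f}")
```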
International Symposium on Microarchitecture | 2001
Osman S. Unsal; Raksit Ashok; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
We claim that the unique characteristics of multimedia applications dictate media-sensitive architectural and compiler approaches to reduce the power consumption of the data cache. Our motivation is exploring energy savings for real-time multimedia workloads without sacrificing performance. In this paper, we present two complementary media-sensitive energy-saving techniques that leverage static information. While our first technique is applicable to existing architectures, in our second technique we adopt a more radical approach and propose a new caching architecture by re-evaluating the architecture-compiler interface. Our experiments show that substantial energy savings are possible in the data cache. Across a wide range of cache and architectural configurations we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.
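As a rough illustration of the kind of static information such compiler techniques can leverage, the sketch below classifies memory references by compile-time properties. The IR representation and the routing categories are hypothetical, not the paper's.

```python
# A toy illustration of static classification: references whose address
# expressions are scalar or have a compile-time-constant stride can be
# steered to an energy-saving path. The MemRef record is invented.

from dataclasses import dataclass

@dataclass
class MemRef:
    name: str
    stride: object   # int if statically known, None otherwise
    is_scalar: bool

def classify(ref: MemRef) -> str:
    if ref.is_scalar:
        return "mini-cache"            # small, cheap structure
    if isinstance(ref.stride, int):
        return "static-predictable"    # eligible for compiler management
    return "main-cache"                # fall back to conventional cache

refs = [MemRef("coeff", None, True),
        MemRef("frame[i]", 4, False),
        MemRef("ptr->next", None, False)]

for r in refs:
    print(f"{r.name:10s} -> {classify(r)}")
```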
International Conference on Parallel Architectures and Compilation Techniques | 2008
Huaping Wang; Israel Koren; C. Mani Krishna
Simultaneous Multithreading (SMT) increases processor throughput by allowing the parallel execution of several threads. However, fully sharing processor resources may cause resource monopolization by a single thread or other misallocations, resulting in overall performance degradation. Static resource partitioning techniques have been suggested, but they are less effective than dynamically controlling each thread's resource usage, since program behavior changes during execution. In this paper, we propose an Adaptive Resource Partitioning Algorithm (ARPA) that dynamically assigns resources to threads according to changes in thread behavior. ARPA analyzes the resource-usage efficiency of each thread over a time period and assigns more resources to the threads that use them more efficiently. The purpose of ARPA is to improve the efficiency of resource utilization, and thereby overall instruction throughput. Our simulation results on a set of 42 multiprogramming workloads show that ARPA outperforms the traditional fetch policy ICOUNT by 55.8% in overall instruction throughput and achieves a 33.8% improvement over Static Partitioning. It also outperforms the best current dynamic resource allocation technique, Hill-climbing, by 5.7%. Considering the fairness accorded to each thread, ARPA attains 43.6%, 18.5%, and 9.2% improvements over ICOUNT, Static Partitioning, and Hill-climbing, respectively, using a common fairness metric.
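The core loop of such an adaptive scheme can be sketched in a few lines. The following is a schematic reconstruction of epoch-based repartitioning, not the authors' algorithm; the efficiency metric, step size, and floor are assumptions.

```python
# Epoch-based adaptive partitioning in the spirit of ARPA (schematic):
# each epoch, measure per-thread efficiency (committed instructions per
# occupied resource entry) and shift resources from the least to the
# most efficient thread.

def repartition(alloc, committed, occupied, step=4, floor=8):
    """alloc, committed, occupied: per-thread lists for the last epoch."""
    eff = [c / max(o, 1) for c, o in zip(committed, occupied)]
    winner = eff.index(max(eff))
    loser = eff.index(min(eff))
    if winner != loser and alloc[loser] - step >= floor:
        alloc[loser] -= step    # take entries from the inefficient thread
        alloc[winner] += step   # give them to the efficient one
    return alloc

# Example epoch: thread 0 commits more per resource entry than thread 1.
alloc = [64, 64]                      # e.g., issue-queue entries per thread
print(repartition(alloc, committed=[5000, 1200], occupied=[40, 55]))
# -> [68, 60]: resources migrate toward the more efficient thread
```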
The Journal of Supercomputing | 2000
Joshua Haines; Vijay Lakamraju; Israel Koren; C. Mani Krishna
As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance comprises reliability techniques incorporated within the system hardware and software, whereas application-level fault tolerance comprises reliability techniques incorporated within the application software itself. We assert that, for high reliability, a combination of the two works best: in many systems, application-level fault tolerance can bridge the gap when system-level fault tolerance alone does not provide the required reliability. We demonstrate this with the RTHT target-tracking benchmark and the ABF beamforming benchmark.
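A generic pattern for application-level fault tolerance of this kind is an acceptance test over each data-parallel result, with recomputation on failure. The sketch below is illustrative only; the check and workload stand in for whatever the RTHT and ABF benchmarks actually compute.

```python
# Application-level fault tolerance layered on top of whatever the system
# provides: each data-parallel chunk's output is screened by an acceptance
# test, and failed chunks are recomputed. Entirely a generic sketch.

def acceptance_test(chunk_out):
    # Application-specific sanity check, e.g. values within physical bounds.
    return all(0.0 <= x <= 1.0 for x in chunk_out)

def fault_tolerant_map(work, chunks, retries=1):
    results = []
    for chunk in chunks:
        out = work(chunk)
        attempts = 0
        while not acceptance_test(out) and attempts < retries:
            out = work(chunk)          # recompute on a failed check
            attempts += 1
        results.append(out)
    return results

# Usage: normalize chunks of data; any out-of-range result triggers a retry.
normalize = lambda c: [x / max(c) for x in c]
print(fault_tolerant_map(normalize, [[1, 2, 4], [3, 6, 12]]))
```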
International Conference on Computer Design | 2004
Yao Guo; Saurabh Chheda; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
This paper evaluates several hardware-based data prefetching techniques from an energy perspective and explores their energy/performance trade-offs. We present detailed simulation results and make performance and energy comparisons between different configurations. Power characterization is based on HSpice circuit-level simulation of state-of-the-art low-power cache designs implemented in a deep-submicron process technology, combined with architecture-level simulation of switching activity in the memory system. The results show that while aggressive prefetching techniques often help to improve performance, they increase energy consumption in most cases. In designs implemented in the deep-submicron 100-nm BPTM process technology, cache leakage becomes one of the dominant factors in energy consumption. We found, however, that if leakage is optimized with recently proposed circuit-level techniques, most of the energy degradation is due to prefetch-hardware-related costs and unnecessary L1 data-cache lookups for prefetches that hit in the L1 cache. This overhead on the memory system can be as much as 20%.
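The structure of such an energy account is easy to state, even if the coefficients require circuit-level simulation. In the sketch below, every energy and power number is an assumption chosen only to show how dynamic energy, leakage, and prefetch overhead combine.

```python
# Illustrative energy accounting in the spirit of the evaluation above.
# All coefficients are invented; the point is the decomposition:
# dynamic access energy + leakage + prefetch-hardware and extra-lookup costs.

def memory_energy(accesses, prefetches, runtime_s,
                  e_access=0.5e-9,      # J per L1 access (assumed)
                  e_pref_hw=0.1e-9,     # J per prefetch-table lookup (assumed)
                  p_leak=0.02):         # W of cache leakage (assumed)
    dynamic = accesses * e_access
    # Prefetches that hit in L1 still pay a (wasted) cache lookup.
    overhead = prefetches * (e_pref_hw + e_access)
    leakage = p_leak * runtime_s
    return dynamic, overhead, leakage

d, o, l = memory_energy(accesses=10_000_000, prefetches=2_000_000,
                        runtime_s=0.01)
total = d + o + l
print(f"prefetch overhead: {100 * o / total:.1f}% of memory-system energy")
```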
ACM Transactions on Embedded Computing Systems | 2003
Osman Unsal; Raksit Ashok; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
The unique characteristics of multimedia/embedded applications dictate media-sensitive architectural and compiler approaches to reduce the power consumption of the data cache. Our goal is to explore energy savings for embedded/multimedia workloads without sacrificing performance. Here, we present two complementary media-sensitive energy-saving techniques that leverage static information. While our first technique is applicable to existing architectures, in our second technique we adopt a more radical approach and propose a new tagless caching architecture by reevaluating the architecture-compiler interface. Our experiments show that substantial energy savings are possible in the data cache. Across a wide range of cache and architectural configurations, we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.
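The energy argument for a tagless cache can be illustrated with a toy model: a conventional access reads both the tag and data arrays, while an access the compiler can guarantee needs no tag check. The per-array energies and the fraction of provable accesses below are invented.

```python
# Toy model of why a tagless cache saves energy: a conventional access
# reads both the tag array and the data array, while a compiler-guaranteed
# (tagless) access skips the tag check. Energy numbers are assumptions.

E_TAG, E_DATA = 0.15, 0.45     # assumed nJ per tag-array / data-array read

def access_energy(n_accesses, tagless_fraction):
    tagged = n_accesses * (1 - tagless_fraction) * (E_TAG + E_DATA)
    tagless = n_accesses * tagless_fraction * E_DATA
    return tagged + tagless

base = access_energy(1_000_000, 0.0)
opt = access_energy(1_000_000, 0.8)   # 80% of accesses proven tagless
print(f"savings: {100 * (base - opt) / base:.1f}%")
```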
Lecture Notes in Computer Science | 2004
Yao Guo; Saurabh Chheda; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
There has been intensive research on data prefetching focused on performance improvement; however, the energy aspect of prefetching is relatively unexplored. Our experiments show that although software prefetching tends to be more energy-efficient, hardware prefetching outperforms software prefetching on most of the applications in terms of performance. This paper proposes several techniques to make hardware-based data prefetching power-aware. Our proposed techniques include three compiler-based approaches that make the prefetch predictor more power-efficient: the compiler identifies memory-access patterns in order to selectively apply different prefetching schemes, depending on the predicted access patterns, and to filter out unnecessary prefetches. We also propose a hardware-based filtering technique to further reduce the energy overhead due to prefetching in the L1 cache. Our experiments show that the proposed techniques reduce the prefetching-related energy overhead by close to 40% without reducing its performance benefits.
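A hardware prefetch filter of this general kind can be approximated with a small table of recently seen block addresses; redundant prefetches are dropped before they cost a cache lookup. The table size and FIFO replacement below are assumptions, not the paper's design.

```python
# Toy prefetch filter: a small table remembers recent block addresses, and
# prefetches to blocks likely already in L1 are dropped before they spend
# energy on a cache lookup.

class PrefetchFilter:
    def __init__(self, size=32):
        self.size = size
        self.table = []           # FIFO of recent block addresses

    def allow(self, block_addr):
        if block_addr in self.table:
            return False          # probably in L1 already: filter it out
        self.table.append(block_addr)
        if len(self.table) > self.size:
            self.table.pop(0)
        return True

f = PrefetchFilter()
issued = [a for a in [0x100, 0x140, 0x100, 0x180, 0x140] if f.allow(a)]
print([hex(a) for a in issued])   # duplicates 0x100 and 0x140 are dropped
```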
International Parallel and Distributed Processing Symposium | 2000
Osman Unsal; Israel Koren; C. Mani Krishna
In this paper, we study the problem of positioning copies of shared data structures to reduce power consumption in real-time systems. Power-constrained real-time systems are of increasing importance in defense, space, and consumer applications. We describe our energy consumption model and present numerical results linking the placement of data structures to energy consumption.
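To see why placement matters, consider a toy energy model in which reads pay a per-hop cost to the nearest copy and writes must update every copy; exhaustively scoring placements then picks the cheapest. The topology, rates, and costs below are invented, not the paper's model.

```python
# Small numerical sketch of the placement question: given per-node read
# rates and an assumed energy cost per hop, where should copies of a
# shared structure live? Topology and costs are hypothetical.

import itertools

nodes = [0, 1, 2, 3]                     # a 4-node line: hop cost = distance
reads = {0: 100, 1: 10, 2: 10, 3: 80}    # reads/sec per node (assumed)
E_HOP, E_UPDATE = 1.0, 50.0              # energy per hop, per copy update

def energy(copies):
    # Each read goes to the nearest copy; each write updates every copy.
    read_e = sum(r * E_HOP * min(abs(n - c) for c in copies)
                 for n, r in reads.items())
    return read_e + E_UPDATE * len(copies)

best = min((frozenset(c) for k in range(1, 5)
            for c in itertools.combinations(nodes, k)), key=energy)
print(sorted(best), energy(best))        # copies at the two read hot spots
```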
International Symposium on Nanoscale Architectures | 2011
Priyamvada Vijayakumar; Pritish Narayanan; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
Reliable and scalable manufacturing of nanofabrics entails significant challenges. Scalable nanomanufacturing approaches have been proposed that employ lithographic masks in conjunction with nanofabrication based on self-assembly. Bottom-up fabrication of nanoelectronic circuits is expected to be subject to various defects, and identifying the types of defects that may occur during each step of a manufacturing pathway is essential in any attempt to achieve reliable manufacturing. This paper proposes a methodology for analyzing the sources of defects in a nanomanufacturing flow and estimating the resulting systematic yield loss. The methodology integrates physical fabric considerations, manufacturing sequences, and the resulting defect scenarios, allowing the impact of the fabrication process on systematic yield to be analyzed. This is in contrast to most current approaches, which use conventional defect models and assume constant defect rates without analyzing the manufacturing pathway to determine the sources of defects and their probabilities (or rates). While the focus of the paper is on estimating the mask-overlay-limited yield for the NASIC nanofabric, the proposed approach can easily be adapted to other structured nanofabrics.
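At its simplest, stepwise systematic-yield estimation multiplies a survival probability per manufacturing step. The sketch below models each mask-overlay step as a Gaussian alignment error that must fall within a margin; the sigmas, margins, and flat self-assembly rate are all hypothetical.

```python
# Back-of-the-envelope stepwise yield estimation: each manufacturing step
# (e.g., a mask overlay) contributes a survival probability, and the
# systematic yield is their product. Steps and numbers are invented.

import math

def overlay_yield(sigma_nm, margin_nm):
    """P(|overlay error| <= margin) for a zero-mean Gaussian error."""
    return math.erf(margin_nm / (sigma_nm * math.sqrt(2)))

steps = [("mask1 overlay", overlay_yield(sigma_nm=3.0, margin_nm=9.0)),
         ("mask2 overlay", overlay_yield(sigma_nm=3.0, margin_nm=6.0)),
         ("self-assembly", 0.98)]      # assumed flat survival rate

total = 1.0
for name, y in steps:
    total *= y
    print(f"{name:15s} yield {y:.4f}")
print(f"systematic yield {total:.4f}")
```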
IEEE Transactions on Reliability | 2015
C. Mani Krishna
Cyber-physical systems have become more prevalent in recent years. Computer control is increasingly being considered for applications that are operationally resource-constrained and very cost-sensitive, while requiring very high reliability. The primary approach to building ultra-reliable systems is to deploy massive redundancy. However, the constraints just mentioned make it very difficult to use the across-the-board redundancy approaches employed in traditional, relatively cost-insensitive applications such as aerospace. In this paper, we address such problems by adaptively adjusting fault-tolerance levels according to the current state of the controlled plant. This approach imposes far less thermal stress on the computer, thereby enhancing reliability and requiring fewer line-replaceable units to maintain the required reliability levels over any given period of operation.
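The core policy can be caricatured as a mapping from the plant's safety margin to a redundancy level: run lean when the plant can absorb a control miss, and raise redundancy near the safety boundary. The thresholds and replica counts below are purely illustrative.

```python
# Schematic state-dependent redundancy: when the controlled plant is far
# from its safety boundary, a missed or faulty control output is tolerable,
# so fewer replicas run; near the boundary, redundancy is raised.

def redundancy_level(state_margin):
    """state_margin: normalized distance of the plant state from the edge
    of its safe operating region (1.0 = comfortably safe, 0.0 = critical)."""
    if state_margin > 0.5:
        return 1        # simplex: single copy, minimal thermal stress
    if state_margin > 0.2:
        return 2        # duplex with comparison
    return 3            # triple modular redundancy near the unsafe region

for margin in (0.9, 0.35, 0.05):
    print(f"margin {margin:.2f} -> run {redundancy_level(margin)} replica(s)")
```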