Raksit Ashok
Publication
Featured research published by Raksit Ashok.
international symposium on microarchitecture | 2001
Osman S. Unsal; Raksit Ashok; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
We claim that the unique characteristics of multimedia applications dictate media-sensitive architectural and compiler approaches to reduce the power consumption of the data cache. Our motivation is to explore energy savings for real-time multimedia workloads without sacrificing performance. In this paper, we present two complementary media-sensitive energy-saving techniques that leverage static information. While our first technique is applicable to existing architectures, in our second technique we adopt a more radical approach and propose a new caching architecture by re-evaluating the architecture-compiler interface. Our experiments show that substantial energy savings are possible in the data cache. Across a wide range of cache and architectural configurations we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.
architectural support for programming languages and operating systems | 2002
Raksit Ashok; Saurabh Chheda; Csaba Andras Moritz
This paper presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques, to reduce energy consumption in general purpose architectures. It combines statically speculative cache access modes, a dynamic CAM based Tag-Cache used as backup for statically mispredicted accesses, various conventional multi-level associative cache organizations, embedded protection checking along all cache access mechanisms, as well as architectural organizations to reduce the power consumed by address translation in virtual memory. Because it is based on speculative static information, the approach removes the burden of provable correctness in compiler analysis passes that extract static information. This makes Cool-Mem applicable for large and complex applications, without having any limitations due to complexity issues in the compiler passes or the presence of precompiled static libraries. Based on extensive evaluation, for both SPEC2000 and Mediabench applications, 12% to 20% total energy savings are obtained in the processor, with performance ranging from 1.2% degradation to 8% improvement, for the applications studied.
ACM Transactions on Embedded Computing Systems | 2003
Osman Unsal; Raksit Ashok; Israel Koren; C. Mani Krishna; Csaba Andras Moritz
The unique characteristics of multimedia/embedded applications dictate media-sensitive architectural and compiler approaches to reduce the power consumption of the data cache. Our goal is to explore energy savings for embedded/multimedia workloads without sacrificing performance. Here, we present two complementary media-sensitive energy-saving techniques that leverage static information. While our first technique is applicable to existing architectures, in our second technique we adopt a more radical approach and propose a new tagless caching architecture by reevaluating the architecture-compiler interface. Our experiments show that substantial energy savings are possible in the data cache. Across a wide range of cache and architectural configurations, we obtain up to 77% energy savings, while the performance varies from 14% improvement to 4% degradation depending on the application.
symposium on code generation and optimization | 2011
Silvius Rus; Raksit Ashok; David Xinliang Li
String operations such as memcpy, memset and memcmp account for a nontrivial amount of Google datacenter resources. String operations hurt processor cache efficiency when the data accessed is not reused shortly thereafter. Such cache pollution can be avoided by using nontemporal memory access to bypass L2/L3 caches. As reuse distance varies greatly across different memcpy static call contexts in the same program, an efficient solution needs to be call context sensitive. We propose a novel solution to this problem using the page protection mechanism to measure reuse distance and the GCC feedback directed optimization mechanism to generate nontemporal memory access instructions at the appropriate static code contexts. First, the compiler inserts instrumentation for calls to string operations. Then a run time library measures reuse distance using the page protection mechanism during a representative profiling run. The compiler finally generates calls to specialized string operations that use nontemporal operations for the arguments with large reuse distance. We present a full implementation and initial results including speedup on large datacenter applications.
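The call-context-sensitive decision described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the paper measures reuse distance at run time through page protection, whereas this sketch assumes an already-recorded trace of page accesses. The function names `profile_reuse` and `pick_nontemporal`, the trace format, and the threshold are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def profile_reuse(trace):
    """Compute per-call-site reuse distances from a page-access trace.

    trace: iterable of (site, page) pairs in program order. `site` is a
    hypothetical label for a string-operation call site, or None for an
    ordinary access. The reuse distance of a page is the number of distinct
    other pages touched before that page is accessed again; it is charged
    to the call site that touched the page last. Pages never touched again
    get an infinite distance.
    """
    stack = []        # LRU stack of pages, most recently used last
    last_site = {}    # page -> site of the most recent access to it
    distances = defaultdict(list)
    for site, page in trace:
        if page in last_site:
            idx = stack.index(page)
            if last_site[page] is not None:
                distances[last_site[page]].append(len(stack) - 1 - idx)
            stack.pop(idx)
        stack.append(page)
        last_site[page] = site
    for page, site in last_site.items():
        if site is not None:
            distances[site].append(math.inf)
    return dict(distances)


def pick_nontemporal(distances, threshold_pages):
    """Mark call sites whose median reuse distance exceeds the threshold
    (an assumed stand-in for cache capacity, in pages)."""
    return {site: median(ds) > threshold_pages
            for site, ds in distances.items()}


# Hypothetical trace: "log_copy" writes pages that are never read back,
# while "hot_copy" writes a page that is read immediately afterwards.
trace = [("log_copy", 0), ("log_copy", 1), ("log_copy", 2),
         ("hot_copy", 10), (None, 10), ("hot_copy", 10), (None, 10)]
decisions = pick_nontemporal(profile_reuse(trace), threshold_pages=2)
```

Under these assumptions, only `log_copy` would be routed to a nontemporal variant, since its data is never reused, while `hot_copy` keeps the ordinary cached path.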
ACM Transactions on Computer Systems | 2004
Raksit Ashok; Saurabh Chheda; Csaba Andras Moritz
This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques, to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs between architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines statically speculative cache access modes, a dynamic content addressable memory-based (CAM-based) Tag-Cache used as backup for statically mispredicted accesses, different conventional multilevel associative cache organizations, embedded protection checking along all cache access mechanisms, as well as architectural organizations to reduce the power consumed by address translation in virtual memory. Because it is based on speculative static information, a superset of the predictable program information available at compile-time, our approach removes the burden of provable correctness in compiler analysis passes that extract static information. This makes Cool-Mem highly practical, applicable for large and complex applications, without having any limitations due to complexity issues in our compiler passes or the presence of precompiled static libraries. Based on extensive evaluation, for both SPEC2000 and Mediabench applications, we obtain from 6% to 19% total energy savings in the processor, with performance ranging from 1.5% degradation to 6% improvement, for the applications studied. We have also compared Cool-Mem to several prior approaches and have found Cool-Mem to perform better in almost all cases.
Journal of Parallel and Distributed Computing | 2008
Yao Guo; Vladimir Vlassov; Raksit Ashok; Richard Weiss; Csaba Andras Moritz
The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason perhaps why wide adoption has not followed. In this paper, we propose a novel approach called synchronization coherence that can provide transparent fine-grained synchronization and caching in a multiprocessor machine and single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols. It reduces network utilization as well as synchronization related processing overheads while adding minimal hardware complexity as compared to cache coherence mechanisms or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% improvement in performance and up to 24% improvement in energy efficiency with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases up to 38% when simulating with an ideal L2 cache system.
Archive | 2004
Saurabh Chheda; Kristopher Carver; Raksit Ashok
Archive | 2002
Csaba Andras Moritz; Mani Krishna; Israel Koren; Osman S. Unsal; Saurabh Chheda; Raksit Ashok
Archive | 2005
Saurabh Chheda; Kristopher Carver; Raksit Ashok
Archive | 2014
Xinliang David Li; Raksit Ashok; Robert Hundt