Arun Raghavan
University of Pennsylvania
Publication
Featured research published by Arun Raghavan.
High Performance Computer Architecture | 2012
Arun Raghavan; Yixin Luo; Anuj Chandawalla; Marios C. Papaefthymiou; Kevin P. Pipe; Thomas F. Wenisch; Milo M. K. Martin
Although transistor density continues to increase, voltage scaling has stalled and thus power density is increasing each technology generation. Particularly in mobile devices, which have limited cooling options, these trends lead to a utilization wall in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this paper explores activating otherwise powered-down cores for sub-second bursts of intense parallel computation. The approach exploits the concept of computational sprinting, in which a chip temporarily exceeds its sustainable thermal power budget to provide instantaneous throughput, after which the chip must return to nominal operation to cool down. To demonstrate the feasibility of this approach, we analyze the thermal and electrical characteristics of a smart-phone-like system that nominally operates a single core (~1W peak), but can sprint with up to 16 cores for hundreds of milliseconds. We describe a thermal design that incorporates phase-change materials to provide thermal capacitance to enable such sprints. We analyze image recognition kernels to show that parallel sprinting has the potential to achieve the task response time of a 16W chip within the thermal constraints of a 1W mobile platform.
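The thermal-capacitance idea behind sprinting can be sketched with back-of-envelope arithmetic (all numbers below are hypothetical illustrations, not the paper's measurements): the excess heat a sprint generates above the sustainable budget must be absorbed by the latent heat of the phase-change material.

```python
# Back-of-envelope model of sprint thermal buffering (illustrative numbers).
P_SPRINT = 16.0      # W, 16-core sprint power (assumed)
P_TDP = 1.0          # W, sustainable single-core power (assumed)
LATENT_HEAT = 200.0  # J/g, latent heat of a paraffin-like PCM (assumed)

def pcm_mass_for_sprint(duration_s):
    """Grams of phase-change material needed to absorb the excess heat
    of one sprint, assuming all excess heat goes into melting PCM."""
    excess_energy = (P_SPRINT - P_TDP) * duration_s  # joules above budget
    return excess_energy / LATENT_HEAT

print(pcm_mass_for_sprint(1.0))  # 15 J excess -> 0.075 g of PCM
```

Even a one-second, 16 W sprint needs well under a gram of such material, which is why on-package thermal capacitance is plausible for a phone-scale device.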
Programming Language Design and Implementation | 2013
Abhishek Udupa; Arun Raghavan; Jyotirmoy V. Deshmukh; Sela Mador-Haim; Milo M. K. Martin; Rajeev Alur
With the maturing of technology for model checking and constraint solving, there is an emerging opportunity to develop programming tools that can transform the way systems are specified. In this paper, we propose a new way to program distributed protocols using concolic snippets. Concolic snippets are sample execution fragments that contain both concrete and symbolic values. The proposed approach allows the programmer to describe the desired system partially using the traditional model of communicating extended finite-state-machines (EFSM), along with high-level invariants and concrete execution fragments. Our synthesis engine completes an EFSM skeleton by inferring guards and updates from the given fragments which is then automatically analyzed using a model checker with respect to the desired invariants. The counterexamples produced by the model checker can then be used by the programmer to add new concrete execution fragments that describe the correct behavior in the specific scenario corresponding to the counterexample. We describe TRANSIT, a language and prototype implementation of the proposed specification methodology for distributed protocols. Experimental evaluations of TRANSIT to specify cache coherence protocols show that (1) the algorithm for expression inference from concolic snippets can synthesize expressions of size 15 involving typical operators over commonly occurring types, (2) for a classical directory-based protocol, TRANSIT automatically generates, in a few seconds, a complete implementation from a specification consisting of the EFSM structure and a few concrete examples for every transition, and (3) a published partial description of the SGI Origin cache coherence protocol maps directly to symbolic examples and leads to a complete implementation in a few iterations, with the programmer correcting counterexamples resulting from underspecified transitions by adding concrete examples in each iteration.
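The expression-inference step can be illustrated with a toy sketch: enumerate candidate expressions over a tiny grammar and keep those consistent with every concrete snippet. The snippet format, variable names, and grammar here are invented for illustration; TRANSIT's actual engine uses constraint solving over a much richer expression language.

```python
from itertools import product

# Each "snippet" pairs a concrete pre-state with the observed new value
# of a state variable; inference keeps every candidate expression that
# reproduces all observations.
snippets = [  # (state before transition, observed new value of 'count')
    ({"count": 0, "acks": 3}, 3),
    ({"count": 1, "acks": 2}, 3),
    ({"count": 2, "acks": 2}, 4),
]

VARS = ["count", "acks"]
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}

def candidates():
    for v in VARS:  # size-1 expressions: a bare variable
        yield v, (lambda env, v=v: env[v])
    for a, op, b in product(VARS, OPS, VARS):  # size-3: var op var
        yield f"{a} {op} {b}", (lambda env, a=a, b=b, f=OPS[op]: f(env[a], env[b]))

def infer(snippets):
    return [text for text, fn in candidates()
            if all(fn(env) == out for env, out in snippets)]

print(infer(snippets))  # both orderings of the sum survive
```

A real synthesizer prunes this search symbolically rather than by enumeration, but the contract is the same: any returned expression agrees with every given fragment.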
International Symposium on Microarchitecture | 2008
Arun Raghavan; Colin Blundell; Milo M. K. Martin
Traditional coherence protocols present a set of difficult tradeoffs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces PATCH (Predictive/Adaptive Token Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. PATCH extends a standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions. Token counting allows PATCH to support direct requests on an unordered interconnect, while a mechanism called token tenure uses local processor timeouts and the directory's per-block point of ordering at the home node to guarantee forward progress without relying on broadcast. PATCH makes three main contributions. First, PATCH introduces token tenure, which provides broadcast-free forward progress for token counting protocols. Second, PATCH deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, PATCH provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, PATCH is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.
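The token-counting permission rules that PATCH builds on can be sketched as simple invariants: a block has a fixed number of tokens, a cache may read with at least one token, and may write only when holding all of them. The class below is an illustrative sketch of those rules, not the PATCH implementation.

```python
TOTAL_TOKENS = 16  # e.g., one token per processor (assumed system size)

class BlockState:
    """Per-cache token count for one block, with token-counting rules."""
    def __init__(self, tokens=0):
        self.tokens = tokens

    def can_read(self):
        return self.tokens >= 1           # any token grants read permission

    def can_write(self):
        return self.tokens == TOTAL_TOKENS  # all tokens grant write permission

    def give(self, n):
        assert n <= self.tokens
        self.tokens -= n
        return n

    def receive(self, n):
        self.tokens += n

# A writer must collect every token, so no other cache can hold read
# permission at the same time: coherence follows from the invariant
# rather than from an ordered interconnect.
a, b = BlockState(TOTAL_TOKENS), BlockState(0)
b.receive(a.give(1))  # a shares one token with b
assert a.can_read() and b.can_read() and not a.can_write()
```

Because correctness rests on the token invariant, requests can travel on an unordered network; token tenure then handles the separate problem of guaranteeing that some requester eventually collects the tokens it needs.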
International Symposium on Computer Architecture | 2010
Colin Blundell; Arun Raghavan; Milo M. K. Martin
Over the past decade there has been a surge of academic and industrial interest in optimistic concurrency, i.e. the speculative parallel execution of code regions that have the semantics of isolation. This work analyzes scalability bottlenecks of workloads that use optimistic concurrency. We find that one common bottleneck is updates to auxiliary program data in otherwise non-conflicting operations, e.g. reference count updates and hashtable occupancy field increments. To eliminate the performance impact of conflicts on such auxiliary data, this work proposes RETCON, a hardware mechanism that tracks the relationship between input and output values symbolically and uses this symbolic information to transparently repair the output state of a transaction at commit. RETCON is inspired by instruction replay-based mechanisms but exploits simplifying properties of the nature of computations on auxiliary data to perform repair without replay. Our experiments show that RETCON provides significant speedups for workloads that exhibit conflicts on auxiliary data, including transforming a transactionalized version of the Python interpreter from a workload that exhibits no scaling to one that exhibits near-linear scaling on 32 cores.
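The repair-without-replay idea can be sketched for the reference-count case: rather than aborting when an auxiliary counter conflicts, the transaction records its effect symbolically and re-applies it to the latest committed value at commit. The sketch below assumes the symbolic effect reduces to a simple additive delta; the hardware mechanism is considerably more general.

```python
class Counter:
    """A shared auxiliary value, e.g. a reference count."""
    def __init__(self, value=0):
        self.value = value

class SymbolicTxn:
    """Tracks the transaction's effect on the counter as a delta
    instead of a concrete value, enabling repair at commit."""
    def __init__(self, counter):
        self.counter = counter
        self.delta = 0  # symbolic effect: value -> value + delta

    def incr(self, n=1):
        self.delta += n  # recorded symbolically, not applied yet

    def commit(self):
        # Repair: apply the symbolic delta to whatever the counter
        # holds *now*, even if other transactions committed meanwhile,
        # so a conflicting update no longer forces an abort.
        self.counter.value += self.delta

shared = Counter(10)
t1, t2 = SymbolicTxn(shared), SymbolicTxn(shared)
t1.incr()
t2.incr(2)
t2.commit()            # another transaction commits first...
t1.commit()            # ...t1 still commits without abort or replay
print(shared.value)    # 13
```

The key property is that the repaired outcome equals what a serial re-execution would have produced, which is exactly why replay itself becomes unnecessary for this class of computation.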
IEEE Micro | 2013
Arun Raghavan; Laurel Emurian; Lei Shao; Marios C. Papaefthymiou; Kevin P. Pipe; Thomas F. Wenisch; Milo M. K. Martin
Computational sprinting activates dark silicon to improve responsiveness by briefly but intensely exceeding a system's sustainable power limit. This article focuses on the energy implications of sprinting. The authors observe that sprinting can save energy even while improving responsiveness by enabling execution in chip configurations that, though thermally unsustainable, improve energy efficiency. Surprisingly, this energy savings can translate to throughput improvements even for long-running computations. Repeatedly alternating between sprint and idle modes while maintaining sustainable average power can outperform steady-state computation at the platform's thermal limit.
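The sprint-and-rest claim can be illustrated with hypothetical numbers (not taken from the article): if the sprint configuration does more work per joule than the sustainable one, duty-cycling it to the same average power yields more throughput than running sustained.

```python
# Illustrative comparison of sustained vs. sprint-and-rest operation.
P_SUSTAINED, RATE_SUSTAINED = 1.0, 1.0  # W, work-units/s (1.00 units/J)
P_SPRINT, RATE_SPRINT = 16.0, 20.0      # W, work-units/s (1.25 units/J)
P_IDLE = 0.0                            # idealized idle power (assumed)

# Choose the sprint duty cycle f so average power matches the 1 W limit:
f = (P_SUSTAINED - P_IDLE) / (P_SPRINT - P_IDLE)  # fraction of time sprinting
throughput = f * RATE_SPRINT                      # long-run work rate

print(f, throughput)  # 0.0625 1.25 -> beats 1.0 units/s sustained
```

The advantage comes entirely from the efficiency gap (1.25 vs. 1.00 units per joule); with equal efficiency the duty-cycled throughput would exactly match sustained operation.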
Semiconductor Thermal Measurement and Management Symposium | 2014
Lei Shao; Arun Raghavan; Laurel Emurian; Marios C. Papaefthymiou; Thomas F. Wenisch; Milo M. K. Martin; Kevin P. Pipe
Computational sprinting has been proposed to improve responsiveness for the intermittent computational demands of many current and emerging mobile applications by briefly activating reserve cores and/or boosting frequency and voltage to power levels that far exceed the system's sustained cooling capability. In this work, we focus on the thermal consequences of computational sprinting, studying the use of silicon thermal test chips as processor proxies in a real smartphone package with realistic thermal constraints. We study conditions in which multiple cycles of sprint and cooldown are repeated every few seconds to verify the feasibility of sprinting. Integrated on-chip phase-change heat sinks filled with low-melting-temperature metallic alloys are demonstrated to provide a thermal buffer during intermittent computations by keeping the chip at lower peak and average temperatures.
IEEE Micro | 2013
Arun Raghavan; Yixin Luo; Anuj Chandawalla; Marios C. Papaefthymiou; Kevin P. Pipe; Thomas F. Wenisch; Milo M. K. Martin
The tight thermal constraints of mobile devices, which limit sustainable performance, and the bursty nature of interactive mobile applications call for a new design focus: enhancing user responsiveness rather than sustained throughput. To that end, this article explores computational sprinting, wherein a mobile device temporarily exceeds sustainable thermal limits to provide a brief, intense burst of computation in response to user input. By enabling tenfold more computation within the timescale of human patience, sprinting has the potential to fundamentally change the user experience that an interactive application can provide.
ACM Transactions on Architecture and Code Optimization | 2010
Arun Raghavan; Colin Blundell; Milo M. K. Martin
Traditional coherence protocols present a set of difficult trade-offs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces Patch (Predictive/Adaptive Token-Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. Patch extends a standard directory protocol to track tokens and use token-counting rules for enforcing coherence permissions. Token counting allows Patch to support direct requests on an unordered interconnect, while a mechanism called token tenure provides broadcast-free forward progress using the directory protocol's per-block point of ordering at the home along with either timeouts at requesters or explicit race notification messages. Patch makes three main contributions. First, Patch introduces token tenure, which provides broadcast-free forward progress for token-counting protocols. Second, Patch deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, Patch provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, Patch is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.
Architectural Support for Programming Languages and Operating Systems | 2013
Arun Raghavan; Laurel Emurian; Lei Shao; Marios C. Papaefthymiou; Kevin P. Pipe; Thomas F. Wenisch; Milo M. K. Martin
Archive | 2012
Thomas F. Wenisch; Kevin Pipe; Marios Papaefthymiou; Milo M. K. Martin; Arun Raghavan