Parthasarathy Ranganathan

Publication

Featured research published by Parthasarathy Ranganathan.

International Symposium on Microarchitecture | 2003

Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction

Rakesh Kumar; Keith I. Farkas; Norman P. Jouppi; Parthasarathy Ranganathan; Dean M. Tullsen

This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements. Our evaluation of this architecture shows significant energy benefits. For an objective function that optimizes for energy efficiency with a tight performance threshold, for 14 SPEC benchmarks, our results indicate a 39% average energy reduction while only sacrificing 3% in performance. An objective function that optimizes for energy-delay with looser performance bounds achieves, on average, nearly a factor of three improvement in energy-delay product while sacrificing only 22% in performance. Energy savings are substantially more than chip-wide voltage/frequency scaling.
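The core-selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `CoreProfile` type, the `pick_core` helper, the per-core numbers, and the choice to measure slowdown against the fastest available core are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreProfile:
    """Hypothetical measurements for one program phase on one core."""
    name: str
    energy_joules: float   # energy to run the phase on this core
    delay_seconds: float   # time to run the phase on this core


def energy_delay_product(core):
    """Energy-delay product: lower is better."""
    return core.energy_joules * core.delay_seconds


def pick_core(cores, max_slowdown, objective):
    """Pick the core minimizing `objective` among cores whose delay is
    within `max_slowdown` (0.03 means 3%) of the fastest core.

    Assumption: slowdown is measured relative to the fastest core.
    """
    fastest = min(c.delay_seconds for c in cores)
    allowed = [c for c in cores
               if c.delay_seconds <= fastest * (1 + max_slowdown)]
    return min(allowed, key=objective)


# Made-up numbers, chosen only to show the two objectives diverging.
cores = [
    CoreProfile("big", energy_joules=10.0, delay_seconds=1.00),
    CoreProfile("medium", energy_joules=6.0, delay_seconds=1.02),
    CoreProfile("small", energy_joules=2.0, delay_seconds=1.20),
]

# Tight performance bound, minimize energy.
tight = pick_core(cores, 0.03, lambda c: c.energy_joules)
# Looser bound, minimize energy-delay product.
loose = pick_core(cores, 0.25, energy_delay_product)
print(tight.name, loose.name)
```

With these invented profiles, the tight bound admits only the two faster cores, while the looser bound also admits the slowest and most frugal one.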

Architectural Support for Programming Languages and Operating Systems | 2008

No "power" struggles: coordinated multi-level power management for the data center

Ramya Raghavendra; Parthasarathy Ranganathan; Vanish Talwar; Zhikui Wang; Xiaoyun Zhu

Power delivery, electricity consumption, and heat management are becoming key challenges in data center environments. Several past solutions have individually evaluated different techniques to address separate aspects of this problem, in hardware and software, and at local and global levels. Unfortunately, there has been no corresponding work on coordinating all these solutions. In the absence of such coordination, these solutions are likely to interfere with one another, in unpredictable (and potentially dangerous) ways. This paper seeks to address this problem. We make two key contributions. First, we propose and validate a power management solution that coordinates different individual approaches. Using simulations based on 180 server traces from nine different real-world enterprises, we demonstrate the correctness, stability, and efficiency advantages of our solution. Second, using our unified architecture as the base, we perform a detailed quantitative sensitivity analysis and draw conclusions about the impact of different architectures, implementations, workloads, and system design choices.

IEEE Computer | 2005

Heterogeneous chip multiprocessors

Rakesh Kumar; Dean M. Tullsen; Norman P. Jouppi; Parthasarathy Ranganathan

Heterogeneous (or asymmetric) chip multiprocessors present unique opportunities for improving system throughput, reducing processor power, and mitigating Amdahl's law. On-chip heterogeneity allows the processor to better match execution resources to each application's needs and to address a much wider spectrum of system loads - from low to high thread parallelism - with high efficiency.
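To make the reference to Amdahl's law concrete, here is a small sketch of the standard formula, speedup = 1 / ((1 - p) + p / n). The `serial_speedup` parameter is an added assumption, not something taken from the article: it crudely models a faster core that runs only the serial portion.

```python
def amdahl_speedup(parallel_fraction, n_cores, serial_speedup=1.0):
    """Speedup over a single baseline core.

    parallel_fraction: share of the work that parallelizes (0..1).
    n_cores: number of baseline cores running the parallel part.
    serial_speedup: assumed speedup of the core that runs the serial
        part (1.0 gives the classic, homogeneous Amdahl's law).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_cores < 1 or serial_speedup <= 0:
        raise ValueError("n_cores must be >= 1 and serial_speedup > 0")
    serial_time = (1.0 - parallel_fraction) / serial_speedup
    parallel_time = parallel_fraction / n_cores
    return 1.0 / (serial_time + parallel_time)


homogeneous = amdahl_speedup(0.9, 16)                        # about 6.4
heterogeneous = amdahl_speedup(0.9, 16, serial_speedup=2.0)  # about 9.4
print(round(homogeneous, 2), round(heterogeneous, 2))
```

Under this simplified model, speeding up only the serial 10% of the work raises the overall speedup noticeably, which is the intuition behind pairing a fast core with many smaller ones.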

International IFIP TC Networking Conference | 2009

A Power Benchmarking Framework for Network Devices

Priya Mahadevan; Puneet Sharma; Sujata Banerjee; Parthasarathy Ranganathan

Energy efficiency is becoming increasingly important in the operation of networking infrastructure, especially in enterprise and data center networks. Researchers have proposed several strategies for energy management of networking devices. However, we need a comprehensive characterization of power consumption by a variety of switches and routers to accurately quantify the savings from the various power savings schemes. In this paper, we first describe the hurdles in network power instrumentation and present a power measurement study of a variety of networking gear such as hubs, edge switches, core switches, routers and wireless access points in both stand-alone mode and a production data center. We build and describe a benchmarking suite that will allow users to measure and compare the power consumed for a large set of common configurations at any switch or router of their choice. We also propose a network energy proportionality index, which is an easily measurable metric, to compare power consumption behaviors of multiple devices.
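The abstract names a network energy proportionality index but does not state its formula. The sketch below assumes one plausible definition, the share of maximum power that actually varies with load, (max - idle) / max expressed as a percentage. Both the formula and the function name are assumptions for illustration, not a quotation of the paper.

```python
def energy_proportionality_index(idle_watts, max_watts):
    """Assumed definition: percentage of peak power that scales with load.

    100 means perfectly proportional (zero idle power);
    0 means power draw is flat regardless of load.
    """
    if max_watts <= 0 or idle_watts < 0 or idle_watts > max_watts:
        raise ValueError("require 0 <= idle_watts <= max_watts, max_watts > 0")
    return (max_watts - idle_watts) / max_watts * 100.0


# Made-up device: 150 W idle, 200 W at full load.
epi = energy_proportionality_index(idle_watts=150.0, max_watts=200.0)
print(epi)
```

A metric of this shape needs only two measurements per device, which fits the abstract's description of it as easily measurable.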

International Symposium on Computer Architecture | 2000

Reconfigurable caches and their application to media processing

Parthasarathy Ranganathan; Sarita V. Adve; Norman P. Jouppi

High performance general-purpose processors are increasingly being used for a variety of application domains: scientific, engineering, databases, and more recently, media processing. It is therefore important to ensure that architectural features that use a significant fraction of the on-chip transistors are applicable across these different domains. For example, current processor designs often devote the largest fraction of on-chip transistors (up to 80%) to caches. Many workloads, however, do not make effective use of large caches; e.g., media processing workloads, which often have streaming data access patterns and large working sets. This paper proposes a new reconfigurable cache design. This design enables the cache SRAM arrays to be dynamically divided into multiple partitions that can be used for different processor activities. These activities can benefit applications that would otherwise not use the storage allocated to large conventional caches. Our design involves relatively few modifications to conventional cache design, and analysis using a modification of the CACTI analytical model shows a small impact on cache access time. We evaluate one representative use of reconfigurable caches: instruction reuse for media processing. We find this use gives IPC improvements ranging from 1.04X to 1.20X in simulation across eight media processing benchmarks.

International Conference on Management of Data | 2007

JouleSort: a balanced energy-efficiency benchmark

Suzanne Rivoire; Mehul A. Shah; Parthasarathy Ranganathan; Christos Kozyrakis

The energy efficiency of computer systems is an important concern in a variety of contexts. In data centers, reducing energy use improves operating cost, scalability, reliability, and other factors. For mobile devices, energy consumption directly affects functionality and usability. We propose and motivate JouleSort, an external sort benchmark, for evaluating the energy efficiency of a wide range of computer systems from clusters to handhelds. We list the criteria, challenges, and pitfalls from our experience in creating a fair energy-efficiency benchmark. Using a commercial sort, we demonstrate a JouleSort system that is over 3.5x as energy-efficient as last year's estimated winner. This system is quite different from those currently used in data centers. It consists of a commodity mobile CPU and 13 laptop drives connected by server-style I/O interfaces.
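An energy-efficiency score for a sort benchmark can be expressed as records sorted per joule, where energy is average power multiplied by elapsed time. The sketch below assumes that form of the metric; the function name and every input number are invented for illustration and are not results from the paper.

```python
def records_per_joule(records_sorted, avg_power_watts, elapsed_seconds):
    """Records sorted per joule of energy consumed.

    Energy (J) = average power (W) * elapsed time (s).
    """
    if avg_power_watts <= 0 or elapsed_seconds <= 0:
        raise ValueError("power and time must be positive")
    energy_joules = avg_power_watts * elapsed_seconds
    return records_sorted / energy_joules


# Two hypothetical systems sorting the same 100 million records.
low_power = records_per_joule(100_000_000, avg_power_watts=100.0,
                              elapsed_seconds=100.0)
baseline = records_per_joule(100_000_000, avg_power_watts=200.0,
                             elapsed_seconds=100.0)
ratio = low_power / baseline
print(low_power, baseline, ratio)
```

Comparing two systems on the same input size then reduces to a ratio of their scores, which is how a statement such as "N times as energy-efficient" can be read.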

International Conference on Autonomic Computing | 2006

Weatherman: Automated, Online and Predictive Thermal Mapping and Management for Data Centers

Justin D. Moore; Jeffrey S. Chase; Parthasarathy Ranganathan

Recent advances have demonstrated the potential benefits of coordinated management of thermal load in data centers, including reduced cooling costs and improved resistance to cooling system failures. A key unresolved obstacle to the practical implementation of thermal load management is the ability to predict the effects of workload distribution and cooling configurations on temperatures within a data center enclosure. The interactions between workload, cooling and temperature are dependent on complex factors that are unique to each data center, including physical room layout, hardware power consumption and cooling capacity; this dictates an approach that formulates management policies for each data center based on these properties. We propose and evaluate a simple, flexible method to infer a detailed model of thermal behavior within a data center from a stream of instrumentation data. This data - taken during normal data center operation - includes continuous readings taken from external temperature sensors, server instrumentation and computer room air conditioning units. Experimental results from a representative data center show that automatic thermal mapping can predict accurately the heat distribution resulting from a given workload distribution and cooling configuration, thereby removing the need for static or manual configuration of thermal load management systems. We also demonstrate how our approach adapts to preserve accuracy across changes to cluster attributes that affect thermal behavior - such as cooling settings, workload distribution and power consumption.

IEEE International Conference on High Performance Computing Data and Analytics | 2009

GViM: GPU-accelerated virtual machines

Vishakha Gupta; Ada Gavrilovska; Karsten Schwan; Harshvardhan Kharche; Niraj Tolia; Vanish Talwar; Parthasarathy Ranganathan

The use of virtualization to abstract underlying hardware can aid in sharing such resources and in efficiently managing their use by high performance applications. Unfortunately, virtualization also prevents efficient access to accelerators, such as Graphics Processing Units (GPUs), that have become critical components in the design and architecture of HPC systems. Supporting General Purpose computing on GPUs (GPGPU) with accelerators from different vendors presents significant challenges due to proprietary programming models, heterogeneity, and the need to share accelerator resources between different Virtual Machines (VMs). To address this problem, this paper presents GViM, a system designed for virtualizing and managing the resources of a general purpose system accelerated by graphics processors. Using the NVIDIA GPU as an example, we discuss how such accelerators can be virtualized without additional hardware support and describe the basic extensions needed for resource management. Our evaluation with a Xen-based implementation of GViM demonstrates efficiency and flexibility in system usage coupled with only small performance penalties for the virtualized vs. non-virtualized solutions.

Architectural Support for Programming Languages and Operating Systems | 1998

Performance of database workloads on shared-memory systems with out-of-order processors

Parthasarathy Ranganathan; Kourosh Gharachorloo; Sarita V. Adve; Luiz André Barroso

Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering workloads. Given the radically different behavior of database workloads (especially OLTP), it is important to re-evaluate key system design decisions in the context of this important class of applications. This paper examines the behavior of database workloads on shared-memory multiprocessors with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue is indeed effective in improving performance of database workloads, providing gains of 1.5 and 2.6 times over an in-order single-issue processor for OLTP and DSS, respectively. In addition, speculative techniques enable optimized implementations of memory consistency models that significantly improve the performance of stricter consistency models, bringing the performance to within 10-15% of the performance of more relaxed models. The second part of our study focuses on the more challenging OLTP workload. We show that an instruction stream buffer is effective in reducing the remaining instruction stalls in OLTP, providing a 17% reduction in execution time (approaching a perfect instruction cache to within 15%). Furthermore, our characterization shows that a large fraction of the data communication misses in OLTP exhibit migratory behavior; our preliminary results show that software prefetch and writeback/flush hints can be used for this data to further reduce execution time by 12%.

International Symposium on Computer Architecture | 2007

Configurable isolation: building high availability systems with commodity multi-core processors

Nidhi Aggarwal; Parthasarathy Ranganathan; Norman P. Jouppi; James E. Smith

High availability is an increasingly important requirement for enterprise systems, often valued more than performance. Systems designed for high availability typically use redundant hardware for error detection and continued uptime in the event of a failure. Chip multiprocessors with an abundance of identical resources like cores, caches, and interconnection networks would appear to be ideal building blocks for implementing high availability solutions on chip. However, doing so poses significant challenges with respect to error containment and faulty component replacement. Increasing silicon and transient fault rates with future technology scaling exacerbate the problem. This paper proposes a novel, cost-effective architecture for high availability systems built from future multi-core processors. We propose a new chip multiprocessor architecture that provides configurable isolation for fault containment and component retirement, based upon cost-effective modifications to commodity designs. The design is evaluated for a state-of-the-art industrial fault model and the proposed architecture is shown to provide effective fault isolation and graceful degradation even when the failure rate is high.
