Publication


Featured research published by Ramesh Illikkal.


Measurement and Modeling of Computer Systems | 2007

QoS policies and architecture for cache/memory in CMP platforms

Ravi R. Iyer; Li Zhao; Fei Guo; Ramesh Illikkal; Srihari Makineni; Donald Newell; Yan Solihin; Lisa R. Hsu; Steven K. Reinhardt

As we enter the era of CMP platforms with multiple threads/cores on the die, the diversity of the simultaneous workloads running on them is expected to increase. The rapid deployment of virtualization as a means to consolidate workloads onto a single platform is a prime example of this trend. In such scenarios, the quality of service (QoS) that each individual workload gets from the platform can vary widely depending on the behavior of the simultaneously running workloads. While the number of cores assigned to each workload can be controlled, there is no hardware or software support in today's platforms to control the allocation of platform resources such as cache space and memory bandwidth to individual workloads. In this paper, we propose a QoS-enabled memory architecture for CMP platforms that addresses this problem. The QoS-enabled memory architecture allocates more cache resources (i.e., space) and memory resources (i.e., bandwidth) to high-priority applications based on guidance from the operating environment. The architecture also allows dynamic resource reassignment at run-time to further optimize the performance of the high-priority application with minimal degradation to low-priority applications. To achieve these goals, we describe the hardware/software support required in the platform as well as in the operating environment (OS and virtual machine monitor). Our evaluation framework consists of detailed platform simulation models and a QoS-enabled version of Linux. Based on evaluation experiments, we show the effectiveness of a QoS-enabled architecture and summarize key findings and trade-offs.
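
The mechanism described here is priority-driven allocation of shared cache space and memory bandwidth. Below is a minimal sketch of one such policy, way partitioning in a set-associative cache, where each priority class is limited to a quota of ways per set; the structure names, quota values, and two-level priority scheme are illustrative assumptions, not the paper's exact design.

```c
/* Illustrative sketch: priority-based way partitioning in a set-associative
 * cache. High-priority workloads may fill more ways per set than low-priority
 * ones. Names, quotas, and the two-class scheme are assumptions for clarity. */
#include <stdio.h>

#define NUM_WAYS 16

typedef enum { PRIO_LOW = 0, PRIO_HIGH = 1 } prio_t;

/* Ways each priority class may occupy per set (set by the OS/VMM). */
static int way_quota[2] = { 4, 12 };   /* low gets 4 ways, high gets 12 */

typedef struct {
    int valid;
    prio_t owner;       /* priority class that filled this way */
    unsigned long tag;
} line_t;

/* On a miss, pick a victim way: prefer invalid ways; a class over its quota
 * must evict one of its own lines, a class under quota may take another's. */
static int pick_victim(line_t set[NUM_WAYS], prio_t req)
{
    int occupied = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (set[w].valid && set[w].owner == req)
            occupied++;

    for (int w = 0; w < NUM_WAYS; w++)
        if (!set[w].valid)
            return w;                   /* free way available */

    for (int w = 0; w < NUM_WAYS; w++) {
        if ((occupied >= way_quota[req]) ? (set[w].owner == req)
                                         : (set[w].owner != req))
            return w;
    }
    return 0;                           /* fallback */
}

int main(void)
{
    line_t set[NUM_WAYS] = { 0 };
    /* Fill all ways with low-priority lines; a high-priority request then
     * reclaims space because low priority is over its 4-way quota. */
    for (int w = 0; w < NUM_WAYS; w++)
        set[w] = (line_t){ 1, PRIO_LOW, (unsigned long)w };
    printf("high-prio victim way: %d\n", pick_victim(set, PRIO_HIGH));
    return 0;
}
```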


Measurement and Modeling of Computer Systems | 2010

Modeling virtual machine performance: challenges and approaches

Omesh Tickoo; Ravi R. Iyer; Ramesh Illikkal; Don Newell

Data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on server platforms. However, server platforms are still being designed and evaluated based on performance modeling of a single highly parallel application or a set of homogeneous workloads running simultaneously. Since most future datacenters are expected to employ server virtualization, this paper examines the challenges of modeling virtual machine (VM) performance on a datacenter server. Based on vConsolidate (a server virtualization benchmark) and the latest multi-core servers, we show that the VM modeling challenge requires addressing three key problems: (a) modeling the contention for visible resources (cores, memory capacity, I/O devices, etc.), (b) modeling the contention for invisible resources (shared microarchitecture resources, shared cache, shared memory bandwidth, etc.), and (c) modeling the overheads of the virtual machine monitor (or hypervisor) implementation. We take a first step toward addressing these problems by describing a VM performance modeling approach and performing a detailed case study based on the vConsolidate benchmark. We conclude by outlining outstanding problems for future work.
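
The abstract factors VM slowdown into three components: visible-resource contention, invisible-resource contention, and hypervisor overhead. A toy sketch of such a multiplicative model is below; the factor names and all numeric values are made-up illustrations, not the paper's calibrated model.

```c
/* Toy sketch of a three-part VM performance model: each contention source
 * scales a VM's standalone throughput. All values are hypothetical. */
#include <stdio.h>

typedef struct {
    double standalone_tput;  /* throughput when running alone */
    double core_share;       /* CPU time fraction granted (visible resources) */
    double cache_factor;     /* residual after cache/bandwidth contention
                                (invisible resources) */
    double vmm_factor;       /* residual after hypervisor overhead */
} vm_model_t;

static double consolidated_tput(const vm_model_t *vm)
{
    return vm->standalone_tput * vm->core_share
                               * vm->cache_factor
                               * vm->vmm_factor;
}

int main(void)
{
    vm_model_t web = { 1000.0, 0.50, 0.85, 0.95 };  /* hypothetical VM */
    printf("estimated consolidated throughput: %.1f req/s\n",
           consolidated_tput(&web));
    return 0;
}
```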


International Conference on Parallel Architectures and Compilation Techniques | 2007

CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms

Li Zhao; Ravi R. Iyer; Ramesh Illikkal; Jaideep Moses; Srihari Makineni; Donald Newell

As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When running multiple applications simultaneously on a platform, it has been shown that contention for shared platform resources such as the last-level cache can severely degrade performance and quality of service (QoS). But today's platforms do not have the capability to monitor shared cache usage accurately and disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of the use of shared cache resources along three vectors: (a) occupancy - how much space is being used and by whom, (b) interference - how much contention is present and who is being affected, and (c) sharing - how threads are cooperating. We propose the CacheScouts monitoring architecture, consisting of novel tagging (software-guided monitoring IDs) and sampling (set sampling) mechanisms, to achieve shared cache monitoring on a per-application basis at low overhead (<0.1%) and with very little loss of accuracy (<5%). We also present case studies to show how CacheScouts can be used by operating systems (OS) and virtual machine monitors (VMMs) for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS, and (d) metering for chargeback.
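
The two mechanisms named in the abstract, software-guided monitoring IDs and set sampling, can be sketched together: only a sampled subset of sets is scanned for lines tagged with a given ID, and the count is scaled up to estimate whole-cache occupancy. The cache geometry and the 1-in-32 sampling ratio below are assumptions for illustration.

```c
/* Sketch of per-application cache monitoring via tagging plus set sampling:
 * every line carries a software-assigned monitoring ID, and only every
 * SAMPLE_STEP-th set is counted, then scaled to the full cache. */
#include <stdio.h>

#define NUM_SETS    4096
#define NUM_WAYS    16
#define SAMPLE_STEP 32          /* monitor every 32nd set */

static unsigned mon_id[NUM_SETS][NUM_WAYS];   /* per-line monitoring ID */
static unsigned valid[NUM_SETS][NUM_WAYS];

/* Estimate the number of lines owned by `id` from sampled sets only. */
static unsigned long estimate_occupancy(unsigned id)
{
    unsigned long sampled = 0;
    for (int s = 0; s < NUM_SETS; s += SAMPLE_STEP)
        for (int w = 0; w < NUM_WAYS; w++)
            if (valid[s][w] && mon_id[s][w] == id)
                sampled++;
    return sampled * SAMPLE_STEP;   /* scale the sample to the full cache */
}

int main(void)
{
    /* Pretend the application with ID 3 fills way 0 of every set. */
    for (int s = 0; s < NUM_SETS; s++) {
        valid[s][0] = 1;
        mon_id[s][0] = 3;
    }
    printf("estimated lines for ID 3: %lu (actual %d)\n",
           estimate_occupancy(3), NUM_SETS);
    return 0;
}
```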


Computer Networks | 2009

VM3: Measuring, modeling and managing VM shared resources

Ravi R. Iyer; Ramesh Illikkal; Omesh Tickoo; Li Zhao; Padma Apparao; Don Newell

With cloud and utility computing models gaining significant momentum, data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on a chip-multiprocessor (CMP) server. In such environments, contention for shared platform resources (CPU cores, shared cache space, shared memory bandwidth, etc.) can have a significant effect on each virtual machine's performance. In this paper, we investigate the shared resource contention problem for virtual machines by: (a) measuring the effects of shared platform resources on virtual machine performance, (b) proposing a model for estimating shared resource contention effects, and (c) proposing a transition from a virtual machine (VM) to a virtual platform architecture (VPA) that enables transparent shared resource management through architectural mechanisms for monitoring and enforcement. Our measurement and modeling experiments are based on a consolidation benchmark (vConsolidate) running on a state-of-the-art CMP server. Our virtual platform architecture experiments are based on detailed simulations of consolidation scenarios. Through detailed measurements and simulations, we show that shared resource contention affects virtual machine performance significantly and emphasize that virtual platform architectures are a must for future virtualized datacenters.
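
One way to picture the VPA idea is as a per-VM descriptor pairing monitored usage with enforceable limits for each shared resource. The field names and units below are assumptions for illustration, not the paper's interface.

```c
/* Sketch of a per-VM virtual platform architecture (VPA) descriptor:
 * monitoring fields read by the VMM/OS, enforcement fields written by it.
 * Field names and units are illustrative assumptions. */
#include <stdio.h>

typedef struct {
    /* monitoring (read side) */
    unsigned long long cache_occupancy_bytes;
    unsigned long long mem_bw_bytes_per_sec;
    /* enforcement (write side) */
    unsigned long long cache_limit_bytes;
    unsigned long long mem_bw_limit_bytes_per_sec;
} vpa_t;

/* One enforcement check: flag a VM exceeding its cache allocation. */
static int over_cache_limit(const vpa_t *v)
{
    return v->cache_occupancy_bytes > v->cache_limit_bytes;
}

int main(void)
{
    vpa_t vm = { 6u << 20, 2000000000ull, 4u << 20, 3000000000ull };
    printf("VM over cache limit: %s\n", over_cache_limit(&vm) ? "yes" : "no");
    return 0;
}
```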


International Conference on Computer Design | 2007

Exploring DRAM cache architectures for CMP server platforms

Li Zhao; Ravi R. Iyer; Ramesh Illikkal; Donald Newell

As dual-core and quad-core processors arrive in the marketplace, the momentum behind CMP architectures continues to grow. As more and more cores/threads are placed on-die, the pressure on the memory subsystem is rapidly increasing. To address this issue, we explore DRAM cache architectures for CMP platforms. In this paper, we investigate the impact of introducing a low-latency, large-capacity, high-bandwidth DRAM-based cache between the last-level SRAM cache and the memory subsystem. We first show the potential benefits of large DRAM caches for key commercial server workloads. Since the primary hurdle to achieving these benefits is the tag space overhead associated with DRAM caches, we investigate various options to identify the most efficient DRAM cache organization. Our results show that the combination of 8-bit partial tags and 2-way sectoring achieves the highest performance improvement (20% to 70%) with the lowest tag space overhead (<25%).
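
The winning organization relies on partial tags: only a few bits of each tag live in fast SRAM, so a partial match is merely a hint that must be verified against the full tag, while a partial mismatch is a definite miss. A minimal sketch, with illustrative bit widths and addresses:

```c
/* Sketch of the partial-tag idea: keep only 8 bits of each tag in SRAM so
 * the tag array for a large DRAM cache stays small. Distinct addresses can
 * share a partial tag, so a partial match must still be verified against
 * the full tag stored alongside the data. */
#include <stdint.h>
#include <stdio.h>

#define PARTIAL_BITS 8

static inline uint8_t partial_tag(uint64_t full_tag)
{
    return (uint8_t)(full_tag & ((1u << PARTIAL_BITS) - 1));
}

int main(void)
{
    uint64_t resident_tag = 0x12345;   /* line currently cached */
    uint64_t lookup_tag   = 0x99945;   /* different line, same low 8 bits */

    if (partial_tag(lookup_tag) == partial_tag(resident_tag)) {
        /* SRAM says "maybe hit": verify against the full tag. */
        if (lookup_tag == resident_tag)
            puts("true hit");
        else
            puts("false partial match: treat as miss, fetch from memory");
    } else {
        puts("definite miss (no verification access needed)");
    }
    return 0;
}
```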


ACM SIGARCH Computer Architecture News | 2007

From chaos to QoS: case studies in CMP resource management

Fei Guo; Hari Kannan; Li Zhao; Ramesh Illikkal; Ravi R. Iyer; Don Newell; Yan Solihin; Christos Kozyrakis

As more and more cores are enabled on the die of future CMP platforms, we expect that several diverse workloads will run simultaneously on the platform. A key example of this trend is the growth of virtualization usage models. When multiple virtual machines, applications, or threads run simultaneously, the quality of service (QoS) that the platform provides to each individual thread is non-deterministic today. This occurs because the simultaneously running threads place very different demands on the platform's shared resources (cache space, memory bandwidth, etc.) and in most cases contend with each other. In this paper, we first present case studies that show how this results in non-deterministic performance. Unlike the compute resources managed through scheduling, platform resource allocation to individual threads cannot be controlled today. In order to provide better determinism and QoS, we then examine resource management mechanisms and present QoS-aware architectures and execution environments. The main contribution of this paper is an architectural feasibility analysis through prototypes that allow experimentation with QoS-aware execution environments and architectural resources. We describe these QoS prototypes and then present preliminary case studies of multi-tasking and virtualization usage models sharing one critical CMP resource (the last-level cache). We then demonstrate how proper management of the cache resource can provide service differentiation and deterministic performance behavior when running disparate workloads on future CMP platforms.
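
The software side of such a prototype can be pictured as the OS tagging each thread with a QoS class that the hardware consults when allocating the last-level cache. The sketch below models the class register as a plain function; the real mechanism in any given prototype would differ.

```c
/* Sketch of the OS half of a QoS-aware execution environment: at context
 * switch, the scheduler programs the running thread's QoS class into a
 * per-core register (modeled here as a variable) that the cache consults
 * when allocating space. Names and the class scheme are illustrative. */
#include <stdio.h>

typedef struct {
    int tid;
    int qos_class;     /* index into the hardware resource-allocation table */
} thread_t;

static int current_class;                          /* stand-in for a register */
static void hw_set_active_class(int cls) { current_class = cls; }

static void context_switch(const thread_t *next)
{
    hw_set_active_class(next->qos_class);
    /* ...rest of the normal context-switch path... */
}

int main(void)
{
    thread_t db = { .tid = 101, .qos_class = 0 };  /* high priority */
    thread_t bg = { .tid = 202, .qos_class = 1 };  /* best effort   */
    context_switch(&db);
    printf("active QoS class: %d\n", current_class);
    context_switch(&bg);
    printf("active QoS class: %d\n", current_class);
    return 0;
}
```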


International Conference on Supercomputing | 2009

Rate-based QoS techniques for cache/memory in CMP platforms

Andrew J. Herdrich; Ramesh Illikkal; Ravi R. Iyer; Donald Newell; Vineet Chadha; Jaideep Moses

As we embrace the era of chip multi-processors (CMP), we are faced with two major architectural challenges: (i) QoS or performance management of disparate applications running on CPU cores contending for shared cache/memory resources, and (ii) global/local power management techniques to stay within overall platform constraints. The problem is exacerbated as the number of cores sharing the resources on a chip increases. In the past, researchers have proposed independent solutions to these two problems. In this paper, we show that rate-based techniques employed for power management can be adapted to address cache/memory QoS issues. The basic approach is to throttle down the processing rate of a core if it is running a low-priority task whose execution is interfering with the performance of a high-priority task due to platform resource contention (i.e., cache or memory contention). We evaluate two rate-throttling mechanisms (clock modulation and frequency scaling) for effectively managing the interference between applications running on a CMP platform and delivering QoS/performance management. We show that clock modulation is much more applicable to cache/memory QoS than frequency scaling, and that resource monitoring combined with rate control provides effective power-performance management in CMP platforms.
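
The throttling approach lends itself to a simple feedback loop: each interval, compare the high-priority task's observed miss rate against a target and step the low-priority core's duty cycle down or up. The monitoring and duty-cycle hooks below are stubs, and the threshold, step, and floor values are assumptions.

```c
/* Sketch of a rate-based QoS control loop: throttle the low-priority core's
 * duty cycle when the high-priority task's miss rate exceeds a target, and
 * restore it when there is headroom. Hardware hooks are stubs; all numeric
 * parameters are illustrative. */
#include <stdio.h>

#define MIN_DUTY 25     /* percent: never throttle below this */
#define MAX_DUTY 100
#define STEP     12     /* duty-cycle adjustment per interval, percent */

static int lp_duty = MAX_DUTY;          /* low-priority core's duty cycle */

/* Stubs for performance-counter read and clock-modulation control. */
static long read_hp_miss_rate(void) { return 5200; } /* misses per interval */
static void set_duty_cycle(int pct)  { lp_duty = pct; }

static void qos_interval(long target_miss_rate)
{
    long observed = read_hp_miss_rate();
    if (observed > target_miss_rate && lp_duty - STEP >= MIN_DUTY)
        set_duty_cycle(lp_duty - STEP);          /* throttle low priority */
    else if (observed < target_miss_rate && lp_duty + STEP <= MAX_DUTY)
        set_duty_cycle(lp_duty + STEP);          /* give headroom back */
}

int main(void)
{
    for (int i = 0; i < 3; i++) {
        qos_interval(4000);                      /* hypothetical target */
        printf("interval %d: low-prio duty = %d%%\n", i, lp_duty);
    }
    return 0;
}
```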


Operating Systems Review | 2011

Efficient interaction between OS and architecture in heterogeneous platforms

Sadagopan Srinivasan; Li Zhao; Ramesh Illikkal; Ravishankar R. Iyer

Almost all hardware platforms to date have been homogeneous, with one or more identical processors managed by the operating system (OS). Recently, however, it has been recognized that power constraints and the need for domain-specific high performance computing may lead architects toward building heterogeneous architectures and platforms in the near future. In this paper, we consider three types of heterogeneous core architectures: (a) virtual asymmetric cores: multiple processors that have identical core microarchitectures and ISA but each run at a different frequency point or perhaps have a different cache size, (b) physically asymmetric cores: heterogeneous cores, each with a fundamentally different microarchitecture (in-order vs. out-of-order, for instance) running at similar or different frequencies, with identical ISA, and (c) hybrid cores: multiple cores, where some cores have tightly-coupled hardware accelerators or special functional units. We present case studies that highlight why existing OS and hardware interaction in such heterogeneous architectures is inefficient and causes losses in application performance, throughput efficiency, and quality of service. We then discuss the hardware and software support needed to address these challenges and establish efficient heterogeneous environments for platforms in the next decade. In particular, we outline a monitoring and prediction framework for heterogeneity along with software support to take advantage of this information. Based on measurements on real platforms, we show that the proposed techniques can provide significant advantages in terms of performance and power efficiency in heterogeneous platforms.
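
A monitoring-and-prediction framework of this kind might, for example, track per-thread IPC on each core type and map a thread to the faster core only when its predicted speedup clears a threshold. The two-core-type setup and the IPC numbers below are illustrative assumptions.

```c
/* Sketch of heterogeneity-aware placement: observed (or predicted) IPC per
 * core type guides where each thread runs. Core types and values are
 * illustrative. */
#include <stdio.h>

enum { BIG = 0, LITTLE = 1, NUM_TYPES = 2 };

typedef struct {
    const char *name;
    double ipc[NUM_TYPES];   /* observed/predicted IPC per core type */
} thread_t;

/* Grant the big core only to threads that actually speed up on it. */
static int choose_core(const thread_t *t, double min_speedup)
{
    return (t->ipc[BIG] / t->ipc[LITTLE] >= min_speedup) ? BIG : LITTLE;
}

int main(void)
{
    thread_t compute = { "compute-bound", { 2.0, 0.9 } };
    thread_t memory  = { "memory-bound",  { 0.6, 0.5 } };
    printf("%s -> %s core\n", compute.name,
           choose_core(&compute, 1.5) == BIG ? "big" : "little");
    printf("%s -> %s core\n", memory.name,
           choose_core(&memory, 1.5) == BIG ? "big" : "little");
    return 0;
}
```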


International Symposium on Microarchitecture | 2012

Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

Niladrish Chatterjee; Manjunath Shevgoor; Rajeev Balasubramonian; Al Davis; Zhen Fang; Ramesh Illikkal; Ravi R. Iyer

The DRAM main memory system in modern servers is largely homogeneous. In recent years, DRAM manufacturers have produced chips with vastly differing latency and energy characteristics. This provides the opportunity to build a heterogeneous main memory system where different parts of the address space yield different latencies and energy per access. The limited prior work in this area has explored smart placement of pages with high activity. In this paper, we propose a novel alternative that exploits DRAM heterogeneity. We observe that the critical word in a cache line can easily be recognized beforehand and placed in a low-latency region of main memory, while the other, non-critical words of the cache line can be placed in a low-energy region. We design a low-complexity architecture that can accelerate the transfer of the critical word by tens of cycles. For our benchmark suite, we show an average performance improvement of 12.9% and an accompanying memory energy reduction of 15%.
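
The key observation is that the first-demanded (critical) word of a line is predictable. A minimal sketch of such a predictor, which remembers the last critical-word offset per line address so the next fill can steer that word to the low-latency region, is below; the table size and 8-word line layout are assumptions.

```c
/* Sketch of critical-word prediction for heterogeneous DRAM placement:
 * record which word of a line was demanded first, and on the next fetch
 * place that word in the low-latency region (the rest goes to the
 * low-energy region). Table size and line layout are illustrative. */
#include <stdio.h>

#define WORDS_PER_LINE 8
#define PRED_ENTRIES   1024

/* Predictor: last critical-word offset seen for this line address. */
static unsigned char critical_word[PRED_ENTRIES];

static unsigned pred_index(unsigned long line_addr)
{
    return (unsigned)(line_addr % PRED_ENTRIES);
}

/* Demand access: record which word within the line was wanted first. */
static void train(unsigned long line_addr, unsigned word_offset)
{
    critical_word[pred_index(line_addr)] = (unsigned char)word_offset;
}

/* Line fetch: predicted critical word to place in the fast region. */
static unsigned predict(unsigned long line_addr)
{
    return critical_word[pred_index(line_addr)];
}

int main(void)
{
    unsigned long addr = 0xABCD;
    train(addr, 3);                     /* word 3 was demanded first */
    printf("place word %u of line 0x%lx in the low-latency region\n",
           predict(addr), addr);
    return 0;
}
```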


Measurement and Modeling of Computer Systems | 2009

Virtual platform architectures for resource metering in datacenters

Ravi R. Iyer; Ramesh Illikkal; Li Zhao; Don Newell; Jaideep Moses

With cloud and utility computing models gaining significant momentum, data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on a CMP server. In such environments, it is important to meter the usage of resources by each datacenter application so that customers can be charged accordingly. In this paper, we describe a simple pay-as-you-go metering and chargeback model and present a solution based on virtual platform architectures (VPA) to accurately meter visible as well as transparent resources.
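
A pay-as-you-go chargeback over metered usage reduces to multiplying per-VM consumption by unit prices. A minimal sketch, with an assumed resource list and hypothetical prices:

```c
/* Sketch of pay-as-you-go chargeback over VPA-metered usage: per-VM
 * consumption of CPU time, cache occupancy, and memory traffic times unit
 * prices. Resources and prices are illustrative assumptions. */
#include <stdio.h>

typedef struct {
    double cpu_core_hours;
    double cache_mb_hours;       /* average occupancy x time */
    double mem_gb_transferred;
} usage_t;

typedef struct {
    double per_core_hour;
    double per_cache_mb_hour;
    double per_gb;
} rates_t;

static double charge(const usage_t *u, const rates_t *r)
{
    return u->cpu_core_hours     * r->per_core_hour
         + u->cache_mb_hours     * r->per_cache_mb_hour
         + u->mem_gb_transferred * r->per_gb;
}

int main(void)
{
    usage_t vm    = { 10.0, 40.0, 120.0 };          /* metered over a day */
    rates_t price = { 0.05, 0.002, 0.01 };          /* hypothetical prices */
    printf("daily charge: $%.2f\n", charge(&vm, &price));
    return 0;
}
```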
