
Publications


Featured research published by Ravi R. Iyer.


international conference on supercomputing | 2004

CQoS: a framework for enabling QoS in shared caches of CMP platforms

Ravi R. Iyer

Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and their workloads range from single-threaded and multithreaded applications to complex virtual machines (VMs), a shared cache resource will be consumed by these different entities generating heterogeneous memory access streams exhibiting different locality properties and varying memory sensitivity. As a result, conventional cache management approaches that treat all memory accesses equally are bound to result in inefficient space utilization and poor performance even for applications with good locality properties. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle the varying degrees of locality and latency sensitivity and (3) assigns and enforces priorities to streams based on latency sensitivity, locality degree and application performance needs. To achieve this, we propose CQoS options for priority classification, priority assignment and priority enforcement. We briefly describe CQoS priority classification and assignment options -- ranging from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement -- these include (1) selective cache allocation, (2) static/dynamic set partitioning and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate the performance trade-offs for these options, we have modeled these CQoS options in a cache simulator and evaluated their performance in CMP platforms running network-intensive server workloads. 
Our simulation results show the effectiveness of our proposed options and make the case for CQoS in future multi-threaded/multi-core platforms since it improves shared cache efficiency and increases overall system performance as a result.
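One of the enforcement mechanisms above, static set partitioning, can be sketched with a few lines of code: each priority class is confined to its own range of cache sets, so low-priority streams cannot evict high-priority lines. The geometry, class names, and 3/4–1/4 split below are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sketch of static set partitioning, one of the CQoS
# priority-enforcement mechanisms. All parameters are illustrative.

NUM_SETS = 1024  # total cache sets (assumed)

# Each priority class is confined to a contiguous range of sets.
PARTITIONS = {
    "high": (0, 768),    # high-priority streams get 3/4 of the sets
    "low":  (768, 1024), # low-priority streams share the remainder
}

def set_index(address, priority, line_size=64):
    """Map an address to a cache set within its class's partition."""
    base, limit = PARTITIONS[priority]
    span = limit - base
    return base + (address // line_size) % span

# A high- and a low-priority access to the same address land in
# disjoint set ranges, so low-priority traffic cannot evict
# high-priority lines.
hi = set_index(0xDEADBEEF, "high")
lo = set_index(0xDEADBEEF, "low")
assert 0 <= hi < 768 and 768 <= lo < 1024
```

Dynamic set partitioning would adjust the partition boundaries at run time based on observed performance, rather than fixing them as here.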


measurement and modeling of computer systems | 2007

QoS policies and architecture for cache/memory in CMP platforms

Ravi R. Iyer; Li Zhao; Fei Guo; Ramesh Illikkal; Srihari Makineni; Donald Newell; Yan Solihin; Lisa R. Hsu; Steven K. Reinhardt

As we enter the era of CMP platforms with multiple threads/cores on the die, the diversity of the simultaneous workloads running on them is expected to increase. The rapid deployment of virtualization as a means to consolidate workloads on to a single platform is a prime example of this trend. In such scenarios, the quality of service (QoS) that each individual workload gets from the platform can widely vary depending on the behavior of the simultaneously running workloads. While the number of cores assigned to each workload can be controlled, there is no hardware or software support in today's platforms to control allocation of platform resources such as cache space and memory bandwidth to individual workloads. In this paper, we propose a QoS-enabled memory architecture for CMP platforms that addresses this problem. The QoS-enabled memory architecture enables more cache resources (i.e. space) and memory resources (i.e. bandwidth) for high priority applications based on guidance from the operating environment. The architecture also allows dynamic resource reassignment during run-time to further optimize the performance of the high priority application with minimal degradation to the low priority applications. To achieve these goals, we describe the hardware/software support required in the platform as well as the operating environment (O/S and virtual machine monitor). Our evaluation framework consists of detailed platform simulation models and a QoS-enabled version of Linux. Based on evaluation experiments, we show the effectiveness of a QoS-enabled architecture and summarize key findings/trade-offs.


Proceedings of the National Academy of Sciences of the United States of America | 2010

PCNA function in the activation and strand direction of MutLα endonuclease in mismatch repair

Anna Pluciennik; Leonid Dzantiev; Ravi R. Iyer; Nicoleta Constantin; Farid A. Kadyrov; Paul Modrich

MutLα (MLH1–PMS2) is a latent endonuclease that is activated in a mismatch-, MutSα-, proliferating cell nuclear antigen (PCNA)-, replication factor C (RFC)-, and ATP-dependent manner, with nuclease action directed to the heteroduplex strand that contains a preexisting break. RFC depletion experiments and use of linear DNAs indicate that RFC function in endonuclease activation is limited to PCNA loading. Whereas nicked circular heteroduplex DNA is a good substrate for PCNA loading and for endonuclease activation on the incised strand, covalently closed, relaxed circular DNA is a poor substrate for both reactions. However, covalently closed supercoiled or bubble-containing relaxed heteroduplexes, which do support PCNA loading, also support MutLα activation, but in this case cleavage strand bias is largely abolished. Based on these findings we suggest that PCNA has two roles in MutLα function: The clamp is required for endonuclease activation, an effect that apparently involves interaction of the two proteins, and by virtue of its loading orientation, PCNA determines the strand direction of MutLα incision. These results also provide a potential mechanism for activation of mismatch repair on nonreplicating DNA, an effect that may have implications for the somatic phase of triplet repeat expansion.


international symposium on computer architecture | 2005

Direct Cache Access for High Bandwidth Network I/O

Ram Huggahalli; Ravi R. Iyer; Scott Tetrick

Recent I/O technologies such as PCI-Express and 10 Gb Ethernet enable unprecedented levels of I/O bandwidths in mainstream platforms. However, in traditional architectures, memory latency alone can limit processors from matching 10 Gb inbound network I/O traffic. We propose a platform-wide method called direct cache access (DCA) to deliver inbound I/O data directly into processor caches. We demonstrate that DCA provides a significant reduction in memory latency and memory bandwidth for receive-intensive network I/O applications. Analysis of benchmarks such as SPECweb99, TPC-W and TPC-C shows that overall benefit depends on the relative volume of I/O to memory traffic as well as the spatial and temporal relationship between processor and I/O memory accesses. A system level perspective for the efficient implementation of DCA is presented.
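A back-of-envelope calculation shows why memory latency alone can throttle 10 Gb inbound traffic, the problem DCA addresses. The 100 ns DRAM latency below is an assumed ballpark, not a figure from the paper.

```python
# Back-of-envelope sketch of the bandwidth/latency pressure that
# motivates DCA. The 100 ns memory latency is an assumed ballpark.

LINK_GBPS = 10          # inbound network bandwidth
LINE_BYTES = 64         # cache line size
MEM_LATENCY_NS = 100    # assumed round-trip DRAM latency

bytes_per_sec = LINK_GBPS * 1e9 / 8           # 1.25 GB/s
lines_per_sec = bytes_per_sec / LINE_BYTES    # ~19.5M lines/s
ns_per_line = 1e9 / lines_per_sec             # ~51 ns between lines

# If each inbound line costs a full DRAM access, a new line arrives
# faster than one miss can complete unless misses overlap; delivering
# lines directly into the cache (DCA) removes those misses entirely.
print(f"{ns_per_line:.1f} ns per line, "
      f"{MEM_LATENCY_NS / ns_per_line:.1f}x latency overlap needed")
```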


measurement and modeling of computer systems | 2010

Modeling virtual machine performance: challenges and approaches

Omesh Tickoo; Ravi R. Iyer; Ramesh Illikkal; Don Newell

Data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on server platforms. However, server platforms are still being designed and evaluated based on performance modeling of a single highly parallel application or a set of homogeneous workloads running simultaneously. Since most future datacenters are expected to employ server virtualization, this paper takes a look at the challenges of modeling virtual machine (VM) performance on a datacenter server. Based on vConsolidate (a server virtualization benchmark) and the latest multi-core servers, we show that the VM modeling challenge requires addressing three key problems: (a) modeling the contention of visible resources (cores, memory capacity, I/O devices, etc), (b) modeling the contention of invisible resources (shared microarchitecture resources, shared cache, shared memory bandwidth, etc) and (c) modeling overheads of virtual machine monitor (or hypervisor) implementation. We take a first step to addressing this problem by describing a VM performance modeling approach and performing a detailed case study based on the vConsolidate benchmark. We conclude by outlining outstanding problems for future work.
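The three problems above suggest a layered model: a VM's standalone performance degraded by separable contention and overhead factors. The multiplicative composition and all input values below are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of a layered VM performance model: standalone CPI
# degraded by three separable penalty factors. The multiplicative
# composition and all inputs are illustrative assumptions.

def vm_cpi(core_cpi, cache_contention, bw_contention, vmm_overhead):
    """Estimate consolidated CPI from standalone CPI plus penalties.

    core_cpi         -- CPI of the VM running alone
    cache_contention -- fractional CPI increase from shared-cache misses
    bw_contention    -- fractional CPI increase from memory bandwidth
    vmm_overhead     -- fractional CPI increase from hypervisor exits
    """
    return core_cpi * (1 + cache_contention) * (1 + bw_contention) \
                    * (1 + vmm_overhead)

# Example: a VM with standalone CPI 1.2 losing 15% to cache
# contention, 8% to bandwidth contention, and 5% to VMM overhead.
print(round(vm_cpi(1.2, 0.15, 0.08, 0.05), 3))  # 1.565
```

In practice the factors are not independent (e.g. cache misses consume memory bandwidth), which is part of what makes the modeling challenge hard.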


virtual execution environments | 2008

Characterization & analysis of a server consolidation benchmark

Padma Apparao; Ravi R. Iyer; Xiaomin Zhang; Donald Newell; Tom J. Adelmeyer

Virtualization is already becoming ubiquitous in data centers for the consolidation of multiple workloads on a single platform. However, there are very few performance studies of server consolidation workloads in the literature. In this paper, our goal is to analyze the performance characteristics of a representative server consolidation workload. To address this goal, we have carried out extensive measurement and profiling experiments of a newly proposed consolidation workload (vConsolidate). vConsolidate consists of a compute intensive workload, a web server, a mail server and a database application running simultaneously on a single platform. We start by studying the performance slowdown of each workload due to consolidation on a contemporary multi-core dual-processor Intel platform. We then look at architectural characteristics such as CPI (cycles per instruction) and L2 MPI (L2 misses per instruction), and analyze the benefits of larger caches for such a consolidated workload. We estimate the virtualization overheads for events such as context switches, interrupts and page faults and show how these impact the performance of the workload in consolidation. Finally, we also present the execution profile of the server consolidation workload and illustrate the life of each VM in the consolidated environment. We conclude by presenting an approach to developing a preliminary performance model based on the performance data.
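The two architectural metrics used in the study, CPI and L2 MPI, are simple ratios over hardware event counts. The counter readings below are hypothetical values for illustration.

```python
# Sketch of the per-workload architectural metrics used in the study
# (CPI and L2 MPI), computed from hypothetical raw event counts.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def l2_mpi(l2_misses, instructions):
    """L2 cache misses per instruction (MPI)."""
    return l2_misses / instructions

# Illustrative counter readings for one consolidated workload:
cycles, insts, misses = 3_000_000_000, 1_500_000_000, 22_500_000
print(cpi(cycles, insts))     # 2.0
print(l2_mpi(misses, insts))  # 0.015
```

Comparing these ratios for a workload run standalone versus consolidated isolates the slowdown attributable to shared-resource contention.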


international conference on parallel architectures and compilation techniques | 2007

CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms

Li Zhao; Ravi R. Iyer; Ramesh Illikkal; Jaideep Moses; Srihari Makineni; Donald Newell

As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When running multiple applications simultaneously on a platform, it has been shown that contention for shared platform resources such as last-level cache can severely degrade performance and quality of service (QoS). But today's platforms do not have the capability to monitor shared cache usage accurately and disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of the use of shared cache resources along three vectors: (a) occupancy - how much space is being used and by whom, (b) interference - how much contention is present and who is being affected and (c) sharing - how are threads cooperating. We propose the CacheScouts monitoring architecture consisting of novel tagging (software-guided monitoring IDs) and sampling (set sampling) mechanisms to achieve shared cache monitoring on a per-application basis at low overhead (<0.1%) and with very little loss of accuracy (<5%). We also present case studies to show how CacheScouts can be used by operating systems (OS) and virtual machine monitors (VMMs) for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS and (d) metering for chargeback.
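The set-sampling idea can be illustrated in a few lines: tag each line in a small sample of sets with a monitoring ID, then scale the sampled occupancy up to the whole cache. The cache geometry, sample size, and ownership distribution below are hypothetical.

```python
# Sketch of set sampling for occupancy monitoring, in the spirit of
# CacheScouts. The cache model, IDs, and sample size are hypothetical.

import random

random.seed(0)  # reproducible example
NUM_SETS, WAYS = 4096, 16
SAMPLE = 64  # number of sampled sets

# cache[set][way] = monitoring ID of the owning application;
# here app 1 owns roughly a quarter of all lines.
cache = [[random.choice([0, 0, 0, 1]) for _ in range(WAYS)]
         for _ in range(NUM_SETS)]

def estimated_occupancy(app_id, sampled_sets):
    """Lines owned by app_id in the sample, scaled to the full cache."""
    hits = sum(cache[s].count(app_id) for s in sampled_sets)
    return hits * (NUM_SETS / len(sampled_sets))

sample = random.sample(range(NUM_SETS), SAMPLE)
est = estimated_occupancy(1, sample)
true = sum(row.count(1) for row in cache)
# The sampled estimate tracks the true line count; accuracy improves
# with more sampled sets, at the cost of more monitoring state.
print(est, true)
```

Sampling only 64 of 4096 sets keeps the monitoring state small, which is how the architecture achieves its low overhead.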


Computer Networks | 2009

VM3: Measuring, modeling and managing VM shared resources

Ravi R. Iyer; Ramesh Illikkal; Omesh Tickoo; Li Zhao; Padma Apparao; Don Newell

With cloud and utility computing models gaining significant momentum, data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on a chip-multiprocessor (CMP) server. In such environments, contention for shared platform resources (CPU cores, shared cache space, shared memory bandwidth, etc.) can have a significant effect on each virtual machine's performance. In this paper, we investigate the shared resource contention problem for virtual machines by: (a) measuring the effects of shared platform resources on virtual machine performance, (b) proposing a model for estimating shared resource contention effects, and (c) proposing a transition from a virtual machine (VM) to a virtual platform architecture (VPA) that enables transparent shared resource management through architectural mechanisms for monitoring and enforcement. Our measurement and modeling experiments are based on a consolidation benchmark (vConsolidate) running on a state-of-the-art CMP server. Our virtual platform architecture experiments are based on detailed simulations of consolidation scenarios. Through detailed measurements and simulations, we show that shared resource contention affects virtual machine performance significantly and emphasize that a virtual platform architecture is a must for future virtualized datacenters.


Nucleic Acids Research | 2009

Mismatch repair and nucleotide excision repair proteins cooperate in the recognition of DNA interstrand crosslinks

Junhua Zhao; Aklank Jain; Ravi R. Iyer; Paul Modrich; Karen M. Vasquez

DNA interstrand crosslinks (ICLs) are among the most cytotoxic types of DNA damage, thus ICL-inducing agents such as psoralen, are clinically useful chemotherapeutics. Psoralen-modified triplex-forming oligonucleotides (TFOs) have been used to target ICLs to specific genomic sites to increase the selectivity of these agents. However, how TFO-directed psoralen ICLs (Tdp-ICLs) are recognized and processed in human cells is unclear. Previously, we reported that two essential nucleotide excision repair (NER) protein complexes, XPA–RPA and XPC–RAD23B, recognized ICLs in vitro, and that cells deficient in the DNA mismatch repair (MMR) complex MutSβ were sensitive to psoralen ICLs. To further investigate the role of MutSβ in ICL repair and the potential interaction between proteins from the MMR and NER pathways on these lesions, we performed electrophoretic mobility-shift assays and chromatin immunoprecipitation analysis of MutSβ and NER proteins with Tdp-ICLs. We found that MutSβ bound to Tdp-ICLs with high affinity and specificity in vitro and in vivo, and that MutSβ interacted with XPA–RPA or XPC–RAD23B in recognizing Tdp-ICLs. These data suggest that proteins from the MMR and NER pathways interact in the recognition of ICLs, and provide a mechanistic link by which proteins from multiple repair pathways contribute to ICL repair.


international conference on computer design | 2007

Exploring DRAM cache architectures for CMP server platforms

Li Zhao; Ravi R. Iyer; Ramesh Illikkal; Donald Newell

As dual-core and quad-core processors arrive in the marketplace, the momentum behind CMP architectures continues to grow strong. As more and more cores/threads are placed on-die, the pressure on the memory subsystem is rapidly increasing. To address this issue, we explore DRAM cache architectures for CMP platforms. In this paper, we investigate the impact of introducing a low latency, large capacity and high bandwidth DRAM-based cache between the last level SRAM cache and memory subsystem. We first show the potential benefits of large DRAM caches for key commercial server workloads. As the primary hurdle to achieving these benefits with DRAM caches is the tag space overheads associated with them, we identify the most efficient DRAM cache organization and investigate various options. Our results show that the combination of 8-bit partial tags and 2-way sectoring achieves the highest performance (20% to 70%) with the lowest tag space (<25%) overhead.
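The arithmetic behind the tag-space result is easy to sketch: partial tags shrink each tag entry, and sectoring divides the number of entries. Only the 8-bit partial tag and 2-line sectors come from the result above; the cache geometry and full-tag width below are assumed for illustration.

```python
# Back-of-envelope sketch of why partial tags plus sectoring shrink
# the DRAM-cache tag store. Geometry and full-tag width are assumed.

CACHE_BYTES = 128 * 2**20   # assumed 128 MB DRAM cache
LINE_BYTES = 64
FULL_TAG_BITS = 24          # assumed full tag width per line
PARTIAL_TAG_BITS = 8        # partial tags keep only 8 tag bits
LINES_PER_SECTOR = 2        # 2-way sectoring: one tag per 2 lines

lines = CACHE_BYTES // LINE_BYTES

full_tag_bytes = lines * FULL_TAG_BITS / 8
partial_tag_bytes = (lines // LINES_PER_SECTOR) * PARTIAL_TAG_BITS / 8

print(f"full tags:    {full_tag_bytes / 2**20:.1f} MB")
print(f"partial tags: {partial_tag_bytes / 2**20:.1f} MB "
      f"({partial_tag_bytes / full_tag_bytes:.0%} of full)")
```

The smaller the tag store, the more practical it becomes to keep it in fast SRAM rather than in the DRAM cache itself, which is why tag overhead is the primary hurdle the paper identifies.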
