Jaideep Moses
Intel
Publication
Featured research published by Jaideep Moses.
international conference on parallel architectures and compilation techniques | 2007
Li Zhao; Ravi R. Iyer; Ramesh Illikkal; Jaideep Moses; Srihari Makineni; Donald Newell
As multi-core architectures flourish in the marketplace, multi-application workload scenarios (such as server consolidation) are growing rapidly. When running multiple applications simultaneously on a platform, it has been shown that contention for shared platform resources such as the last-level cache can severely degrade performance and quality of service (QoS). But today's platforms do not have the capability to monitor shared cache usage accurately and disambiguate its effects on the performance behavior of each individual application. In this paper, we investigate low-overhead mechanisms for fine-grain monitoring of the use of shared cache resources along three vectors: (a) occupancy - how much space is being used and by whom, (b) interference - how much contention is present and who is being affected, and (c) sharing - how threads are cooperating. We propose the CacheScouts monitoring architecture, consisting of novel tagging (software-guided monitoring IDs) and sampling mechanisms (set sampling), to achieve shared cache monitoring on a per-application basis at low overhead (<0.1%) and with very little loss of accuracy (<5%). We also present case studies to show how CacheScouts can be used by operating systems (OS) and virtual machine monitors (VMMs) for (a) characterizing execution profiles, (b) optimizing scheduling for performance management, (c) providing QoS and (d) metering for chargeback.
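A minimal sketch of the two ideas this abstract names, tagging cache lines with software-guided monitoring IDs and sampling only a subset of sets, written as a toy Python cache model. The class, parameters, and replacement policy below are illustrative assumptions, not the CacheScouts hardware interface.

```python
# Illustrative model of per-application shared-cache occupancy monitoring
# using software-guided monitoring IDs (tags) plus set sampling.
# All names and parameters here are hypothetical, not from the paper.
import random

class SampledCache:
    def __init__(self, num_sets=4096, ways=16, line_size=64, sample_ratio=0.05):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        # Each way holds (tag, monitoring_id) or None.
        self.sets = [[None] * ways for _ in range(num_sets)]
        # Only a small, fixed subset of sets carries monitoring state.
        self.sampled = set(random.sample(range(num_sets),
                                         int(num_sets * sample_ratio)))

    def access(self, addr, mon_id):
        """Simple fill model: install or re-tag the line with its owner's ID."""
        set_idx = (addr // self.line_size) % self.num_sets
        tag = addr // (self.line_size * self.num_sets)
        ways = self.sets[set_idx]
        for i, entry in enumerate(ways):
            if entry and entry[0] == tag:        # hit: update ownership
                ways[i] = (tag, mon_id)
                return
        ways.pop(0)                              # miss: evict oldest, insert new
        ways.append((tag, mon_id))

    def estimated_occupancy_bytes(self, mon_id):
        """Scale occupancy seen in the sampled sets up to the whole cache."""
        lines = sum(1 for s in self.sampled for e in self.sets[s]
                    if e and e[1] == mon_id)
        return lines * self.line_size * (self.num_sets / len(self.sampled))
```

The scaling step at the end is what set sampling buys: per-application occupancy is read back from only a few percent of the sets and extrapolated, keeping monitoring cost low at a small accuracy cost.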
international conference on supercomputing | 2009
Andrew J. Herdrich; Ramesh Illikkal; Ravi R. Iyer; Donald Newell; Vineet Chadha; Jaideep Moses
As we embrace the era of chip multi-processors (CMP), we are faced with two major architectural challenges: (i) QoS or performance management of disparate applications running on CPU cores contending for shared cache/memory resources and (ii) global/local power management techniques to stay within the overall platform constraints. The problem is exacerbated as the number of cores sharing the resources in a chip increases. In the past, researchers have proposed independent solutions for these two problems. In this paper, we show that rate-based techniques that are employed to address power management can be adapted to address cache/memory QoS issues. The basic approach is to throttle down the processing rate of a core if it is running a low-priority task and its execution is interfering with the performance of a high-priority task due to platform resource contention (i.e. cache or memory contention). We evaluate two rate throttling mechanisms (clock modulation and frequency scaling) for effectively managing the interference between applications running in a CMP platform and delivering QoS/performance management. We show that clock modulation is much more applicable to cache/memory QoS than frequency scaling, and that resource monitoring along with rate control provides effective power-performance management in CMP platforms.
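As a rough illustration of the rate-throttling idea described above, here is a hypothetical control-loop step in Python: if the high-priority task's misses per kilo-instruction (MPKI) degrade past a threshold relative to its solo baseline, the duty cycle of the low-priority cores is reduced (mimicking clock modulation), and it is restored once contention subsides. The threshold, step size, and floor are invented for illustration, and the counter readings and duty-cycle controls are assumed to be supplied by the platform.

```python
# Hypothetical QoS control-loop step for cache/memory contention using
# rate throttling (duty-cycle reduction) on low-priority cores.
def qos_throttle_step(high_prio_mpki, baseline_mpki, low_prio_cores,
                      duty_cycles, step=12.5, floor=25.0):
    """One iteration: throttle low-priority cores while the high-priority
    task suffers excess shared-cache/memory interference, else restore."""
    interference = (high_prio_mpki - baseline_mpki) / baseline_mpki
    for core in low_prio_cores:
        if interference > 0.10:   # >10% MPKI degradation: throttle down
            duty_cycles[core] = max(floor, duty_cycles[core] - step)
        else:                     # contention subsided: throttle back up
            duty_cycles[core] = min(100.0, duty_cycles[core] + step)
    return duty_cycles

# Example: two low-priority cores start unthrottled at 100% duty cycle.
print(qos_throttle_step(3.2, 2.0, [2, 3], {2: 100.0, 3: 100.0}))
# {2: 87.5, 3: 87.5}
```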
virtual execution environments | 2007
Vineet Chadha; Ramesh Illikkal; Ravi R. Iyer; Jaideep Moses; Donald Newell; Renato J. O. Figueiredo
Virtualization provides levels of execution isolation and service partitioning that are desirable in many usage scenarios, but its associated overheads are a major impediment to wide deployment of virtualized environments. While the virtualization cost depends heavily on workloads, it has been demonstrated that the overhead is much higher with I/O-intensive workloads compared to those which are compute-intensive. Unfortunately, the architectural reasons behind the I/O performance overheads are not well understood. Early research in characterizing these penalties has shown that cache misses and TLB-related overheads account for most of the I/O virtualization cost. While most of these evaluations are done using measurements, in this paper we present an execution-driven, simulation-based analysis methodology with symbol annotation as a means of evaluating the performance of virtualized workloads. This methodology provides detailed information at the architectural level (with a focus on cache and TLB) and allows designers to evaluate potential hardware enhancements to reduce virtualization overhead. We apply this methodology to study the network I/O performance of Xen (as a case study) in a full system simulation environment, using detailed cache and TLB models to profile and characterize software and hardware hotspots. By applying symbol annotation to the instruction flow reported by the execution-driven simulator we derive function-level call flow information. We follow the anatomy of I/O processing in a virtualized platform for network transmit and receive scenarios and demonstrate the impact of cache scaling and TLB size scaling on performance.
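The symbol-annotation step this abstract describes amounts to mapping instruction addresses observed in the simulated instruction flow back to function names, so that cache and TLB events can be attributed at function granularity. A small sketch under that assumption; the symbol addresses and names below are made up.

```python
# Illustrative symbol annotation: attribute simulated cache-miss events to
# functions by looking up instruction addresses in a sorted symbol table
# (e.g., extracted from guest/VMM binaries). Symbols here are invented.
import bisect
from collections import Counter

# (start_address, name), sorted by start address -- hypothetical symbols.
symbols = [(0xc0100000, "do_IRQ"),
           (0xc0104000, "net_rx_action"),
           (0xc0108000, "xen_hypercall")]
starts = [s[0] for s in symbols]

def annotate(addr):
    """Return the name of the function whose range contains addr."""
    i = bisect.bisect_right(starts, addr) - 1
    return symbols[i][1] if i >= 0 else "unknown"

def per_function_misses(miss_events):
    """miss_events: iterable of instruction addresses that missed the cache."""
    return Counter(annotate(a) for a in miss_events)

print(per_function_misses([0xc0100010, 0xc0104123, 0xc0104200]))
# Counter({'net_rx_action': 2, 'do_IRQ': 1})
```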
measurement and modeling of computer systems | 2009
Ravi R. Iyer; Ramesh Illikkal; Li Zhao; Don Newell; Jaideep Moses
With cloud and utility computing models gaining significant momentum, data centers are increasingly employing virtualization and consolidation as a means to support a large number of disparate applications running simultaneously on a CMP server. In such environments, it is important to meter the usage of resources by each datacenter application so that customers can be charged accordingly. In this paper, we describe a simple metering and chargeback model (pay-as-you-go) and describe a solution based on virtual platform architectures (VPA) to accurately meter visible as well as transparent resources.
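A toy illustration of the pay-as-you-go chargeback idea: once per-VM usage of both visible resources (e.g., CPU time) and transparent ones (shared cache space, memory bandwidth, which is what VPA-based metering would supply) is available, the charge is a rate-weighted sum. The resource names and rates below are invented for illustration.

```python
# Toy pay-as-you-go chargeback: bill each tenant for the platform resources
# its VM actually consumed. Rates and resource names are hypothetical.
RATES = {
    "cpu_seconds": 0.02,
    "cache_mb_hours": 0.001,
    "mem_bw_gb": 0.005,
}

def chargeback(usage_by_vm):
    """usage_by_vm: {vm_name: {resource: amount}} -> {vm_name: charge}."""
    return {vm: round(sum(RATES[r] * amt for r, amt in usage.items()), 4)
            for vm, usage in usage_by_vm.items()}

print(chargeback({"vm_web":   {"cpu_seconds": 3600, "cache_mb_hours": 512,
                               "mem_bw_gb": 40},
                  "vm_batch": {"cpu_seconds": 7200, "cache_mb_hours": 128,
                               "mem_bw_gb": 200}}))
# {'vm_web': 72.712, 'vm_batch': 145.128}
```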
international symposium on performance analysis of systems and software | 2009
Jaideep Moses; Konstantinos Aisopos; Aamer Jaleel; Ravi R. Iyer; Ramesh Illikkal; Donald Newell; Srihari Makineni
CMPs have now become mainstream and are growing in complexity with more cores, several shared resources (cache, memory, etc.) and the potential for additional heterogeneous elements. In order to manage these resources, it is becoming critical to optimize the interaction between the execution environment (operating systems, virtual machine monitors, etc.) and the CMP platform. Performance analysis of such OS and CMP interactions is challenging because it requires long-running full-system execution-driven simulations. In this paper, we explore an alternative approach (CMPSched$im) to evaluate the interaction of OS and CMP architectures. In particular, CMPSched$im is focused on evaluating techniques to address the shared cache management problem through better interaction between CMP hardware and operating system scheduling. CMPSched$im enables fast and flexible exploration of this interaction by combining the benefits of (a) binary instrumentation tools (Pin), (b) user-level scheduling tools (Linsched) and (c) simple core/cache simulators. In this paper, we describe CMPSched$im in detail and present case studies showing how CMPSched$im can be used to optimize OS scheduling by taking advantage of novel shared cache monitoring capabilities in the hardware. We also describe OS scheduling heuristics to improve overall system performance through resource monitoring and application classification to achieve near-optimal scheduling that minimizes the effects of contention in the shared cache of a CMP platform.
modeling, analysis, and simulation on computer and telecommunication systems | 2004
Jaideep Moses; Ramesh Illikkal; Ravi R. Iyer; Ram Huggahalli; Donald Newell
design, automation, and test in europe | 2012
Konstantinos Aisopos; Jaideep Moses; Ramesh Illikkal; Ravishankar R. Iyer; Donald Newell
ieee international conference on high performance computing data and analytics | 2007
Li Zhao; Ravi R. Iyer; Srihari Makineni; Ramesh Illikkal; Jaideep Moses; Donald Newell
Archive | 2007
Ramesh Kumar Illikkal; Ravishankar Iyer; Jaideep Moses; Don Newell; Tryggve Fossum
Archive | 2011
Jaideep Moses; Rameshkumar G. Illikkal; Ravishankar Iyer; Jared E. Bendt; Sadagopan Srinivasan; Andrew J. Herdrich; Ashish V. Choubal; Avinash N. Ananthakrishnan; Vijay S.R. Degalahal
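Returning to the CMPSched$im entry above: a toy version of the kind of contention-aware scheduling heuristic it is used to study, assuming per-application shared-cache occupancy is available from hardware monitoring. The classification threshold and the heavy-with-light pairing policy are invented for illustration and are not the paper's algorithm.

```python
# Toy contention-aware co-scheduling heuristic: classify applications by
# their measured shared-cache footprint and pair cache-heavy tasks with
# cache-light ones on cores that share a last-level cache.
def pair_for_shared_cache(occupancy_kb, cache_size_kb=8192):
    """occupancy_kb: {app: measured LLC occupancy}. Returns co-run pairs."""
    heavy = sorted((a for a, o in occupancy_kb.items()
                    if o > cache_size_kb / 4),
                   key=occupancy_kb.get, reverse=True)
    light = sorted((a for a in occupancy_kb if a not in heavy),
                   key=occupancy_kb.get)
    pairs = list(zip(heavy, light))            # heaviest with lightest, etc.
    leftovers = heavy[len(light):] + light[len(heavy):]
    return pairs, leftovers

print(pair_for_shared_cache({"mcf": 6100, "omnetpp": 3500,
                             "gcc": 900, "namd": 300}))
# ([('mcf', 'namd'), ('omnetpp', 'gcc')], [])
```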