
Publication


Featured research published by Nishkam Ravi.


Symposium on Code Generation and Optimization | 2012

Panacea: towards holistic optimization of MapReduce applications

Jun Liu; Nishkam Ravi; Srimat T. Chakradhar; Mahmut T. Kandemir

MapReduce has emerged as one of the most popular programming models for data-parallel enterprise applications. Despite advances in runtime systems, opportunities for optimizing MapReduce applications remain largely unexplored. In this paper, we present a framework for performing holistic compiler optimizations on legacy MapReduce applications. We have identified and implemented two optimizations and evaluated them with a set of Hadoop applications on a cluster of Xeon servers. Our experiments show that performance gains of more than 3X can be achieved without user involvement.


International Conference on Supercomputing | 2012

Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors

Nishkam Ravi; Yi Yang; Tao Bao; Srimat T. Chakradhar

Intel MIC (Many Integrated Core) is the first x86-based coprocessor architecture aimed at accelerating multi-core HPC applications. In the most common usage model, parallel code sections are offloaded to the MIC coprocessor using LEO (Language Extensions for Offload). The developer is responsible for identifying and specifying offloadable code regions, managing data transfers between the CPU and MIC, and optimizing the application for performance, all of which require effort and experimentation. In this paper, we present Apricot, an optimizing compiler and productivity tool for x86-compatible many-core coprocessors (such as Intel MIC) that minimizes developer effort by (i) automatically inserting LEO clauses for parallelizable code regions, (ii) selectively offloading some of the code regions to the coprocessor at runtime based on a cost model that we have developed, and (iii) applying a set of optimizations that minimize data communication overhead and improve overall performance. Apricot is intended to assist programmers in porting existing multi-core applications and writing new ones to take advantage of the many-core coprocessor, while maximizing overall performance. Experiments with the SpecOMP and NAS Parallel benchmarks show that Apricot can successfully transform OpenMP applications to run on the MIC coprocessor with good performance gains.


2011 International Green Computing Conference and Workshops | 2011

Power management for heterogeneous clusters: An experimental study

M. Mustafa Rafique; Nishkam Ravi; Srihari Cadambi; Ali Raza Butt; Srimat T. Chakradhar

Reducing energy consumption plays a significant role in reducing the total cost of ownership of computing clusters. Building heterogeneous clusters by combining high-end and low-end server nodes (e.g., Xeons and Atoms) is a recent trend towards achieving energy-efficient computing. This requires a cluster-level power manager that has the ability to predict future load, and server nodes that can quickly transition between active and low-power sleep states. In practice, however, the load is unpredictable and often punctuated by spikes, necessitating a number of extra "idling" servers. We design a cluster-level power manager that (1) identifies the optimal cluster configuration based on the power profiles of servers and workload characteristics, and (2) maximizes work done per watt by assigning P-states and S-states to the cluster servers dynamically based on the current request rate. We carry out an experimental study on a web server cluster composed of high-end Xeon servers and low-end Atom-based netbooks and share our findings.


Languages and Compilers for Parallel Computing | 2015

Automatic and Efficient Data Host-Device Communication for Many-Core Coprocessors

Bin Ren; Nishkam Ravi; Yi Yang; Min Feng; Gagan Agrawal; Srimat T. Chakradhar

Orchestrating data transfers between the CPU and a coprocessor manually is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This paper describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance. We find that the standard linearization method performs poorly for non-uniform dimensions on the coprocessor due to redundant data transfers and suppression of important compiler optimizations such as vectorization. The key contribution of this paper is a novel approach to heap linearization that avoids modifying memory accesses so as to preserve vectorization, referred to as partial linearization with pointer reset. We implement partial linearization with pointer reset as the compile-time solution, whereas the runtime solution is implemented as an enhancement to the MYO library. We evaluate our approach on multiple C benchmarks. Experimental results demonstrate that our best compile-time solution runs 2.5x-5x faster than the original runtime solution, and that CPU-MIC code using it achieves a 1.5x-2.5x speedup over the 16-thread CPU version.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

Semi-automatic restructuring of offloadable tasks for many-core accelerators

Nishkam Ravi; Yi Yang; Tao Bao; Srimat T. Chakradhar

Work division between the processor and accelerator is a common theme in modern heterogeneous computing. Recent efforts (such as LEO and OpenACC) provide directives that allow the developer to mark code regions in the original application from which offloadable tasks can be generated by the compiler. Auto-tuners and runtime schedulers work with the options (i.e., offloadable tasks) generated at compile time, which are limited by the directives specified by the developer. There is no provision for offload restructuring.


International Conference on Supercomputing | 2014

Automating and optimizing data transfers for many-core coprocessors

Bin Ren; Nishkam Ravi; Yi Yang; Min Feng; Gagan Agrawal; Srimat T. Chakradhar

Orchestrating data transfers between CPUs and a coprocessor manually is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance. We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation, and evaluated our work on multiple C benchmarks. Our experimental results demonstrate that our best compile-time solution runs 2.5x-5x faster than the original runtime solution, and that CPU-coprocessor code using it achieves a 1.5x-2.5x speedup over the 16-thread CPU version.


Archive | 2012

Computer-Guided Holistic Optimization of MapReduce Applications

Nishkam Ravi; Jun Liu; Srimat T. Chakradhar


Archive | 2013

Compiler-guided software accelerator for iterative Hadoop jobs

Nishkam Ravi; Abhishek Verma; Srimat T. Chakradhar


Archive | 2012

Load Balancing on Heterogeneous Processing Clusters Implementing Parallel Execution

Rajat Phull; Srihari Cadambi; Nishkam Ravi; Srimat T. Chakradhar


Archive | 2012

Compiler for x86-Based Many-Core Coprocessors

Nishkam Ravi; Tao Bao; Ozcan Ozturk; Srimat T. Chakradhar

Collaboration


Dive into Nishkam Ravi's collaborations.

Top co-authors:

Yi Yang (Princeton University)
Tao Bao (Princeton University)
Bin Ren (Princeton University)
Jun Liu (Princeton University)
Min Feng (Princeton University)