Kunal Rao
Princeton University
Publication
Featured research published by Kunal Rao.
high performance distributed computing | 2013
Srihari Cadambi; Giuseppe Coviello; Cheng-Hong Li; Rajat Phull; Kunal Rao; Murugan Sankaradass; Srimat T. Chakradhar
It is remarkably easy to offload processing to Intel's newest manycore coprocessor, the Xeon Phi: it supports a popular ISA (x86-based), a popular OS (Linux) and a popular programming model (OpenMP). Unfortunately, easy portability does not automatically ensure high performance. Additional programmer effort is necessary to leverage the new performance-oriented hardware features. But programmer optimizations alone are insufficient. Multiprocessing is also necessary to improve hardware utilization, and Linux makes it easy for processes to share the manycore coprocessor. However, multiprocessing inefficiencies can easily offset gains made by the programmer. Our experiments on a production, high-performance Xeon server with multiple Xeon Phi coprocessors show that multiprocessing on coprocessors not only slows down the processes but also introduces unreliability (some processes crash unexpectedly). We propose a new, user-level middleware called COSMIC that improves the performance and reliability of multiprocessing on coprocessors like the Xeon Phi. COSMIC fits seamlessly into the existing Xeon Phi software stack and is transparent to programmers. It manages Xeon Phi processes that execute parallel regions offloaded to the coprocessors. Offloads typically carry programmer-driven performance directives such as thread and affinity requirements. Unlike the existing Xeon Phi software stack, COSMIC schedules both processes and offloads fairly, and takes into account conflicting requirements of offloads belonging to different processes. By doing so, COSMIC has two clear benefits. First, it improves multiprocessing performance by preventing thread and memory oversubscription, by avoiding inter-offload interference and by reducing load imbalance on coprocessors and cores. Second, it increases multiprocessing reliability by exploiting programmer-specified per-process coprocessor memory requirements to completely avoid memory oversubscription and crashes.
Our experiments on several representative Xeon Phi workloads show that, in a multiprocessing environment, COSMIC improves average core utilization by up to 3 times, reduces makespan by up to 52%, reduces average process latency (turnaround time) by 70%, and completely eliminates process crashes.
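The idea of admitting offloads only when their declared thread and memory requirements fit on the coprocessor can be illustrated with a minimal sketch. This is not COSMIC's implementation, which the abstract does not detail: the `Offload`, `Coprocessor`, `admit` and `finish` names, the FIFO policy, and the capacity figures in the usage note are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Offload:
    """A parallel region waiting to run on a coprocessor (hypothetical model)."""
    name: str
    threads: int     # hardware threads requested by the programmer
    memory_mb: int   # declared coprocessor memory requirement


@dataclass
class Coprocessor:
    """Tracks free threads and memory on one device (capacities are illustrative)."""
    total_threads: int
    total_memory_mb: int
    running: list = field(default_factory=list)

    def free_threads(self):
        return self.total_threads - sum(o.threads for o in self.running)

    def free_memory_mb(self):
        return self.total_memory_mb - sum(o.memory_mb for o in self.running)

    def fits(self, offload):
        # Admit only if neither threads nor memory would be oversubscribed.
        return (offload.threads <= self.free_threads()
                and offload.memory_mb <= self.free_memory_mb())


def admit(device, queue):
    """Start queued offloads in FIFO order while they fit; return those started.

    Assumed policy: stop at the first offload that does not fit, so a large
    offload at the head of the queue is not starved by smaller ones behind it.
    """
    started = []
    while queue and device.fits(queue[0]):
        offload = queue.popleft()
        device.running.append(offload)
        started.append(offload)
    return started


def finish(device, offload, queue):
    """Release a finished offload's resources and admit any waiting offloads."""
    device.running.remove(offload)
    return admit(device, queue)
```

For example, on a device modeled with 240 threads and 8192 MB, queuing offloads that request (120 threads, 4096 MB), (120 threads, 3072 MB) and (60 threads, 2048 MB) starts the first two immediately and holds the third until one of them finishes, so the device is never asked for more threads or memory than it has.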
high performance distributed computing | 2012
Rajat Phull; Cheng-Hong Li; Kunal Rao; Hari Cadambi; Srimat T. Chakradhar
Archive | 2014
Srihari Cadambi; Kunal Rao; Srimat T. Chakradhar; Rajat Phull; Giuseppe Coviello; Murugan Sankaradass; Cheng-Hong Li
Archive | 2013
Srihari Cadambi; Kunal Rao; Srimat T. Chakradhar; Rajat Phull; Giuseppe Coviello; Murugan Sankaradass; Cheng-Hong Li
international conference on cluster computing | 2011
M. Mustafa Rafique; Srihari Cadambi; Kunal Rao; Ali Raza Butt; Srimat T. Chakradhar
Archive | 2016
Kunal Rao; Giuseppe Coviello; Srimat T. Chakradhar; Souvik Bhattacherjee; Srihari Cadambi
Archive | 2013
Srihari Cadambi; Kunal Rao; Srimat T. Chakradhar; Rajat Phull; Giuseppe Coviello; Murugan Sankaradass; Cheng-Hong Li
Archive | 2016
Murugan Sankaradas; Kunal Rao; Srimat T. Chakradhar
Archive | 2016
Kunal Rao; Giuseppe Coviello; Srimat T. Chakradhar; Souvik Bhattacherjee; Srihari Cadambi
Archive | 2015
Cheng-Hong Li; Giuseppe Coviello; Kunal Rao; Murugan Sankaradas; Srihari Cadambi; Srimat T. Chakradhar; Rajat Phull