Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Janki Bhimani is active.

Publication


Featured research published by Janki Bhimani.


international conference on cloud computing | 2017

FIM: Performance Prediction for Parallel Computation in Iterative Data Processing Applications

Janki Bhimani; Ningfang Mi; Miriam Leeser; Zhengyu Yang

Predicting the performance of an application running on high performance computing (HPC) platforms in a cloud environment is increasingly important because of its influence on development time and resource management. However, predicting performance with respect to parallel processes is complex for iterative, multi-stage applications. This research proposes FiM, a performance approximation approach that models the computing performance of iterative, multi-stage applications running on a master-compute framework. FiM consists of two coupled components: 1) a stochastic Markov model that captures non-deterministic runtime, which often depends on parallel resources such as the number of processes; and 2) a machine learning model that extrapolates the parameters calibrating the Markov model when application parameters, such as the dataset, change. Our modeling approach considers different design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on multi-core processors in cloud computing, (iii) application-related parameters, and (iv) characteristics of datasets. The major contribution of our prediction approach is that FiM provides an accurate prediction of parallel computation time for datasets that are much larger than the training datasets. Such prediction gives data analysts useful insight into the optimal configuration of parallel resources (e.g., number of processes and number of cores) and also helps system designers investigate the impact of changes in application parameters on system performance.
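
The abstract describes a Markov model for computation time plus a learned extrapolation of its parameters. The sketch below is a minimal illustration of that general idea, not the authors' FiM implementation: the transition probabilities, per-stage times, and dataset sizes are all made up, and a simple linear fit stands in for the machine learning component.

```python
import numpy as np

# Transient states of an absorbing Markov chain represent computation stages.
# Q holds transition probabilities among transient states (missing mass absorbs,
# i.e., the application finishes); t holds mean time spent per visit to a stage.
Q = np.array([
    [0.0, 0.9, 0.0],   # stage 0 usually advances to stage 1
    [0.1, 0.0, 0.8],   # stage 1 may loop back (iteration) or advance
    [0.0, 0.2, 0.0],   # stage 2 may repeat stage 1 before finishing
])
t = np.array([2.0, 5.0, 3.0])          # hypothetical seconds per visit

# Fundamental matrix N = (I - Q)^-1 gives expected visits to each transient state.
N = np.linalg.inv(np.eye(len(Q)) - Q)
expected_visits = N[0]                 # starting from stage 0
print(f"predicted computation time: {expected_visits @ t:.2f} s")

# Calibration idea: extrapolate a model parameter (here, stage-1 time per visit)
# measured on small training datasets to a much larger dataset via a linear fit.
train_sizes = np.array([1e6, 2e6, 4e6])     # training dataset sizes (records)
measured_t1 = np.array([5.0, 9.8, 19.5])    # hypothetical measured stage-1 times
slope, intercept = np.polyfit(train_sizes, measured_t1, 1)
print("extrapolated stage-1 time at 1e8 records:", slope * 1e8 + intercept)
```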


international conference on computer communications and networks | 2017

AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-Stage Big Data Frameworks

Han Gao; Zhengyu Yang; Janki Bhimani; Teng Wang; Jiayin Wang; Bo Sheng; Ningfang Mi

Due to the flexibility of data operations and the scalability of its in-memory cache, Spark has shown the potential to replace Hadoop as the standard distributed framework for data-intensive processing in both industry and academia. However, we observe that the built-in scheduling algorithms in Spark (i.e., FIFO and FAIR) are not optimized for applications with multiple parallel and independent branches in stages. Specifically, a child stage needs to wait for and collect data from all its parent branches, but this wait has no guaranteed upper bound since it is tightly coupled with each branch's workload characteristics, stage order, and the corresponding allocated computing resources. To address this challenge, we investigate a solution that ensures all branches acquire suitable resources according to their workload demand, so that the finish times of the branches are as close to each other as possible. Based on this, we propose a novel scheduling policy, named AutoPath, which effectively reduces the overall makespan of such applications by detecting and leveraging parallel paths and adaptively assigning computing resources based on workload demands estimated at runtime. We implemented the new scheduling scheme in Spark v1.5.0 and evaluated it with selected representative workloads. The experiments demonstrate that our new scheduler effectively reduces the makespan and improves resource utilization for these applications compared to the current FIFO and FAIR schedulers.
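
The core intuition of allocating resources so that parallel branches finish at roughly the same time can be shown with a small sketch. This is not the AutoPath scheduler itself; the function name, the work estimates, and the executor counts are illustrative assumptions.

```python
# Give each independent branch a share of executors proportional to its
# estimated remaining work, so their estimated finish times are roughly equal.
def allocate_executors(estimated_work, total_executors):
    """estimated_work: dict of branch -> estimated remaining work (task-seconds)."""
    total_work = sum(estimated_work.values())
    shares = {b: max(1, round(total_executors * w / total_work))
              for b, w in estimated_work.items()}
    # Trim or pad so the shares sum to exactly the available executors.
    while sum(shares.values()) > total_executors:
        shares[max(shares, key=shares.get)] -= 1
    while sum(shares.values()) < total_executors:
        shares[max(estimated_work, key=estimated_work.get)] += 1
    return shares

# Hypothetical stage with three parallel branches and 12 executors to split.
print(allocate_executors({"branch_a": 300, "branch_b": 120, "branch_c": 60}, 12))
```

In a runtime scheduler the work estimates would be refreshed as tasks complete, and the allocation recomputed, which is the adaptive aspect the abstract describes.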


international performance computing and communications conference | 2016

GReM: Dynamic SSD resource allocation in virtualized storage systems with heterogeneous IO workloads

Zhengyu Yang; Jianzhe Tai; Janki Bhimani; Jiayin Wang; Ningfang Mi; Bo Sheng

In a shared virtualized storage system that runs VMs with heterogeneous IO demands, it is difficult for the hypervisor to cost-effectively partition and allocate SSD resources among multiple VMs. There are two straightforward approaches to this problem: equally assigning SSDs to each VM, or managing SSD resources in a fair-competition mode. Unfortunately, neither approach can fully utilize the benefits of SSD resources, particularly when workloads change frequently and bursty IOs occur from time to time. In this paper, we design a Global SSD Resource Management solution, GReM, which aims to fully utilize SSD resources as a second-level cache while taking performance isolation into consideration. In particular, GReM takes the dynamic IO demands of all VMs into account to split the entire SSD space into a long-term zone and a short-term zone, and cost-effectively updates the content of the SSDs in these two zones. GReM adaptively adjusts the reservation for each VM inside the long-term zone based on their IO changes, and further dynamically partitions the SSDs between the long- and short-term zones at runtime by leveraging feedback from both cache performance and bursty workloads. Experimental results show that GReM captures cross-VM IO changes to make correct resource allocation decisions, and thus obtains a high IO hit ratio and low IO management costs compared with both traditional and state-of-the-art caching algorithms.
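
A minimal sketch of the feedback-driven repartitioning idea follows; it is not GReM's actual policy, and the step size, bounds, and hit-ratio inputs are assumptions chosen only to illustrate how the long- and short-term zones could be resized at runtime.

```python
# Repartition SSD cache blocks between a long-term zone (per-VM reservations)
# and a short-term zone (absorbs bursty IO) based on observed hit ratios.
def repartition(ssd_blocks, long_term_hit_ratio, short_term_hit_ratio,
                burst_detected, current_long_fraction, step=0.05):
    """Return (new long-zone fraction, long-zone blocks, short-zone blocks)."""
    frac = current_long_fraction
    if burst_detected or short_term_hit_ratio > long_term_hit_ratio:
        frac -= step          # bursts benefit from a larger short-term zone
    elif long_term_hit_ratio > short_term_hit_ratio:
        frac += step          # stable working sets benefit from reservations
    frac = min(0.9, max(0.1, frac))          # keep both zones non-empty
    long_blocks = int(ssd_blocks * frac)
    return frac, long_blocks, ssd_blocks - long_blocks

print(repartition(ssd_blocks=1_000_000, long_term_hit_ratio=0.62,
                  short_term_hit_ratio=0.71, burst_detected=True,
                  current_long_fraction=0.7))
```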


international performance computing and communications conference | 2016

Understanding performance of I/O intensive containerized applications for NVMe SSDs

Janki Bhimani; Jingpei Yang; Zhengyu Yang; Ningfang Mi; Qiumin Xu; Manu Awasthi; Rajinikanth Pandurangan; Vijay Balakrishnan

Our cloud-based IT world is founded on hypervisors and containers. Containers are becoming an important cornerstone and are increasingly used day by day. Among the available frameworks, Docker has become one of the major choices for containerized platforms in data centers and enterprise servers, due to its ease of deployment and scaling. Furthermore, the performance benefits of a lightweight container platform can be leveraged even more with fast back-end storage such as high-performance SSDs. However, increasing the number of simultaneously operating Docker containers may not guarantee an aggregate performance improvement due to saturation. Thus, understanding performance bottlenecks in a multi-tenant Docker environment is critically important for maintaining application-level fairness and performing better resource management. In this paper, we characterize the performance of the persistent storage option (through data volumes) for I/O-intensive, dockerized applications. Our work investigates the impact on performance of an increasing number of simultaneous Docker containers in different workload environments. We provide a first-of-its-kind study of I/O-intensive containerized applications operating with NVMe SSDs. We show that 1) six times better application throughput can be obtained simply by a wise selection of the number of containerized instances compared to a single instance; and 2) when multiple application containers run simultaneously, application throughput may degrade by up to 50% compared to a stand-alone application's throughput if a good choice of application and workload is not made. We then propose novel design guidelines for optimal and fair operation of both homogeneous and heterogeneous environments mixed with different applications and workloads.
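
The kind of experiment behind such a characterization can be sketched as a sweep over the number of concurrent containers running an IO benchmark against an NVMe-backed data volume. The image name and mount point below are assumptions, and this is not the paper's actual test harness.

```python
import subprocess

NVME_MOUNT = "/mnt/nvme"            # assumed host mount point of the NVMe SSD
IMAGE = "fio-bench:latest"          # assumed container image with fio installed

def launch_containers(num_instances):
    """Start num_instances detached containers, each running an fio random-write job
    against a bind-mounted data volume on the NVMe device."""
    for i in range(num_instances):
        subprocess.run([
            "docker", "run", "-d", "--rm", "--name", f"fio_{num_instances}_{i}",
            "-v", f"{NVME_MOUNT}:/data",            # persistent data volume
            IMAGE,
            "fio", "--name=randwrite", f"--filename=/data/test_{i}",
            "--rw=randwrite", "--bs=4k", "--iodepth=32", "--direct=1",
            "--size=1G", "--runtime=60", "--time_based",
        ], check=True)

# Sweep the container count and compare aggregate throughput reported by fio
# to find where the device or the software stack saturates.
for n in (1, 2, 4, 8):
    launch_containers(n)
    # ... wait for the containers to finish and collect per-container fio results ...
```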


ieee high performance extreme computing conference | 2017

Accelerating big data applications using lightweight virtualization framework on enterprise cloud

Janki Bhimani; Zhengyu Yang; Miriam Leeser; Ningfang Mi

Hypervisor-based virtualization technology has been successfully used to deploy high-performance and scalable infrastructure for Hadoop, and now Spark, applications. Container-based virtualization techniques are becoming an important option and are increasingly used due to their lightweight operation and better scaling compared to Virtual Machines (VMs). With containerization techniques such as Docker becoming mature and promising better performance, Docker can be used to speed up big data applications. However, as applications have different behaviors and resource requirements, it is important to analyze and compare the performance of applications running in the cloud with VMs and with Docker containers before replacing traditional hypervisor-based virtual machines with Docker. A VM provides distributed resource management for different virtual machines, each running with its own allocated resources, while Docker relies on a shared pool of resources among all containers. Here, we investigate the performance of different Apache Spark applications using both Virtual Machines (VMs) and Docker containers. While others have looked at Docker's performance, this is the first study that compares these virtualization frameworks for a big data enterprise cloud environment using Apache Spark. In addition to makespan and execution time, we also analyze the resource utilization (CPU, disk, memory, etc.) of Spark applications. Our results show that Spark using Docker can obtain a speed-up of over 10 times compared to using VMs. However, we observe that this may not apply to all applications due to different workload patterns and the different resource management schemes performed by virtual machines and containers. Our work can guide application developers, system administrators, and researchers to better design and deploy big data applications on their platforms to improve overall performance.
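
As a concrete stand-in for the kind of workload compared across the two virtualization frameworks, the sketch below times a simple PySpark word count; the same script can be launched inside a Docker container or on a VM-hosted cluster and the makespans compared. The input and output HDFS paths are placeholders, and this is not the paper's benchmark suite.

```python
import time
from pyspark.sql import SparkSession

# Representative Spark workload: word count over a text corpus.
spark = SparkSession.builder.appName("wordcount-benchmark").getOrCreate()
sc = spark.sparkContext

start = time.time()
counts = (sc.textFile("hdfs:///data/corpus.txt")      # assumed input path
            .flatMap(lambda line: line.split())
            .map(lambda w: (w, 1))
            .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs:///data/wordcount-out")    # assumed output path
print("makespan (s):", time.time() - start)            # compare across VM vs Docker runs
spark.stop()
```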


ieee high performance extreme computing conference | 2015

Accelerating K-Means clustering with parallel implementations and GPU computing

Janki Bhimani; Miriam Leeser; Ningfang Mi

K-Means clustering is a popular unsupervised machine learning method that has been used in diverse applications including image processing, information retrieval, social sciences, and weather forecasting. However, clustering is computationally expensive, especially when applied to large datasets. In this paper, we explore accelerating the performance of K-Means clustering using three approaches: 1) shared memory using OpenMP, 2) distributed memory with message passing (MPI), and 3) heterogeneous computing with NVIDIA Graphics Processing Units (GPUs) programmed with CUDA-C. While others have looked at accelerating K-Means clustering, this is the first study that compares these different approaches. In addition, K-Means performance is very sensitive to the initial means chosen. We evaluate different initializations in parallel and choose the best one to use for the entire algorithm. We evaluate results on a range of images from small (300×300 pixels) to large (1164×1200 pixels). Our results show that all three parallel programming approaches give speed-up, with the best results obtained by OpenMP for smaller images and CUDA-C for larger ones. Each of these approaches gives approximately thirty times overall speed-up compared to a sequential implementation of K-Means. In addition, our parallel initialization gives an additional 1.5 to 2.5 times speed-up over the accelerated parallel versions.
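
The paper's implementations use OpenMP, MPI, and CUDA-C; the Python sketch below only illustrates the underlying decomposition, in particular the idea of evaluating several random initializations in parallel and keeping the best one. The dataset, cluster count, and iteration budget are made up.

```python
import numpy as np
from multiprocessing import Pool

def kmeans(args):
    data, k, seed, iters = args
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]   # random initialization
    for _ in range(iters):
        # assignment step: label each point with its nearest center
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute each center as the mean of its members
        centers = np.array([data[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    cost = dists.min(axis=1).sum()     # within-cluster distance of this initialization
    return cost, centers

if __name__ == "__main__":
    data = np.random.rand(10_000, 3)   # stand-in for image pixel data (RGB)
    # Evaluate four different initializations concurrently and keep the best.
    with Pool(4) as pool:
        results = pool.map(kmeans, [(data, 8, seed, 20) for seed in range(4)])
    best_cost, best_centers = min(results, key=lambda r: r[0])
    print("best within-cluster cost:", best_cost)
```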


ieee high performance extreme computing conference | 2016

Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus

Janki Bhimani; Miriam Leeser; Ningfang Mi

The use of accelerators such as GPUs is increasing, but efficient use of GPUs requires making good design choices, including the type of memory allocation and the overlap of data transfer with parallel computation. Performance varies with the application, the hardware version (such as the generation of GPU), and the software version, including programming-language drivers. This large number of design decisions makes it nearly impossible to obtain the optimal performance point by directly porting any application, which emphasizes the need for high-level design guidelines for GPU-accelerated cluster systems that are applicable to a broad class of applications rather than any specific application. This paper proposes novel design guidelines for GPU-accelerated cluster systems to optimize data transfer from the host (CPU) to the device (GPU) over the PCIe bus. In particular, we consider the design choices offered by NVIDIA GPUs. Our main contribution is to build design guidelines that are applicable to a broad class of applications. We design 27 different versions of the same micro-benchmark, where the design choices made by each version are unique. We observe that a speedup of 2.6x can be obtained just by making good design choices.
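
The 27 micro-benchmark variants suggest a full factorial over three design dimensions with three options each. The enumeration below sketches how such a design space can be swept; the specific dimensions and option names are assumptions for illustration, not the exact factors used in the paper.

```python
from itertools import product

# Assumed design dimensions for host-to-device transfer over PCIe.
memory_alloc = ["pageable", "pinned", "unified"]
transfer_mode = ["single_bulk", "chunked", "async_streams"]
overlap = ["no_overlap", "partial_overlap", "full_overlap"]

variants = list(product(memory_alloc, transfer_mode, overlap))
assert len(variants) == 27
for i, (mem, xfer, ov) in enumerate(variants):
    # each variant would be timed on the target GPU / PCIe configuration
    print(f"variant {i:02d}: memory={mem}, transfer={xfer}, overlap={ov}")
```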


international performance computing and communications conference | 2016

Performance prediction techniques for scalable large data processing in distributed MPI systems

Janki Bhimani; Ningfang Mi; Miriam Leeser

Predicting the performance of an application running on parallel computing platforms is increasingly important due to the long development time of an application and the high resource management cost of parallel computing platforms. However, predicting overall performance is complex and must take into account both parallel calculation time and communication time. The difficulty of accurate performance modeling is compounded by the myriad design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on multi-processor platforms, (iii) application-related parameters, and (iv) characteristics of datasets. This research proposes a fast and accurate performance prediction approach to predict the calculation and communication time of an application running on a distributed computing platform. The major contribution of our prediction approach is that it can provide an accurate prediction of execution times for new datasets that are much larger than the training datasets. Our approach consists of two models: a probabilistic self-learning model to predict calculation time and a simulation queuing model to predict network communication time. The combination of these two models gives data analysts useful insight into the optimal configuration of parallel resources (e.g., number of processes and number of cores) and application parameter settings.
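
The structure of such a prediction, calculation time plus queueing-based communication time, can be illustrated with a small sketch. The closed-form M/M/1 formula below is a textbook stand-in for the paper's simulation-based queuing model, and every parameter value is made up.

```python
# Predict runtime as calculation time plus network communication time.
def predicted_runtime(records, num_processes,
                      per_record_cost=2e-6, msg_rate_per_proc=50.0,
                      service_rate=600.0, msgs_per_record=1e-4):
    # calculation time: work split evenly across processes
    calc_time = records * per_record_cost / num_processes

    # communication time: model the interconnect as an M/M/1 queue
    arrival_rate = msg_rate_per_proc * num_processes        # messages per second
    if arrival_rate >= service_rate:
        return float("inf")                                  # network saturated
    sojourn_per_msg = 1.0 / (service_rate - arrival_rate)    # M/M/1 time in system
    comm_time = records * msgs_per_record * sojourn_per_msg / num_processes

    return calc_time + comm_time

for p in (2, 4, 8):
    print(p, "processes ->", round(predicted_runtime(5_000_000, p), 3), "s")
```

Sweeping the process count with such a model is what lets an analyst pick a resource configuration before committing to long runs on the real platform.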


Arabian Journal for Science and Engineering | 2018

Pelletization Characteristics of the Hydrothermal Pretreated Rice Straw with Added Binders

Xianfei Xia; Hongru Xiao; Zhengyu Yang; Xin Xie; Janki Bhimani

Pelletization of loose rice straw is an attractive option for producing renewable fuels. In this paper, we focus on the problem of how to improve this pelletization process, in particular reducing energy consumption and improving product quality. Specifically, we pretreat rice straw and investigate the densification characteristics of the pretreated materials. Pretreatment of the materials includes hydrothermal treatment and adding a certain proportion of economical additives, such as rapeseed meal and waste engine oil. The pretreated rice straw was pelletized using a biomass densification platform, and the energy consumption and pellet quality were then tested. Experimental results indicate that the hydrothermal pretreatment plays an important role in reducing energy consumption and improving product quality, and that waste engine oil has a better effect than rapeseed meal. We also observe that the obtained pellet quality reaches the standard of middle-grade coal, and the proposed pretreatment method realizes the comprehensive utilization of waste agricultural resources.


international symposium on performance analysis of systems and software | 2017

Docker characterization on high performance SSDs

Qiumin Xu; Manu Awasthi; Krishna T. Malladi; Janki Bhimani; Jingpei Yang; Murali Annavaram

Docker containers [2] are becoming the mainstay for deploying applications on cloud platforms, offering many desirable features such as ease of deployment, developer friendliness, and lightweight virtualization. Meanwhile, solid-state disks (SSDs) have witnessed a tremendous performance boost through recent industry innovations such as the Non-Volatile Memory Express (NVMe) standards [3], [4]. However, the performance of containerized applications on these high-speed contemporary SSDs has not yet been investigated. In this paper, we present a characterization of the performance impact of a wide variety of the available storage options for deploying Docker containers and provide the configuration options that best utilize high-performance SSDs.
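
One storage-option comparison the abstract alludes to can be sketched as running the same IO benchmark against the container's copy-on-write layer and against a bind-mounted data volume on the NVMe SSD. The image name and host mount point below are assumptions, not the paper's setup.

```python
import subprocess

IMAGE = "fio-bench:latest"          # assumed container image with fio installed

def fio_cmd(target_dir):
    # Same fio random read/write job, parameterized only by the target directory.
    return ["fio", "--name=randrw", "--rw=randrw", "--bs=4k", "--iodepth=32",
            "--direct=1", "--size=1G", f"--filename={target_dir}/testfile"]

# 1) container's writable copy-on-write layer (goes through the storage driver)
subprocess.run(["docker", "run", "--rm", IMAGE] + fio_cmd("/tmp"), check=True)

# 2) Docker data volume bind-mounted from the NVMe SSD (assumed at /mnt/nvme)
subprocess.run(["docker", "run", "--rm", "-v", "/mnt/nvme:/data", IMAGE]
               + fio_cmd("/data"), check=True)
```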

Collaboration


Dive into Janki Bhimani's collaborations.

Top Co-Authors

Ningfang Mi, Northeastern University
Zhengyu Yang, Northeastern University
Bo Sheng, University of Massachusetts Boston
Jiayin Wang, University of Massachusetts Boston
Qiumin Xu, University of Southern California