Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Abhinav Vishnu is active.

Publications


Featured research published by Abhinav Vishnu.


Computing | 2016

A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems

Abdul Hameed; Alireza Khoshkbarforoushha; Rajiv Ranjan; Prem Prakash Jayaraman; Joanna Kolodziej; Pavan Balaji; Sherali Zeadally; Qutaibah M. Malluhi; Nikos Tziritas; Abhinav Vishnu; Samee Ullah Khan; Albert Y. Zomaya

In the cloud computing paradigm, energy-efficient allocation of different virtualized ICT resources (servers, storage disks, networks, and the like) is a complex problem due to the presence of heterogeneous application workloads (e.g., content delivery networks, MapReduce, web applications) with competing allocation requirements in terms of ICT resource capacities (e.g., network bandwidth, processing speed, response time). Several recent papers have tried to address the issue of improving energy efficiency in allocating cloud resources to applications, with varying degrees of success. However, to the best of our knowledge, there is no published literature on this subject that clearly articulates the research problem and provides a research taxonomy for succinct classification of existing techniques. Hence, the main aim of this paper is to identify the open challenges associated with energy-efficient resource allocation. To this end, the study first outlines the problem and the existing hardware- and software-based techniques available for this purpose. Techniques already presented in the literature are then summarized according to an energy-efficiency research dimension taxonomy. The advantages and disadvantages of the existing techniques are comprehensively analyzed against the proposed taxonomy, whose dimensions are: resource adaptation policy, objective function, allocation method, allocation operation, and interoperability.
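
To make the five taxonomy dimensions concrete, the following C sketch encodes them as enumerations inside a classification record; every type, field, and enumerator name here is hypothetical and only illustrates how a surveyed technique might be catalogued, not the paper's complete taxonomy.

```c
#include <stdio.h>

/* Hypothetical encoding of the five survey dimensions; the enumerators
 * below are illustrative examples, not the paper's complete taxonomy. */
typedef enum { ADAPT_REACTIVE, ADAPT_PROACTIVE } adaptation_policy_t;
typedef enum { OBJ_MIN_ENERGY, OBJ_MIN_SLA_VIOLATION, OBJ_MULTI } objective_t;
typedef enum { METHOD_HEURISTIC, METHOD_EXACT, METHOD_METAHEURISTIC } alloc_method_t;
typedef enum { OP_PLACEMENT, OP_CONSOLIDATION, OP_MIGRATION } alloc_operation_t;
typedef enum { INTEROP_SINGLE_CLOUD, INTEROP_FEDERATED } interoperability_t;

typedef struct {
    const char         *technique;      /* surveyed technique's name   */
    adaptation_policy_t adaptation;     /* resource adaptation policy  */
    objective_t         objective;      /* objective function          */
    alloc_method_t      method;         /* allocation method           */
    alloc_operation_t   operation;      /* allocation operation        */
    interoperability_t  interop;        /* interoperability            */
} taxonomy_entry_t;

int main(void) {
    /* Example classification of a (hypothetical) VM-consolidation heuristic. */
    taxonomy_entry_t e = { "example-consolidator", ADAPT_REACTIVE,
                           OBJ_MIN_ENERGY, METHOD_HEURISTIC,
                           OP_CONSOLIDATION, INTEROP_SINGLE_CLOUD };
    printf("%s classified along 5 dimensions (%d,%d,%d,%d,%d)\n",
           e.technique, e.adaptation, e.objective, e.method,
           e.operation, e.interop);
    return 0;
}
```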


Parallel Computing | 2013

A survey on resource allocation in high performance distributed computing systems

Hameed Hussain; Saif Ur Rehman Malik; Abdul Hameed; Samee Ullah Khan; Gage Bickler; Nasro Min-Allah; Muhammad Bilal Qureshi; Limin Zhang; Wang Yong-Ji; Nasir Ghani; Joanna Kolodziej; Albert Y. Zomaya; Cheng Zhong Xu; Pavan Balaji; Abhinav Vishnu; Fredric Pinel; Johnatan E. Pecero; Dzmitry Kliazovich; Pascal Bouvry; Hongxiang Li; Lizhe Wang; Dan Chen; Ammar Rayes

A classification of high performance computing (HPC) systems is provided. Current HPC paradigms and industrial application suites are discussed. The state of the art in HPC resource allocation is reported. Hardware and software solutions for optimized HPC systems are discussed. Efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects dedicated to large-scale distributed computing systems have designed and developed resource allocation mechanisms with a variety of architectures and services. In our study, a comprehensive survey describing resource allocation in the various classes of HPC systems is reported. The aim of the work is to aggregate the existing HPC solutions under a joint framework and to provide a thorough analysis of the characteristics of resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role in the performance improvement of all HPC classes. Therefore, a comprehensive discussion of the widely used resource allocation strategies deployed in HPC environments is required, which is one of the motivations of this survey. Moreover, we classify HPC systems into three broad categories, namely (a) cluster, (b) grid, and (c) cloud systems, and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify the approaches followed in implementing the existing resource allocation strategies that are widely presented in the literature.


Conference on High Performance Computing (Supercomputing) | 2004

Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation

Jiuxing Liu; Abhinav Vishnu; Dhabaleswar K. Panda

In the area of cluster computing, InfiniBand is becoming increasingly popular due to its open standard and high performance. However, even with InfiniBand, network bandwidth can still become the performance bottleneck for some of today's most demanding applications. In this paper, we study the problem of how to overcome the bandwidth bottleneck by using multirail networks. We present different ways of setting up multirail networks with InfiniBand and propose a unified MPI design that can support all these approaches. We also discuss various important design issues and provide in-depth discussions of different policies for using multirail networks, including an adaptive striping scheme that can dynamically change the striping parameters based on current system conditions. We have implemented our design and evaluated it using both microbenchmarks and applications. Our performance results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we have achieved almost twice the bandwidth and half the latency for large messages compared with the original MPI. At the application level, the multirail MPI can significantly reduce communication time as well as running time, depending on the communication pattern. We have also shown that the adaptive striping scheme can achieve excellent performance without a priori knowledge of the bandwidth of each rail.
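
As a rough illustration of the striping policy described above (not the authors' actual MPI-level implementation), the C sketch below splits a large message across two rails in proportion to per-rail weights and re-normalizes those weights from assumed bandwidth measurements; post_send_on_rail, rail_weight, and the numeric values are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_RAILS 2

/* Illustrative striping state: one weight per rail, normalized to 1.0.
 * In an adaptive design the weights would be derived from the bandwidth
 * observed on each rail; here we just fake a measurement. */
static double rail_weight[NUM_RAILS] = { 0.5, 0.5 };

/* Hypothetical per-rail send; a real implementation would post an RDMA
 * write or send descriptor on the queue pair bound to that rail. */
static void post_send_on_rail(int rail, const char *buf, size_t len) {
    printf("rail %d: sending %zu bytes\n", rail, len);
    (void)buf;
}

/* Stripe one large message across the rails according to current weights. */
static void stripe_message(const char *buf, size_t len) {
    size_t offset = 0;
    for (int r = 0; r < NUM_RAILS; ++r) {
        size_t chunk = (r == NUM_RAILS - 1)
                         ? len - offset                      /* remainder */
                         : (size_t)(len * rail_weight[r]);
        post_send_on_rail(r, buf + offset, chunk);
        offset += chunk;
    }
}

/* Adaptive step: re-normalize weights from measured per-rail bandwidth. */
static void update_weights(const double measured_bw[NUM_RAILS]) {
    double total = 0.0;
    for (int r = 0; r < NUM_RAILS; ++r) total += measured_bw[r];
    for (int r = 0; r < NUM_RAILS; ++r) rail_weight[r] = measured_bw[r] / total;
}

int main(void) {
    char *msg = malloc(1 << 20);              /* 1 MiB message */
    stripe_message(msg, 1 << 20);             /* even split initially */
    double bw[NUM_RAILS] = { 900.0, 600.0 };  /* pretend rail 1 is slower */
    update_weights(bw);
    stripe_message(msg, 1 << 20);             /* now weighted 60/40 */
    free(msg);
    return 0;
}
```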


IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum | 2010

Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather

Krishna Chaitanya Kandalla; Hari Subramoni; Abhinav Vishnu; Dhabaleswar K. Panda

Modern high performance computing systems are increasingly deployed in a hierarchical fashion, with multi-core computing platforms forming the base of the hierarchy. These systems usually consist of multiple racks, with each rack containing a finite number of chassis and each chassis holding multiple compute nodes or blades based on multi-core architectures. The networks are also hierarchical, with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches, and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We analyze the performance characteristics of two collectives, MPI_Gather and MPI_Scatter, on such systems and propose topology-aware algorithms for these operations. Our experimental results show that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
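
A minimal sketch of the hierarchical idea, using MPI. A real topology-aware collective would derive its groups from the discovered switch hierarchy (node, chassis, rack); here node locality via MPI_COMM_TYPE_SHARED stands in for that knowledge, and hierarchical_gather_int, the uniform ranks-per-node assumption, and node-contiguous rank ordering are simplifications introduced for illustration.

```c
#include <mpi.h>
#include <stdlib.h>

/* Two-level gather onto rank 0: ranks first gather onto a per-node leader,
 * then the leaders gather onto rank 0.  Simplifying assumptions: every node
 * hosts the same number of ranks and ranks are numbered contiguously per
 * node, so the final buffer is already in rank order. */
static void hierarchical_gather_int(const int *sendbuf, int count,
                                    int *recvbuf, MPI_Comm comm) {
    int rank, local_rank, local_size;
    MPI_Comm node_comm, leader_comm;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, rank, MPI_INFO_NULL,
                        &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_size(node_comm, &local_size);

    /* Level 1: gather within the node onto the node leader. */
    int *node_buf = NULL;
    if (local_rank == 0)
        node_buf = malloc((size_t)count * local_size * sizeof(int));
    MPI_Gather(sendbuf, count, MPI_INT, node_buf, count, MPI_INT, 0, node_comm);

    /* Level 2: node leaders gather onto global rank 0. */
    MPI_Comm_split(comm, local_rank == 0 ? 0 : MPI_UNDEFINED, rank,
                   &leader_comm);
    if (leader_comm != MPI_COMM_NULL) {
        MPI_Gather(node_buf, count * local_size, MPI_INT,
                   recvbuf, count * local_size, MPI_INT, 0, leader_comm);
        MPI_Comm_free(&leader_comm);
    }

    free(node_buf);
    MPI_Comm_free(&node_comm);
}
```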


Journal of Computational Chemistry | 2017

Deep learning for computational chemistry

Garrett B. Goh; Nathan O. Hodas; Abhinav Vishnu

The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those fields are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview of the theory of deep neural networks and their unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight their ubiquity and broad applicability to a wide range of challenges in the field, including quantitative structure-activity relationships, virtual screening, protein structure prediction, quantum chemistry, materials design, and property prediction. In reviewing the performance of deep neural networks, we observed consistent outperformance of non-neural-network state-of-the-art models across disparate research topics, and deep neural network-based models often exceeded the “glass ceiling” expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry.


Cluster Computing and the Grid | 2007

Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective

Abhinav Vishnu; Matthew J. Koop; Adam Moody; Amith R. Mamidala; Sundeep Narravula; Dhabaleswar K. Panda

Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available between a pair of nodes. However, even with fat tree, hot-spots may occur in the network depending upon the route configuration between end nodes and the communication pattern(s) in the application. To make matters worse, the deterministic routing nature of InfiniBand prevents applications from transparently and effectively using multiple paths to avoid hot-spots in the network. Simulation-based studies of implementing congestion control in switches and adapters have been proposed in the literature. However, these studies have focused on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design MPI functionality that provides hot-spot avoidance for different communications, without a priori knowledge of the pattern. We leverage the LMC (LID mask count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selection of the number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication benchmarks and MPI applications. On an InfiniBand cluster with 48 processes, MPI All-to-all personalized shows an improvement of 27%. Our evaluation with the NAS Parallel Benchmarks on 64 processes shows significant improvement in execution time with this functionality.
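
To illustrate how LMC exposes multiple paths, the C sketch below enumerates the 2^LMC destination LIDs of a peer port and assigns successive messages to them round-robin; the LMC value, the base LID, and path_lid are illustrative, and a real MPI design would bind one connection per path and use richer scheduling policies such as those discussed in the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* With LMC set to k, each port answers to 2^k consecutive LIDs, and the
 * subnet manager may route each of those LIDs differently through the
 * fat tree, which is what creates the alternative paths. */
#define LMC 2                       /* 2^2 = 4 paths per destination port */
#define NUM_PATHS (1u << LMC)

static uint16_t path_lid(uint16_t base_lid, unsigned path) {
    return (uint16_t)(base_lid + (path & (NUM_PATHS - 1)));
}

int main(void) {
    uint16_t peer_base_lid = 0x40;  /* hypothetical base LID of the peer */
    /* Round-robin 8 messages over the 4 available paths. */
    for (unsigned msg = 0; msg < 8; ++msg) {
        unsigned path = msg % NUM_PATHS;
        printf("message %u -> destination LID 0x%x (path %u)\n",
               msg, path_lid(peer_base_lid, path), path);
    }
    return 0;
}
```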


Lecture Notes in Computer Science | 2006

Efficient shared memory and RDMA based design for MPI_Allgather over InfiniBand

Amith R. Mamidala; Abhinav Vishnu; Dhabaleswar K. Panda

MPI_Allgather is an important collective operation that is used in applications such as matrix multiplication and basic linear algebra operations. With next-generation systems going multi-core, the clusters deployed will support a high process count per node. Traditional implementations of Allgather use two separate channels, namely a network channel for communication across nodes and a shared memory channel for intra-node communication. An important drawback of this approach is the lack of sharing of communication buffers across these channels. This results in extra copying of data within a node, yielding sub-optimal performance. This is especially true for a collective involving a large number of processes with a high process density per node. In this paper, we propose a solution that eliminates the extra copy costs by sharing the communication buffers for both intra- and inter-node communication. Further, we optimize performance by allowing overlap of network operations with intra-node shared memory copies. On a 32-node, 2-way cluster, we observe an improvement of up to a factor of two for MPI_Allgather compared to the original implementation. We also observe overlap benefits of up to 43% for the 32x2 process configuration.
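
The sketch below restates the shared-buffer idea in MPI-3 terms rather than the paper's System V shared memory plus RDMA design: all ranks deposit data directly into a node-wide shared window, the node leader performs the inter-node exchange out of that same window, and everyone reads the result in place; shared_buffer_allgather and the assumptions noted in the comments (one integer per rank, uniform ranks per node, node-contiguous rank order, simplified window synchronization) are illustrative.

```c
#include <mpi.h>
#include <string.h>

/* Allgather of one int per rank through a node-shared buffer, so no second
 * intra-node copy is needed.  Window synchronization is simplified to
 * barriers for brevity. */
static void shared_buffer_allgather(int myval, int *out, int nprocs,
                                    MPI_Comm comm) {
    int rank, lrank, lsize;
    MPI_Comm node_comm, leader_comm;
    MPI_Win win;
    int *shm;                        /* node-shared buffer, nprocs ints */

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, rank, MPI_INFO_NULL,
                        &node_comm);
    MPI_Comm_rank(node_comm, &lrank);
    MPI_Comm_size(node_comm, &lsize);

    /* One shared allocation per node, owned by the node leader. */
    MPI_Aint bytes = (lrank == 0) ? (MPI_Aint)nprocs * sizeof(int) : 0;
    MPI_Win_allocate_shared(bytes, sizeof(int), MPI_INFO_NULL, node_comm,
                            &shm, &win);
    if (lrank != 0) {
        MPI_Aint sz; int disp_unit;
        MPI_Win_shared_query(win, 0, &sz, &disp_unit, &shm);
    }

    /* Intra-node: each rank deposits its value straight into the shared
     * buffer (assumes node-contiguous global rank order). */
    shm[rank] = myval;
    MPI_Barrier(node_comm);

    /* Inter-node: node leaders exchange the per-node blocks in place. */
    MPI_Comm_split(comm, lrank == 0 ? 0 : MPI_UNDEFINED, rank, &leader_comm);
    if (leader_comm != MPI_COMM_NULL) {
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      shm, lsize, MPI_INT, leader_comm);
        MPI_Comm_free(&leader_comm);
    }
    MPI_Barrier(node_comm);

    memcpy(out, shm, (size_t)nprocs * sizeof(int));  /* final read-out */
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
}
```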


International Parallel and Distributed Processing Symposium | 2011

Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation

Shuaiwen Song; Chun-Yi Su; Rong Ge; Abhinav Vishnu; Kirk W. Cameron

Future large-scale high-performance supercomputing systems require high energy efficiency to achieve exaflop computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate, and predict the energy-performance behavior of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size, and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making.
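
One simple way to write down an energy model of this kind is sketched below in generic notation; the symbols are illustrative and do not reproduce the paper's exact iso-energy-efficiency formulation.

```latex
% Illustrative energy-efficiency model for p nodes (generic symbols, not the
% paper's exact formulation).  W: workload size, e_op: energy per useful
% operation, T: total runtime, T_comp: compute time, P_static / P_dyn: static
% and dynamic node power, E_comm: communication energy.
\[
  \mathrm{EE}(W,p) \;=\; \frac{W\, e_{\mathrm{op}}}{E_{\mathrm{total}}(W,p)},
  \qquad
  E_{\mathrm{total}}(W,p) \;=\; p\,\bigl(P_{\mathrm{static}}\, T(W,p)
      + P_{\mathrm{dyn}}\, T_{\mathrm{comp}}(W,p)\bigr) + E_{\mathrm{comm}}(W,p).
\]
% Iso-energy-efficiency then asks how W must grow with p so that EE(W,p)
% remains constant, in analogy with the classic isoefficiency function.
```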


Computer Science - Research and Development | 2011

Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems

Pavan Balaji; Rinku Gupta; Abhinav Vishnu; Peter H. Beckman

For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application's characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, on systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes the overall application performance to depend heavily on how processes are mapped onto the network. In this paper, we first analyze the impact of different process mappings on application performance on a massive Blue Gene/P system. Then, we match this analysis with application communication patterns that we allow applications to describe prior to being launched. The underlying process management system can use this combined information, in conjunction with the hardware characteristics of the system, to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, which scale up to 131,072 cores of the largest BG/P system in the United States (using 80% of the total system size), demonstrate that different process mappings can show significant differences in overall performance, especially at scale. For example, we show that this difference can be as much as 30% for P3DFFT and up to twofold for HALO. Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved.
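
The effect being measured can be reproduced in miniature: the C sketch below maps a 2D nearest-neighbour application grid onto a small 3D torus with the default XYZ rank ordering and reports the average hop distance between grid neighbours; the torus and grid dimensions are made up for illustration and are not the BG/P partitions used in the paper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative torus dimensions and 2D application grid. */
#define TX 8
#define TY 8
#define TZ 8
#define APP_X 32
#define APP_Y 16          /* APP_X * APP_Y == TX * TY * TZ ranks */

typedef struct { int x, y, z; } coord_t;

/* Default mapping: ranks fill the torus in XYZ order. */
static coord_t xyz_map(int rank) {
    coord_t c = { rank % TX, (rank / TX) % TY, rank / (TX * TY) };
    return c;
}

/* Hop distance on a torus (shortest way around each ring). */
static int torus_dist(coord_t a, coord_t b) {
    int dx = abs(a.x - b.x), dy = abs(a.y - b.y), dz = abs(a.z - b.z);
    if (dx > TX / 2) dx = TX - dx;
    if (dy > TY / 2) dy = TY - dy;
    if (dz > TZ / 2) dz = TZ - dz;
    return dx + dy + dz;
}

/* Average hops between +x neighbours of a 2D nearest-neighbour application
 * grid under a given rank-to-coordinate mapping. */
static double avg_neighbor_hops(coord_t (*map)(int)) {
    long hops = 0, pairs = 0;
    for (int j = 0; j < APP_Y; ++j)
        for (int i = 0; i + 1 < APP_X; ++i) {
            int r1 = j * APP_X + i, r2 = j * APP_X + i + 1;
            hops += torus_dist(map(r1), map(r2));
            ++pairs;
        }
    return (double)hops / (double)pairs;
}

int main(void) {
    printf("average hops, XYZ mapping: %.2f\n", avg_neighbor_hops(xyz_map));
    /* A communication-aware mapping would permute ranks so that grid
     * neighbours land on adjacent torus nodes, pushing this towards 1.0. */
    return 0;
}
```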


International Parallel and Distributed Processing Symposium | 2005

Performance modeling of subnet management on fat tree InfiniBand networks using OpenSM

Abhinav Vishnu; Amith R. Mamidala; Hyun-Wook Jin; Dhabaleswar K. Panda

InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. Fat tree is a primary interconnection topology for building large scale InfiniBand clusters. Instead of using a shared bus approach, InfiniBand employs an arbitrary switched point-to-point topology. In order to manage the subnet, InfiniBand specifies a basic management infrastructure responsible for discovering, configuring, and maintaining the active state of the network. In the literature, simulation studies have been done on irregular topologies to characterize the subnet management mechanism. However, there is no study that models the subnet management mechanism on regular topologies using actual implementations. In this paper, we take up the challenge of modeling the subnet management mechanism for fat tree InfiniBand networks using a popular subnet manager, OpenSM. We present the timings for the various subnet management phases, namely topology discovery, path computation, and path distribution, for large-scale fat tree InfiniBand subnets, and present a basic performance evaluation on a small-scale InfiniBand cluster. We verify our model against the basic set of results obtained and present model results obtained by varying different fat tree parameters.
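
A minimal way to express such a timing model, following the three phases named above, is an additive decomposition; the scaling arguments in the comments are assumptions for illustration, not OpenSM measurements.

```latex
% Illustrative subnet-management timing model for a fat tree with N end nodes.
% T_disc grows with the number of ports swept during discovery, T_path with
% the number of (source LID, destination LID) pairs (roughly N^2), and T_dist
% with the number of switch forwarding tables to program.
\[
  T_{\mathrm{SM}}(N) \;=\; T_{\mathrm{disc}}(N) + T_{\mathrm{path}}(N)
      + T_{\mathrm{dist}}(N).
\]
```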

Collaboration


Dive into Abhinav Vishnu's collaborations.

Top Co-Authors

Pavan Balaji (Argonne National Laboratory)
Charles Siegel (Pacific Northwest National Laboratory)
Darren J. Kerbyson (Pacific Northwest National Laboratory)
Adolfy Hoisie (Pacific Northwest National Laboratory)
Kevin J. Barker (Los Alamos National Laboratory)
Nathan R. Tallent (Pacific Northwest National Laboratory)
Garrett B. Goh (Pacific Northwest National Laboratory)
Hubertus J. J. van Dam (Pacific Northwest National Laboratory)