Gaurav Khanna | Researchain

Publication

Featured research published by Gaurav Khanna.

cluster computing and the grid | 2005

A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O

Gaurav Khanna; Nagavijayalakshmi Vydyanathan; Tahsin M. Kurç; Pete Wyckoff; Joel H. Saltz; P. Sadayappan

This paper proposes a novel, hypergraph partitioning based strategy to schedule multiple data analysis tasks with batch-shared I/O behavior. This strategy formulates the sharing of files among tasks as a hypergraph to minimize the I/O overheads due to transferring the same set of files multiple times and employs a dynamic scheme for file transfers to reduce contention on the storage system. We experimentally evaluate the proposed approach using application emulators from two application domains: analysis of remotely-sensed data and biomedical imaging.
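
As a rough illustration of the hypergraph view described above (not the paper's actual partitioner), the sketch below treats each task as a vertex and each file as a hyperedge connecting the tasks that read it. It assumes, for simplicity, that a compute node fetches each file it needs exactly once; the function name, task names, file names and node names are all hypothetical.

```python
from collections import defaultdict

def count_file_transfers(task_files, assignment):
    """Count file transfers implied by a task-to-node assignment.

    task_files: dict mapping task -> set of files it reads
                (each file acts as a hyperedge over the tasks sharing it).
    assignment: dict mapping task -> compute node.
    Assumption: a node transfers each distinct file it needs exactly once.
    """
    needed = defaultdict(set)
    for task, files in task_files.items():
        needed[assignment[task]].update(files)
    return sum(len(files) for files in needed.values())

# Toy workload: four tasks sharing four files.
task_files = {
    "t1": {"a", "b"},
    "t2": {"b", "c"},
    "t3": {"c", "d"},
    "t4": {"a"},
}

# Tasks that share files are placed together ...
grouped = {"t1": "n0", "t4": "n0", "t2": "n1", "t3": "n1"}
# ... versus an assignment that cuts more hyperedges.
scattered = {"t1": "n0", "t3": "n0", "t2": "n1", "t4": "n1"}

print(count_file_transfers(task_files, grouped))    # 5
print(count_file_transfers(task_files, scattered))  # 7
```

Under this simplified cost model, an assignment that keeps tasks sharing a file on the same node cuts fewer hyperedges and therefore repeats fewer transfers, which is the quantity a hypergraph partitioner would try to reduce.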

ieee international conference on high performance computing data and analytics | 2008

Using overlays for efficient data transfer over shared wide-area networks

Gaurav Khanna; Tahsin M. Kurç; Rajkumar Kettimuthu; P. Sadayappan; Ian T. Foster; Joel H. Saltz

Data-intensive applications frequently transfer large amounts of data over wide-area networks. The performance achieved in such settings can often be improved by routing data via intermediate nodes chosen to increase aggregate bandwidth. We explore the benefits of overlay network approaches by designing and implementing a service-oriented architecture that incorporates two key optimizations, multi-hop path splitting and multi-pathing, within the GridFTP file transfer protocol. We develop a file transfer scheduling algorithm that incorporates the two optimizations in conjunction with the use of available file replicas. The algorithm makes use of information from past GridFTP transfers to estimate network bandwidths and resource availability. The effectiveness of these optimizations is evaluated using several application file transfer patterns: one-to-all broadcast, all-to-one gather, and data redistribution, on a wide-area testbed. The experimental results show that our architecture and algorithm achieve significant performance improvement.

international parallel and distributed processing symposium | 2008

A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP

Gaurav Khanna; Tahsin M. Kurç; Rajkumar Kettimuthu; P. Sadayappan; Joel H. Saltz

Many scientific applications need to stage large volumes of files from one set of machines to another set of machines in a wide-area network. Efficient execution of such data transfers needs to take into account the heterogeneous nature of the environment and dynamic availability of shared resources. This paper proposes an algorithm that dynamically schedules a batch of data transfer requests with the goal of minimizing the overall transfer time. The proposed algorithm performs simultaneous transfer of chunks of files from multiple file replicas, if the replicas exist. Adaptive replica selection is employed to transfer different chunks of the same file by taking dynamically changing network bandwidths into account. We utilize GridFTP as the underlying mechanism for data transfers. The algorithm makes use of information from past GridFTP transfers to estimate network bandwidths and resource availability. The efficiency of the algorithm is evaluated on a wide-area testbed.
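
The idea of pulling different chunks of one file from several replicas can be sketched with a simple greedy rule: give each chunk to the replica expected to finish it earliest. This is only an illustrative sketch under stated assumptions, not the algorithm from the paper. It assumes fixed bandwidth estimates (in MB/s), equal-sized chunks, and one transfer at a time per replica; the function and replica names are hypothetical.

```python
def assign_chunks(num_chunks, chunk_size_mb, bandwidth_mbps):
    """Greedily assign chunks to replicas by earliest estimated finish time.

    bandwidth_mbps: dict mapping replica name -> estimated bandwidth (MB/s).
    Returns (plan, makespan), where plan is a list of (chunk, replica) pairs
    and makespan is the estimated time at which the last replica finishes.
    Ties are broken by replica name so the result is deterministic.
    """
    busy_until = {replica: 0.0 for replica in bandwidth_mbps}
    plan = []
    for chunk in range(num_chunks):
        replica = min(
            bandwidth_mbps,
            key=lambda r: (busy_until[r] + chunk_size_mb / bandwidth_mbps[r], r),
        )
        busy_until[replica] += chunk_size_mb / bandwidth_mbps[replica]
        plan.append((chunk, replica))
    return plan, max(busy_until.values())

plan, makespan = assign_chunks(4, 30, {"r1": 10, "r2": 30})
print(plan)      # [(0, 'r2'), (1, 'r2'), (2, 'r1'), (3, 'r2')]
print(makespan)  # 3.0
```

In an adaptive setting, the bandwidth estimates would be refreshed between chunks (for example from recent transfer logs), so later chunks could shift toward whichever replica is currently faster.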

high performance distributed computing | 2006

Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O

Gaurav Khanna; Nagavijayalakshmi Vydyanathan; Tahsin M. Kurç; Sriram Krishnamoorthy; P. Sadayappan; Joel H. Saltz

This paper addresses the problem of efficient execution of a batch of data-intensive tasks with batch-shared I/O behavior, on coupled storage and compute clusters. Two scheduling schemes are proposed: 1) a 0-1 integer programming (IP) based approach, which couples task scheduling and data replication, and 2) a bi-level hypergraph partitioning based heuristic approach (BiPartition), which decouples task scheduling and data replication. The experimental results show that: 1) the IP scheme achieves the best batch execution time, but has significant scheduling overhead, thereby restricting its application to small scale workloads, and 2) the BiPartition scheme is a better fit for larger workloads and systems: it has very low scheduling overhead and no more than 5-10% degradation in solution quality when compared with the IP based approach.

european conference on parallel processing | 2007

Scheduling file transfers for data-intensive jobs on heterogeneous clusters

Gaurav Khanna; Tahsin M. Kurç; P. Sadayappan; Joel H. Saltz

This paper addresses the problem of efficient collective scheduling of file transfers requested by a batch of tasks. Our work targets a heterogeneous collection of storage and compute clusters. The goal is to minimize the overall time to transfer files to their respective destination nodes. Two scheduling schemes are proposed and experimentally evaluated against an existing approach, Insertion Scheduling. The first is a 0-1 Integer Programming based approach which is based on the idea of time-expanded networks. This scheme achieves the minimum total file transfer time, but has significant scheduling overhead. To address this issue, we propose a maximum weight graph matching based heuristic approach. This scheme is able to perform as well as Insertion Scheduling and has much lower scheduling overhead. We conclude that the heuristic scheme is a better fit for larger workloads and systems.

job scheduling strategies for parallel processing | 2006

A data locality aware online scheduling approach for I/O-intensive jobs with file sharing

Gaurav Khanna; Tahsin M. Kurç; P. Sadayappan; Joel H. Saltz

Many scientific investigations have to deal with large amounts of data from simulations and experiments. Data analysis in such investigations typically involves extraction of subsets of data, followed by computations performed on extracted data. Scheduling in this context requires efficient utilization of the computational, storage and network resources to optimize response time. The data-intensive nature of such applications necessitates data-locality aware job scheduling algorithms. This paper proposes a hypergraph based dynamic scheduling heuristic for a stream of independent I/O intensive jobs with file sharing behavior. The proposed heuristic is based on an event-driven, run-time hypergraph modeling of the file sharing characteristics among jobs. Our experiments on a coupled compute/storage cluster show it performs better compared to previously proposed strategies, under a varying set of parameters for workloads from the application domain of biomedical image analysis.

high performance distributed computing | 2008

Multi-hop path splitting and multi-pathing optimizations for data transfers over shared wide-area networks using GridFTP

Gaurav Khanna; Tahsin M. Kurç; P. Sadayappan; Joel H. Saltz; Rajkumar Kettimuthu; Ian T. Foster

In this paper, we propose to employ two optimizations, multi-hop path splitting and multi-pathing, to improve the performance of data transfers over shared public networks. We present a path determination algorithm which integrates the aforesaid optimizations in order to improve the performance of single file transfers. Finally, we develop a file transfer scheduling algorithm based on this framework, and evaluate its effectiveness on a wide-area testbed.
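
To make the two optimizations concrete, here is a minimal sketch, not the paper's path determination algorithm. It assumes known per-link bandwidth estimates, at most one intermediate node per path, and paths that do not share a bottleneck. `best_single_path` illustrates multi-hop path splitting (relay through a node when that raises the bottleneck bandwidth), and `split_across_paths` illustrates multi-pathing (divide a file across paths in proportion to their bandwidth). All names and numbers are hypothetical.

```python
def best_single_path(bw, src, dst):
    """Pick the direct or one-hop route with the highest bottleneck bandwidth.

    bw: dict mapping (from_node, to_node) -> estimated bandwidth.
    Returns (path, bottleneck_bandwidth).
    """
    best_path, best_bw = [src, dst], bw.get((src, dst), 0.0)
    nodes = {node for link in bw for node in link}
    for mid in sorted(nodes - {src, dst}):
        # A relayed transfer is limited by the slower of its two hops.
        hop_bw = min(bw.get((src, mid), 0.0), bw.get((mid, dst), 0.0))
        if hop_bw > best_bw:
            best_path, best_bw = [src, mid, dst], hop_bw
    return best_path, best_bw

def split_across_paths(file_size, path_bandwidths):
    """Split a file across independent paths in proportion to bandwidth."""
    total = sum(path_bandwidths.values())
    return {path: file_size * b / total for path, b in path_bandwidths.items()}

bw = {("A", "B"): 10, ("A", "C"): 40, ("C", "B"): 30}
print(best_single_path(bw, "A", "B"))  # (['A', 'C', 'B'], 30)
print(split_across_paths(400, {"A-B": 10, "A-C-B": 30}))
# {'A-B': 100.0, 'A-C-B': 300.0}
```

In this toy topology the relayed route A, C, B triples the bottleneck bandwidth of the direct link, and using both routes together would finish sooner than either alone, provided they really are independent.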

grid computing | 2004

Use of PVFS for efficient execution of jobs with pipeline-shared I/O

Nagavijayalakshmi Vydyanathan; Gaurav Khanna; Tahsin M. Kurç; Pete Wyckoff; Joel H. Saltz; P. Sadayappan

This paper is concerned with efficient execution of applications that are composed of a chain of sequential data processes, which exchange data through a file system. We focus on pipeline-shared I/O behavior within a single pipeline of processes running on a cluster. We examine several scheduling strategies and experimentally evaluate them for efficient use of the parallel virtual file system (PVFS) as a common storage pool.

international conference on networks | 2007