Sharad K. Garg | Researchain

Publication

Featured research published by Sharad K. Garg.

international parallel and distributed processing symposium | 2000

Job scheduling that minimizes network contention due to both communication and I/O

Jens Mache; Virginia Mary Lo; Sharad K. Garg

As communication and I/O traffic increase on the interconnection network of high-performance systems, network contention becomes a critical problem that drastically reduces performance. Whereas earlier allocation strategies were sensitive either to communication alone or to I/O alone, we present a new strategy that is sensitive to both. Our new strategy, MC-Elongated, strives to achieve (1) the compactness needed to minimize communication-based contention and (2) the balance and orientation relative to I/O nodes needed to minimize I/O-based contention. We tested our new strategy using synthetic workloads and a real workload trace of 6087 jobs captured from a 400-node Intel Paragon. Our results show that, with respect to system throughput and average job turnaround time, in environments with varying degrees of communication and I/O traffic, MC-Elongated outperforms previous allocation strategies that are in use today. Regarding the tension between communication and I/O, our results show that spatial layout is more critical for I/O-intensive jobs at lower utilization levels and more critical for communication-intensive jobs at higher utilization levels, and that, in general, the impact of I/O traffic is dominant.

conference on high performance computing (supercomputing) | 1998

TFLOPS PFS: Architecture and Design of a Highly Efficient Parallel File System

Sharad K. Garg

In recent years, many commercial Massively Parallel Processor (MPP) systems have become available to the computing community. These systems provide very high processing power (up to hundreds of GFLOPS) and can scale efficiently with the number of processors. However, many scientific and commercial applications that run on these multiprocessors may not experience significant benefit in terms of speedup and are bottlenecked by their I/O requirements. Although these multiprocessors may be configured with sufficient I/O hardware, the file system software often fails to deliver the available I/O bandwidth to the application and causes severe performance degradation for I/O-intensive applications. A highly efficient parallel file system has been implemented on Intel's Teraflops (TFLOPS) machine and provides a sustained I/O bandwidth of 1 GB/sec. This file system delivers almost 95% of the available raw hardware I/O bandwidth, and the I/O bandwidth scales in proportion to the available I/O nodes. Intel's TFLOPS machine is the first Accelerated Strategic Computing Initiative (ASCI) machine that DOE has acquired. This computer is 10 times more powerful than the fastest machine today, and will be used primarily to simulate nuclear testing and to ensure the safety and effectiveness of the nation's nuclear weapons stockpile. This machine contains over 9000 Intel Pentium Pro processors and will provide a peak CPU performance of 1.8 teraflops. This paper presents the I/O design and architecture of Intel's TFLOPS supercomputer, and describes the Cougar OS I/O and its interface with Intel's Parallel File System.

Journal of Parallel and Distributed Computing | 2005

The impact of spatial layout of jobs on I/O hotspots in mesh networks

Jens Mache; Virginia Mary Lo; Sharad K. Garg

Network contention hotspots can limit network throughput for parallel disk I/O, even when the interconnection network appears to be sufficiently provisioned. We studied I/O hotspots in mesh networks as a function of the spatial layout of an application's compute nodes relative to the I/O nodes. Our analytical modeling and dynamic simulations show that when I/O nodes are configured on one side of a two-dimensional mesh, realizable I/O throughput is at best bounded by four times the network bandwidth per link. Maximal performance depends on the spatial layout of jobs and cannot be further improved by adding I/O nodes. Applying these results, we devised a new parallel layout allocation strategy (PLAS) that minimizes I/O hotspots and approaches the theoretical best case for parallel I/O throughput. Our I/O performance analysis and processor allocation strategy are applicable to a wide range of contemporary and emerging high-performance computing systems.
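
The bound stated in this abstract can be illustrated with a small calculation. The following is only a rough sketch, not the paper's model: the function name `io_throughput_bound`, the additive aggregate of I/O node bandwidth, and the sample numbers are all assumptions introduced here for illustration. It shows why, once the network-side limit of four times the per-link bandwidth is reached, adding I/O nodes no longer raises the achievable throughput.

```python
def io_throughput_bound(num_io_nodes, io_node_bw, link_bw, link_factor=4):
    """Hypothetical upper bound on realizable I/O throughput.

    All bandwidths use the same unit (e.g. MB/s). The factor of 4 is the
    network-side limit quoted in the abstract; everything else here is an
    illustrative assumption, not the authors' model.
    """
    # Assumption: bandwidth of the I/O nodes simply adds up.
    aggregate_io_bw = num_io_nodes * io_node_bw
    # Network-side limit from the abstract: four times the per-link bandwidth.
    network_bound = link_factor * link_bw
    return min(aggregate_io_bw, network_bound)


if __name__ == "__main__":
    # Made-up numbers: 100 MB/s per I/O node, 200 MB/s per network link.
    for n in (2, 4, 8, 16):
        print(n, io_throughput_bound(n, io_node_bw=100.0, link_bw=200.0))
```

With these made-up numbers the bound grows with the number of I/O nodes until it reaches 800 MB/s (four links' worth of bandwidth) and then stays flat, which mirrors the abstract's observation that extra I/O nodes alone cannot improve maximal performance.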

foundations of computer science | 2001

Performance evaluation of parallel file systems for PC clusters and ASCI red

Sharad K. Garg; Jens Mache

Parallel file systems provide high-performance disk access, which is crucial for many scientific and commercial applications. In this paper, we explore the current state of the art of parallel file systems for PC clusters. To do so, we evaluate the I/O performance of Intel PFS, a commercial file system for ASCI Red, and PVFS, an open-source file system for Linux clusters. Our study shows that parallel file systems for PC clusters have come a long way. While there is still room for improvement, high-performance disk access on PC clusters is becoming a reality. Three and a half years after I/O throughputs of more than 1 Gigabyte/sec were first achieved (with Intel PFS on ASCI Red), PVFS can now deliver this level of I/O performance on PC clusters.

workshop on i/o in parallel and distributed systems | 1999

The impact of spatial layout of jobs on parallel I/O performance

Jens Mache; Virginia Mary Lo; Marilynn Livingston; Sharad K. Garg

Input/Output is a big obstacle to effective use of teraflops-scale computing systems. Motivated by earlier parallel I/O measurements on an Intel TFLOPS machine, we conduct studies to determine the sensitivity of parallel I/O performance on multiprogrammed mesh-connected machines with respect to number of I/O nodes, number of compute nodes, network link bandwidth, I/O node bandwidth, spatial layout of jobs, and read or write demands of applications. Our extensive simulations and analytical modeling yield important insights into the limitations on parallel I/O performance due to network contention, and into the possible gains in parallel I/O performance that can be achieved by tuning the spatial layout of jobs. Applying these results, we devise a new processor allocation strategy that is sensitive to parallel I/O traffic and the resulting network contention. In performance evaluations driven by synthetic workloads and by a real workload trace captured at the San Diego Supercomputing Center, the new strategy improves the average response time of parallel I/O-intensive jobs by up to a factor of 4.5.

Archive | 2003