

Publication


Featured research published by Jaspal Subhlok.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1993

Exploiting task and data parallelism on a multicomputer

Jaspal Subhlok; James M. Stichnoth; David R. O'Hallaron; Thomas R. Gross

For many applications, achieving good performance on a private-memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers focus on either data parallelism or task parallelism, but not both. Therefore, to achieve the desired results, the programmer must program the data and task parallelism separately. We have taken a unified approach to exploiting both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler and present performance results to validate our approach.


High Performance Distributed Computing | 1998

A resource query interface for network-aware applications

Bruce Lowekamp; Nancy Miller; Dean Sutherland; Thomas R. Gross; Peter Steenkiste; Jaspal Subhlok

Networked systems provide a cost-effective platform for parallel computing, but the applications have to deal with the changing availability of computation and communication resources. Network-awareness is a recent attempt to bridge the gap between the realities of networks and the demands of applications. Network-aware applications obtain information about their execution environment and dynamically adapt to enhance their performance. Adaptation is especially important for synchronous parallel applications because a single busy communication link can become the bottleneck and degrade overall performance dramatically. This paper presents Remos, a uniform API that allows applications to obtain relevant network information, and reports on the development of parallel applications in this environment. The challenges in defining a uniform interface include network heterogeneity, diversity and variability in network traffic, and resource sharing in the network and even inside an application. The first implementation of the Remos interface uses SNMP to monitor IP-based networks. This paper reports on our methodology for developing adaptive parallel applications for high-speed networks with Remos and presents experimental results using applications generated by the Fx parallelizing compiler. The results highlight the importance and effectiveness of adaptive parallel computing.
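The abstract above describes Remos as a uniform API for querying network conditions and adapting to the answers. The actual Remos interface is a C library; the sketch below is only an illustration of the query-then-adapt pattern it enables, and every class, method, and host name here is hypothetical.

```python
# Illustrative sketch of a Remos-style resource query interface.
# All names are hypothetical; the real Remos API differs.

class ResourceMonitor:
    """Answers queries about current network conditions between hosts."""

    def __init__(self, bandwidth):
        # bandwidth: dict mapping (host_a, host_b) -> available Mbit/s,
        # e.g. as observed via SNMP counters on an IP-based network.
        self._bw = bandwidth

    def available_bandwidth(self, a, b):
        # Links are symmetric in this toy model.
        return self._bw.get((a, b)) or self._bw.get((b, a), 0.0)


def pick_partner(monitor, me, candidates):
    """Adapt by choosing the peer with the best link to `me` --
    the kind of decision a network-aware application makes at runtime."""
    return max(candidates, key=lambda h: monitor.available_bandwidth(me, h))
```

The point of the interface is the separation of concerns: the application asks portable questions ("how much bandwidth between these hosts?") and the monitoring layer hides how the answer is collected on a heterogeneous network.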


IEEE Parallel & Distributed Technology: Systems & Applications | 1994

Task Parallelism in a High Performance Fortran Framework

Thomas R. Gross; David R. O'Hallaron; Jaspal Subhlok

Exploiting both data and task parallelism in a single framework is the key to achieving good performance for a variety of applications.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1995

Optimal mapping of sequences of data parallel tasks

Jaspal Subhlok; Gary L. Vondran

Many applications in a variety of domains including digital signal processing, image processing and computer vision are composed of a sequence of tasks that act on a stream of input data sets in a pipelined manner. Recent research has established that these applications are best mapped to a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire problem of mapping which includes clustering tasks into modules, assignment of processors to modules, and possible replication of modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in O(P⁴k²) time. We also present a heuristic algorithm that is linear in the number of processors and establish with theoretical and practical results that the solutions obtained are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.
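The dynamic-programming idea can be illustrated with a deliberately simplified model: cluster contiguous tasks into modules, give each module a whole number of processors, and minimize the bottleneck stage time (the reciprocal of throughput). This sketch assumes perfect speedup inside a module; the paper's formulation additionally models inter-task communication, memory constraints, and module replication.

```python
def best_mapping(work, P):
    """Minimal bottleneck stage time for a pipeline of tasks.

    work[i] is the cost of task i.  Contiguous tasks may be clustered
    into one module, and each module gets >= 1 processors.  Assumes
    perfect speedup within a module -- a simplification of the paper's
    model, which also accounts for communication and memory.
    """
    k = len(work)
    INF = float("inf")
    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)
    # best[i][p] = minimal bottleneck time mapping the first i tasks
    # onto exactly p processors.
    best = [[INF] * (P + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for i in range(1, k + 1):
        for p in range(1, P + 1):
            # Last module covers tasks j..i-1 and runs on q processors.
            for j in range(i):
                for q in range(1, p + 1):
                    stage = (prefix[i] - prefix[j]) / q
                    cand = max(best[j][p - q], stage)
                    if cand < best[i][p]:
                        best[i][p] = cand
    return min(best[k][p] for p in range(1, P + 1))
```

For tasks with work [3, 1] on 2 processors, clustering both tasks into one module on both processors (stage time 2.0) beats running them as separate single-processor stages (bottleneck 3.0), which is exactly the kind of clustering-versus-splitting decision the mapping tool automates.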


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1997

A new model for integrated nested task and data parallel programming

Jaspal Subhlok; Bwolen Yang

High Performance Fortran (HPF) has emerged as a standard language for data parallel computing. However, a wide variety of scientific applications are best programmed by a combination of task and data parallelism. Therefore, a good model of task parallelism is important for the continued success of HPF for parallel programming. This paper presents a task parallelism model that is simple, elegant, and relatively easy to implement in an HPF environment. Task parallelism is exploited by mechanisms for dividing processors into subgroups and mapping computations and data onto processor subgroups. This model of task parallelism has been implemented in the Fx compiler at Carnegie Mellon University. The paper addresses the main issues in compiling integrated task and data parallel programs and reports on the use of this model for programming various flat and nested task structures. Performance results are presented for a set of programs spanning signal processing, image processing, computer vision and environment modeling. A variant of this task model is a new approved extension of HPF, and this paper offers insight into the power of expression and ease of implementation of this extension.
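The core mechanism, dividing the processor set into subgroups and mapping one task onto each, can be sketched in a few lines. In the Fx/HPF model this is expressed through compiler directives rather than a runtime call, and the stage names below are purely illustrative.

```python
# Minimal sketch of the subgroup idea behind the Fx task model:
# partition the processors, then run each pipeline stage data-parallel
# inside its subgroup while the stages run task-parallel with respect
# to one another.  Names are illustrative, not the Fx/HPF syntax.

def split(procs, sizes):
    """Partition a list of processor ids into consecutive subgroups."""
    assert sum(sizes) == len(procs), "subgroup sizes must cover all processors"
    groups, start = [], 0
    for s in sizes:
        groups.append(procs[start:start + s])
        start += s
    return groups


# Map a hypothetical 3-stage signal-processing pipeline onto 8 processors.
stages = dict(zip(["fft", "filter", "ifft"], split(list(range(8)), [2, 4, 2])))
```

Nested task parallelism falls out naturally: a subgroup can itself be split again, which is what makes the model handle both flat and nested task structures.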


Conference on High Performance Computing (Supercomputing) | 1994

Communication and memory requirements as the basis for mapping task and data parallel programs

Jaspal Subhlok; David R. O'Hallaron; Thomas R. Gross; Peter A. Dinda; Jon A. Webb

For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploiting task and data parallelism in a single compiler framework, and such a compiler can map a single source program in many different ways onto a parallel machine. The tradeoffs between task and data parallelism are complex and depend on the characteristics of the program to be executed, most significantly the memory and communication requirements, and the performance parameters of the target parallel machine. We present a framework to isolate and examine the specific characteristics of programs that determine the performance for different mappings. Our focus is on applications that process a stream of input, and whose computation structure is fairly static and predictable. We describe three such applications that were developed with our compiler: fast Fourier transforms, narrowband tracking radar, and multibaseline stereo. We examine the tradeoffs between various mappings for them and show how the framework is used to obtain efficient mappings.
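A toy cost model makes the tradeoff concrete. Assume a stream of inputs, a per-task communication overhead that grows with the number of nodes a task runs on, and perfect speedup otherwise; all numbers and the overhead model are invented for illustration, not taken from the paper's measurements.

```python
def data_parallel_time(tasks, P, overhead):
    """Per-input time when each task in turn runs on all P nodes.

    `tasks` are work amounts; `overhead` is a toy per-task
    communication cost that grows with the node count -- a stand-in
    for the memory/communication characteristics the paper isolates.
    """
    return sum(w / P + overhead * P for w in tasks)


def task_parallel_time(tasks, parts, overhead):
    """Steady-state per-input time (the bottleneck stage) when task i
    runs on parts[i] nodes and inputs stream through the pipeline."""
    return max(w / p + overhead * p for w, p in zip(tasks, parts))
```

With zero communication overhead the two mappings tie; as overhead grows with node count, the task-parallel mapping pulls ahead because each task runs on fewer nodes. That dependence of the winner on communication and memory characteristics is precisely what the framework is built to expose.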


Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99) | 1999

Adaptive distributed applications on heterogeneous networks

Thomas R. Gross; Peter Steenkiste; Jaspal Subhlok

Distributed applications execute in environments that can include different network architectures as well as a range of compute platforms. Furthermore, these resources are shared by many users. Therefore these applications receive varying levels of service from the environment. Since the availability of resources in a networked environment often determines overall application performance, adaptivity is necessary for efficient execution and predictable response time. However, heterogeneous systems pose many challenges for adaptive applications. We discuss the range of situations that can benefit from adaptivity in the context of a set of system and environment parameters. Adaptive applications require information about the status of the execution environment, and heterogeneous environments call for a portable system to provide such information. We discuss Remos (Resource Monitoring System), a system that allows applications to collect information about network and host conditions across different network architectures. Finally, we report our experience and performance results from a set of adaptive versions of the Airshed pollution modeling application executing on a networking testbed.


Conference on High Performance Computing (Supercomputing) | 1996

Impact of Job Mix on Optimizations for Space Sharing Schedulers

Jaspal Subhlok; Thomas R. Gross; Takashi Suzuoka

Modern parallel systems with N nodes can concurrently service multiple jobs requesting a total of up to N nodes. One of the challenges for the operating system is to give reasonable service to a diverse group of jobs. A sequence of large jobs, each requiring over half of the available nodes, can reduce machine utilization by up to 50%, but scheduling a long-running job on the idle nodes may block the stream of large jobs. Various policies have been proposed for scheduling parallel computers, but as the users of current supercomputers know, these policies are far from perfect. This paper reports on the measurement of the usage of a 512-node IBM SP2 at the Cornell Theory Center, a 96-node Intel Paragon at ETH Zurich, and a 512-node Cray T3D at the Pittsburgh Supercomputing Center. We discuss the characteristics of the different workloads and examine their impact on job scheduling. We specifically show how two simple scheduling optimizations based on reordering the waiting queue can be used effectively to improve scheduling performance on real workloads. Supercomputer workloads from different installations exhibit some common characteristics, but they also differ in important ways. We demonstrate how this knowledge can be exploited in the design and tuning of schedulers.
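The effect of reordering the waiting queue is easy to see in a minimal sketch: under strict FCFS a space-sharing scheduler must stop at the first job that does not fit, leaving nodes idle, while letting a smaller job further back jump ahead fills them. This is an illustrative policy in the spirit of the optimizations the paper evaluates, not the exact policies measured.

```python
def pick_jobs(free, queue, reorder=False):
    """Select jobs to start on `free` idle nodes from the waiting queue.

    queue: list of (job_id, nodes_needed) in arrival order.
    reorder=False is strict FCFS: stop at the first job that does not
    fit.  reorder=True lets smaller jobs further back fill idle nodes
    (an illustrative queue-reordering optimization).
    """
    started = []
    for job, nodes in queue:
        if nodes <= free:
            started.append(job)
            free -= nodes
        elif not reorder:
            break
    return started
```

With 8 idle nodes and a queue of jobs needing 6, 5, and 2 nodes, FCFS starts only the first job and strands 2 nodes behind the 5-node job; reordering also starts the 2-node job. The flip side, which the workload data speaks to, is that aggressive reordering can starve large jobs.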


Conference on High Performance Computing (Supercomputing) | 1991

A new approach for automatic parallelization of blocked linear algebra computations

H. T. Kung; Jaspal Subhlok

No abstract available


Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing | 1998

Airshed pollution modeling: a case study in application development in an HPF environment

Jaspal Subhlok; Peter Steenkiste; James M. Stichnoth; Peter Lieu

In this paper, we describe our experience with developing Airshed, a large pollution modeling application, in the Fx programming environment. We demonstrate that high level parallel programming languages like Fx and High Performance Fortran offer a simple and attractive model for developing portable and efficient parallel applications. Performance results are presented for the Airshed application executing on Intel Paragon and Cray T3D and T3E parallel computers. The results demonstrate that the application is performance portable, i.e., it achieves good and consistent performance across different architectures, and that the performance can be explained and predicted using a simple model for the communication and computation phases in the program. We also show how task parallelism was used to alleviate I/O related bottlenecks, an important consideration in many applications. Finally, we demonstrate how external parallel modules developed using different parallelization methods can be integrated in a relatively simple and flexible way with modules developed in the Fx compiler framework. Overall, our experience demonstrates that an HPF-based environment is highly suitable for developing complex applications, including multidisciplinary applications.

Collaboration


Dive into Jaspal Subhlok's collaborations.

Top Co-Authors

Peter Steenkiste, Carnegie Mellon University
Bwolen Yang, Carnegie Mellon University
Dean Sutherland, Carnegie Mellon University
Jon A. Webb, Carnegie Mellon University
Nancy Miller, Carnegie Mellon University