Publication


Featured research published by John Zahorjan.


Measurement and Modeling of Computer Systems | 1990

Processor scheduling in shared memory multiprocessors

John Zahorjan; Cathy McCann

Existing work indicates that the commonly used “single queue of runnable tasks” approach to scheduling shared memory multiprocessors can perform very poorly in a multiprogrammed parallel processing environment. A more promising approach is the class of “two-level schedulers” in which the operating system deals solely with allocating processors to jobs while the individual jobs themselves perform task dispatching on those processors. In this paper we compare two basic varieties of two-level schedulers. Those of the first type, static, make a single decision per job regarding the number of processors to allocate to it. Once the job has received its allocation, it is guaranteed to have exactly that number of processors available to it whenever it is active. The other class of two-level scheduler, dynamic, allows each job to acquire and release processors during its execution. By responding to the varying parallelism of the jobs, the dynamic scheduler promises higher processor utilizations at the cost of potentially greater scheduling overhead and more complicated application level task control policies. Our results, obtained via simulation, highlight the tradeoffs between the static and dynamic approaches. We investigate how the choice of policy is affected by the cost of switching a processor from one job to another. We show that for a wide range of plausible overhead values, dynamic scheduling is superior to static scheduling. Within the class of static schedulers, we show that, in most cases, a simple “run to completion” scheme is preferable to a round-robin approach. Finally, we investigate different techniques for tuning the allocation decisions required by the dynamic policies and quantify their effects on performance. We believe our results are directly applicable to many existing shared memory parallel computers, which for the most part currently employ a simple “single queue of tasks” extension of basic sequential machine schedulers. We plan to validate our results in future work through implementation and experimentation on such a system.
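
As a rough illustration of the two allocation styles compared here, the Python sketch below contrasts a static equal split, decided once per job, with a dynamic split recomputed from each job's current parallelism. The job names, parallelism values, and integer-division rounding are illustrative assumptions, not the paper's simulation model.

def static_allocation(job_ids, processors):
    # one decision per job: an equal, fixed share held for the job's lifetime
    share = processors // len(job_ids)
    return {j: share for j in job_ids}

def dynamic_allocation(parallelism, processors):
    # recomputed whenever parallelism changes: shares proportional to runnable tasks
    total = sum(parallelism.values())
    return {j: max(1, processors * p // total) for j, p in parallelism.items()}

if __name__ == "__main__":
    # three hypothetical jobs and their current numbers of runnable tasks
    current = {"A": 8, "B": 2, "C": 6}
    print("static :", static_allocation(current.keys(), 16))
    print("dynamic:", dynamic_allocation(current, 16))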


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1993

Improving the performance of runtime parallelization

Shun-Tak Leung; John Zahorjan

When the inter-iteration dependency pattern of the iterations of a loop cannot be determined statically, compile-time parallelization of the loop is not possible. In these cases, runtime parallelization [8] is the only alternative. The idea is to transform the loop into two code fragments: the inspector and the executor. When the program is run, the inspector examines the iteration dependencies and constructs a parallel schedule. The executor subsequently uses that schedule to carry out the actual computation in parallel. In this paper, we show how to reduce the overhead of running the inspector through its parallel execution. We describe two related approaches. The first, which emphasizes inspector efficiency, achieves nearly linear speedup relative to a sequential execution of the inspector, but produces a schedule that may be less efficient for the executor. The second technique, which emphasizes executor efficiency, does not in general achieve linear speedup of the inspector, but is guaranteed to produce the best achievable schedule. We present these techniques, show that they are correct, and compare their performance to existing techniques using a set of experiments. Because in this paper we are optimizing inspector time, but leaving the executor unchanged, the techniques we present have the most dramatic effect when the inspector must be run for each invocation of the source loop. In a companion paper [3], we explore techniques that build upon those developed here to also improve executor performance.
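
The inspector/executor split can be pictured with a small Python sketch. Here the inspector assigns each iteration to a wavefront that respects flow and output dependences through a shared array (anti-dependences are ignored for brevity), and the executor runs one wavefront at a time. The loop body, the read/write index arrays, and the sequential stand-in for parallel execution are invented for illustration; this is not the paper's parallel inspector.

def inspector(reads, writes):
    # wavefront[i]: earliest phase in which iteration i may run, given that it must
    # follow any earlier iteration that wrote the element it reads or overwrites
    last_writer_wf = {}
    wavefront = []
    for r, w in zip(reads, writes):
        wf = max(last_writer_wf.get(r, -1), last_writer_wf.get(w, -1)) + 1
        wavefront.append(wf)
        last_writer_wf[w] = wf
    return wavefront

def executor(a, reads, writes, wavefront):
    # iterations sharing a wavefront are independent and could run in parallel;
    # here they are simply run phase by phase
    for wf in range(max(wavefront) + 1):
        for i, level in enumerate(wavefront):
            if level == wf:
                a[writes[i]] = a[reads[i]] + 1    # stand-in loop body

if __name__ == "__main__":
    a = [0] * 8
    reads, writes = [0, 1, 0, 2, 3, 1], [1, 2, 3, 4, 5, 6]
    wf = inspector(reads, writes)
    executor(a, reads, writes, wf)
    print(wf, a)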


Measurement and Modeling of Computer Systems | 1994

Processor allocation policies for message-passing parallel computers

Cathy McCann; John Zahorjan

When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. In this paper we investigate the issues involved in constructing a processor allocation policy for large-scale, message-passing parallel computers supporting a scientific workload. We make four specific contributions: We define the concept of efficiency preservation as a characteristic of processor allocation policies. Efficiency preservation is the degree to which the decisions of the processor allocator degrade the processor efficiencies experienced by individual applications relative to their efficiencies when run alone. We identify the interplay between the kernel processor allocation policy and the application load distribution policy as a determinant of efficiency preservation. We specify the details of two families of processor allocation policies, called Equipartition and Folding. Within each family, different member policies cover a range of efficiency preservation values, from very high to very low. By comparing policies within each family as well as between families, we show that high efficiency preservation is essential to good performance, and that efficiency preservation is a more dominant factor in obtaining good performance than is equality of resource allocation.
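
A rough sketch of the flavor of the two policy families, assuming the simplest member of each: Equipartition spreads the machine evenly across runnable jobs, while Folding repeatedly halves an existing partition to admit a newcomer. The 16-processor machine and the "split the largest partition" rule are illustrative assumptions rather than the paper's exact definitions.

def equipartition(num_jobs, processors):
    # as even a split as integer processor counts allow
    base, extra = divmod(processors, num_jobs)
    return [base + (1 if i < extra else 0) for i in range(num_jobs)]

def folding(num_jobs, processors):
    # start with one partition spanning the machine; halve the largest
    # partition each time another job must be admitted
    parts = [processors]
    while len(parts) < num_jobs:
        parts.sort(reverse=True)
        largest = parts.pop(0)
        parts += [largest - largest // 2, largest // 2]
    return sorted(parts, reverse=True)

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, "jobs ->", equipartition(k, 16), folding(k, 16))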


Job Scheduling Strategies for Parallel Processing | 1996

Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling

Thu D. Nguyen; Raj Vaswani; John Zahorjan

We consider the use of runtime measured workload characteristics in parallel processor scheduling. Although many researchers have considered the use of application characteristics in this domain, most of this work has assumed that such information is available a priori. In contrast, we propose and evaluate experimentally dynamic processor allocation policies that rely on determining job characteristics at runtime; in particular, we focus on measuring and using job efficiency and speedup.
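
As a hedged sketch of the relationship the abstract relies on: efficiency can be estimated at runtime as the fraction of processor time spent on useful work over a measurement interval, and speedup on p processors then follows as p times that efficiency. The helper functions and numbers below are illustrative, not the paper's measurement machinery.

def runtime_efficiency(useful_time, interval, processors):
    # useful_time: busy (non-idle, non-overhead) time summed over all processors
    return useful_time / (processors * interval)

def speedup(processors, efficiency):
    # speedup(p) = p * efficiency(p)
    return processors * efficiency

if __name__ == "__main__":
    # hypothetical 1-second interval on 8 processors, 6 processor-seconds of useful work
    eff = runtime_efficiency(useful_time=6.0, interval=1.0, processors=8)
    print("efficiency = %.2f, speedup = %.2f" % (eff, speedup(8, eff)))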


Measurement and Modeling of Computer Systems | 1995

Scheduling memory constrained jobs on distributed memory parallel computers

Cathy McCann; John Zahorjan

We consider the problem of multiprocessor scheduling of jobs whose memory requirements place lower bounds on the fraction of the machine required in order to execute. We address three primary questions in this work:

1. How can a parallel machine be multiprogrammed with minimal overhead when jobs have minimum memory requirements?
2. To what extent does the inability of an application to repartition its workload during runtime affect the choice of processor allocation policy?
3. How rigid should the system be in attempting to provide equal resource allocation to each runnable job in order to minimize average response time?

This work is applicable both to parallel machines and to networks of workstations supporting parallel applications.
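
One way to picture the constraint, as a sketch only: each job's memory requirement translates into a minimum number of processors (and their memories) it must occupy, and any allocation policy can hand out remaining processors only on top of those floors. The equal spreading of leftover processors below is an assumed placeholder policy, not one of the paper's.

def allocate_with_memory_floors(min_procs, processors):
    # every job first gets the processors its memory requirement demands
    if sum(min_procs) > processors:
        raise ValueError("the runnable jobs do not all fit in memory at once")
    alloc = list(min_procs)
    leftover = processors - sum(alloc)
    # spread whatever remains as evenly as possible (placeholder policy)
    for i in range(leftover):
        alloc[i % len(alloc)] += 1
    return alloc

if __name__ == "__main__":
    # three jobs whose memory needs demand at least 4, 2, and 8 of 16 processors
    print(allocate_with_memory_floors([4, 2, 8], 16))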


International Conference on Parallel Processing | 1996

Maximizing speedup through self-tuning of processor allocation

Thu D. Nguyen; Raj Vaswani; John Zahorjan

We address the problem of maximizing application speedup through run-time self-selection of an appropriate number of processors on which to run. Automatic run-time selection of processor allocations is important because many parallel applications exhibit peak speedups at allocations that are data- or time-dependent. We propose the use of a run-time system that: (a) dynamically measures job efficiencies at different allocations, (b) uses these measurements to calculate speedups, and (c) automatically adjusts a job's processor allocation to maximize its speedup. Using a set of 10 applications that includes both hand-coded parallel programs and compiler-parallelized sequential programs, we show that our run-time system can reliably determine dynamic allocations that match the best possible static allocation, and that it has the potential to find dynamic allocations that outperform any static allocation.
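
The self-tuning loop can be pictured as hill climbing on measured speedup: estimate speedup at the current allocation and at its neighbors, and move toward the better one until neither direction helps. The fixed efficiency curve standing in for the job below is a made-up model; the real system measures efficiency as the job runs.

def measured_speedup(p):
    # hypothetical job: efficiency falls off with allocation, so speedup peaks
    efficiency = 1.0 / (1.0 + 0.02 * p * p)
    return p * efficiency

def self_tune(p, max_p, steps=32):
    for _ in range(steps):
        here = measured_speedup(p)
        up = measured_speedup(min(p + 1, max_p))
        down = measured_speedup(max(p - 1, 1))
        if up > here and up >= down:
            p = min(p + 1, max_p)
        elif down > here:
            p = max(p - 1, 1)
        else:
            break          # local peak: keep the current allocation
    return p

if __name__ == "__main__":
    print("chosen allocation:", self_tune(p=1, max_p=64))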


Computer Physics Communications | 1997

Dynamic-domain-decomposition parallel molecular dynamics

S. G. Srinivasan; Immaneni Ashok; Hannes Jónsson; Gretchen Kalonji; John Zahorjan

Parallel molecular dynamics with short-range forces can suffer from load-imbalance problems and attendant performance degradation due to density variations in the simulated system. In this paper, we describe an approach to dynamical load balancing, enabled by the Ādhāra runtime system. The domain assigned to each processor is automatically and dynamically resized so as to evenly distribute the molecular dynamics computations across all the processors. The algorithm was tested on an Intel Paragon parallel computer for two- and three-dimensional Lennard-Jones systems containing 99,458 and 256,000 atoms, respectively, and using up to 256 processors. In these benchmarks, the overhead for carrying out the load-balancing operations was found to be small, and the total computation time was reduced by as much as 50%.
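
A one-dimensional sketch of the load-balancing idea, under assumptions not taken from the paper: slab boundaries along a single axis are placed so that each processor owns roughly the same number of atoms, so denser regions get geometrically smaller domains. The real system resizes multi-dimensional domains via the Ādhāra runtime; the clustered test data here are invented.

import random

def balance_slabs(x_coords, num_procs):
    # place num_procs - 1 boundaries so each slab holds an equal share of atoms
    xs = sorted(x_coords)
    share = len(xs) / num_procs
    return [xs[int(i * share) - 1] for i in range(1, num_procs)]

if __name__ == "__main__":
    random.seed(0)
    # clustered density: most atoms near x = 0.2, the rest spread over [0, 1]
    atoms = [random.gauss(0.2, 0.05) for _ in range(900)]
    atoms += [random.uniform(0.0, 1.0) for _ in range(100)]
    print(balance_slabs(atoms, num_procs=4))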


Conference on High Performance Computing (Supercomputing) | 1992

Scheduling a mixed interactive and batch workload on a parallel, shared memory supercomputer

Immaneni Ashok; John Zahorjan

The authors analyze three approaches to scheduling mixed batch and interactive workloads on a supercomputer: (i) fixed partition, in which memory resources are statically allocated between the workloads; (ii) no partition, in which the interactive workload preempts resources as needed from the batch workload; and (iii) no partition with grouped admission, in which the interactive workload preempts resources only when the number of waiting interactive jobs reaches a threshold value. The authors also investigate the potential benefits of using virtual memory to perform the automatic overlay of jobs too large to fit in the amount of real memory instantaneously available to them. Using analytic tools, they compare the different policies according to the average speedup achieved by the batch workload given that a mean interactive job response time objective must be met by each. They show that, under a wide variety of conditions, fixed partition performs better than the other policies.
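
The three admission schemes can be caricatured as rules for how much memory the interactive workload may claim from the batch workload at a given moment. The simple memory model and numbers below are illustrative assumptions, not the paper's analytic model.

def fixed_partition(interactive_need, interactive_quota):
    # interactive jobs never leave their statically reserved share
    return min(interactive_need, interactive_quota)

def no_partition(interactive_need, total_memory):
    # interactive jobs preempt batch memory as soon as they need it
    return min(interactive_need, total_memory)

def grouped_admission(waiting_jobs, per_job_need, total_memory, threshold):
    # batch memory is preempted only once enough interactive jobs are waiting
    if waiting_jobs < threshold:
        return 0
    return min(waiting_jobs * per_job_need, total_memory)

if __name__ == "__main__":
    print(fixed_partition(interactive_need=40, interactive_quota=24))          # 24
    print(no_partition(interactive_need=40, total_memory=64))                  # 40
    print(grouped_admission(3, per_job_need=8, total_memory=64, threshold=4))  # 0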


IEEE International Conference on High Performance Computing Data and Analytics | 1994

Adhara: runtime support for dynamic space-based applications on distributed memory MIMD multiprocessors

Immaneni Ashok; John Zahorjan

We describe Adhara, a runtime system specialized for dynamic space-based applications, such as particle-in-cell simulations, molecular dynamics problems and adaptive grid simulations. Adhara facilitates the programming of such applications by supporting spatial data structures (e.g., grids and particles), and facilitates obtaining good performance by performing automatic data partitioning and dynamic load balancing. We demonstrate the effectiveness of Adhara by efficiently parallelizing a specific plasma physics application. The development of the parallel program involved the addition of very few lines of code beyond those required to develop a sequential version of the application, and the resulting program executed at 90% efficiency on 16 nodes of an Intel Paragon.
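
As a sketch of the kind of spatial support being described, assuming nothing about Adhara's actual interfaces: particles can be binned into a uniform grid of cells, after which contiguous runs of cells, weighted by how many particles they hold, can be handed to processors. Only the binning step is shown; the names and the 2-D layout are illustrative.

from collections import defaultdict

def bin_particles(particles, cell_size):
    # map each (x, y) particle to the grid cell containing it
    cells = defaultdict(list)
    for x, y in particles:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells

if __name__ == "__main__":
    pts = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.95, 0.10)]
    for cell, members in sorted(bin_particles(pts, cell_size=0.5).items()):
        print(cell, "->", len(members), "particles")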


Symposium on Operating Systems Principles | 1991

The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors

Raj Vaswani; John Zahorjan

Collaboration


Dive into John Zahorjan's collaborations.

Top Co-Authors

Cathy McCann, University of Washington

Immaneni Ashok, University of Washington

Raj Vaswani, University of Washington

Shun-Tak Leung, University of Washington