Bill Nitzberg
Ames Research Center
Publication
Featured research published by Bill Nitzberg.
Job Scheduling Strategies for Parallel Processing | 1995
Dror G. Feitelson; Bill Nitzberg
Statistics of a parallel workload on a 128-node iPSC/860 located at NASA Ames are presented. It is shown that while sequential jobs outnumbered parallel jobs, most of the resources (measured in node-seconds) were consumed by parallel jobs. Moreover, most of the sequential jobs were for system administration. The average runtime of jobs grew with the number of nodes used, so the total resource requirements of large parallel jobs grew more than linearly with the number of nodes. The job submission rate during peak daytime activity was somewhat below one job every two minutes, and the average job size was small. At night, the submission rate was low but job sizes and system utilization were high, mainly due to NQS. Submission rate and utilization over the weekend were lower than on weekdays. The overall utilization was 50%, after accounting for downtime. About two thirds of the applications were executed repeatedly, some a significant number of times.
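The node-seconds accounting behind these figures is straightforward to reproduce. Below is a minimal Python sketch of per-job consumption and overall utilization; the Job fields and the toy trace are hypothetical, not the actual NAS trace format:

```python
# Sketch of the node-seconds accounting described above
# (hypothetical trace fields; the actual NAS trace format is not shown here).
from dataclasses import dataclass

@dataclass
class Job:
    nodes: int     # nodes allocated
    start: float   # start time (seconds)
    end: float     # end time (seconds)

def node_seconds(job):
    """Resource consumption of one job, in node-seconds."""
    return job.nodes * (job.end - job.start)

def utilization(jobs, total_nodes, uptime):
    """Fraction of available node-seconds actually consumed."""
    return sum(node_seconds(j) for j in jobs) / (total_nodes * uptime)

# Two sequential jobs vs. one 64-node job over a one-hour window:
trace = [Job(1, 0, 600), Job(1, 0, 600), Job(64, 0, 3600)]
print(f"{utilization(trace, total_nodes=128, uptime=3600):.0%}")  # 50%
```

Summing node-seconds rather than counting jobs is what lets the parallel jobs dominate even though the sequential jobs are more numerous.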
High Performance Distributed Computing | 1999
William E. Johnston; Dennis Gannon; Bill Nitzberg
Information Power Grid (IPG) is the name of NASA's project to build a fully distributed computing and data management environment, that is, a Grid. The IPG project has near-, medium-, and long-term goals that represent a continuum of engineering, development, and research topics. The overall goal is to provide the NASA scientific and engineering communities with a substantial increase in their ability to solve problems that depend on the use of large-scale and/or dispersed resources: aggregated computing, diverse data archives, laboratory instruments and engineering test facilities, and human collaborators. The approach involves infrastructure and services that can locate, aggregate, integrate, and manage resources from across the NASA enterprise. An important aspect of IPG is to produce a common view of these resources while at the same time providing for distributed management and local control. In addition to addressing the overall goal of enhanced science and engineering, there is a potentially important side effect: with a large collection of resources that have common use interfaces and a common management approach, a considerable pool of computing capability exists that could, for example, relatively easily be called on in extraordinary situations such as crisis response.
IEEE Transactions on Parallel and Distributed Systems | 1997
Virginia Mary Lo; Kurt J. Windisch; Wanqian Liu; Bill Nitzberg
Current processor allocation techniques for highly parallel systems are typically restricted to contiguous allocation strategies, for which performance suffers significantly due to the inherent problem of fragmentation. As a result, message-passing systems have yet to achieve the high utilization levels exhibited by traditional vector supercomputers. We are investigating processor allocation algorithms which lift the restriction on contiguity of processors in order to address the problem of fragmentation. Three noncontiguous processor allocation strategies, namely paging allocation, random allocation, and the Multiple Buddy Strategy (MBS), are proposed and studied in this paper. Simulations compare the performance of the noncontiguous strategies with that of several well-known contiguous algorithms. We show that noncontiguous allocation algorithms perform better overall than the contiguous ones, even when message-passing contention is considered. We also present the results of experiments on an Intel Paragon XP/S-15 with 208 nodes that show noncontiguous allocation is feasible with current technologies.
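To illustrate why noncontiguous allocation sidesteps fragmentation, here is a minimal sketch of a paging-style allocator in Python; the page abstraction, scan order, and interface are assumptions for illustration, not the paper's simulation setup:

```python
# Minimal sketch of paging-style noncontiguous allocation (illustrative only;
# page size, mesh shape, and scan order are assumptions, not the paper's setup).

class PagingAllocator:
    def __init__(self, num_pages):
        self.free = set(range(num_pages))   # page indices, scanned in index order
        self.owner = {}                     # page index -> job id

    def allocate(self, job_id, pages_needed):
        if pages_needed > len(self.free):
            return False                    # only total capacity can block a request
        # Any free pages will do: take the lowest-numbered ones.
        for p in sorted(self.free)[:pages_needed]:
            self.free.discard(p)
            self.owner[p] = job_id
        return True

    def release(self, job_id):
        for p in [p for p, j in self.owner.items() if j == job_id]:
            del self.owner[p]
            self.free.add(p)

alloc = PagingAllocator(num_pages=16)
alloc.allocate("A", 5)
alloc.allocate("B", 7)
alloc.release("A")                   # frees 5 pages, now split from the other 4
print(alloc.allocate("C", 9))        # True: 9 free pages, contiguity not required
```

A contiguous allocator would reject job C in this example, since the nine free pages sit in two separate fragments; dropping contiguity turns fragmentation into a non-issue, at the cost of the message-passing contention the simulations account for.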
Job Scheduling Strategies for Parallel Processing | 1999
James Patton Jones; Bill Nitzberg
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FCFS first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet utilization was affected little. In particular, these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
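The gap between FCFS first-fit and backfilling comes from letting short jobs jump ahead whenever they cannot delay the job at the head of the queue. Below is a minimal sketch of that decision rule, assuming user-supplied runtime estimates; the field names and the EASY-style rule shown (protecting only the head job's reservation) are illustrative assumptions, and the NAS "dynamic backfilling" scheduler is its own variant:

```python
# Sketch of one backfilling decision (illustrative; the NAS dynamic
# backfilling scheduler tracked reservations and estimates in more detail).
from dataclasses import dataclass

@dataclass
class QueuedJob:
    name: str
    nodes: int     # nodes requested
    est: float     # user-supplied runtime estimate (seconds)

def backfill(queue, free_nodes, head_start_time, now):
    """Start queued jobs now if they cannot delay the head-of-queue job.

    head_start_time is when enough nodes will have drained for the head
    job; a candidate backfills only if it fits in the currently free
    nodes and its estimate says it finishes before that reservation.
    """
    started = []
    for job in queue[1:]:                    # the head job itself is waiting
        fits = job.nodes <= free_nodes
        done_in_time = now + job.est <= head_start_time
        if fits and done_in_time:
            free_nodes -= job.nodes
            started.append(job.name)
    return started

q = [QueuedJob("head", 96, 7200),
     QueuedJob("small", 16, 1800),
     QueuedJob("big", 32, 9000)]
print(backfill(q, free_nodes=32, head_start_time=3600, now=0))  # ['small']
```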
Scientific Programming | 2002
Jennifer M. Schopf; Bill Nitzberg
The design and implementation of a national computing system and data grid has become a reachable goal from both the computer science and computational science points of view. A distributed infrastructure capable of sophisticated computational functions can bring many benefits to scientific work, but poses many challenges, both technical and socio-political. Technical challenges include having basic software tools, higher-level services, functioning and pervasive security, and standards, while socio-political issues include building a user community, adding incentives for sites to be part of a user-centric environment, and educating funding sources about the needs of this community. This paper details the areas relating to Grid research that we feel still need to be addressed to fully leverage the advantages of the Grid.
Symposium on Frontiers of Massively Parallel Computation | 1996
Kurt J. Windisch; Virginia Mary Lo; R. Moore; D. Feitelson; Bill Nitzberg
The analysis of workload traces from real production parallel machines can aid a wide variety of parallel processing research, providing a realistic basis for experimentation in the management of resources over an entire workload. We analyze a five-month workload trace of an Intel Paragon machine supporting a production parallel workload at the San Diego Supercomputer Center (SDSC), comparing and contrasting it with a similar workload study of an Intel iPSC/860 machine at NASA Ames NAS. Our analysis of workload characteristics takes into account the job scheduling policies of the sites and focuses on characteristics such as job size distribution (job parallelism), resource usage, runtimes, submission patterns, and wait times. Despite fundamental differences in the two machines and their respective usage environments, we observe a number of interesting similarities with respect to job size distribution, system utilization, runtime distribution, and interarrival time distribution. We hope to gain insight into the potential use of workload traces for evaluating resource management policies at supercomputing sites and for providing both real-world job streams and accurate stochastic workload models for use in simulation analysis of resource management policies.
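As a small example of turning a trace into a stochastic workload model, the sketch below fits an exponential interarrival-time distribution to submission timestamps and generates a synthetic job stream. The trace values and the exponential assumption are illustrative only; real workloads often show heavier-tailed interarrival distributions:

```python
# Sketch: deriving an interarrival-time model from submission timestamps,
# one input to a stochastic workload model (illustrative values only).
import random

def interarrivals(submit_times):
    ts = sorted(submit_times)
    return [b - a for a, b in zip(ts, ts[1:])]

def fit_exponential(gaps):
    """Maximum-likelihood rate for an exponential model: 1 / mean gap."""
    return len(gaps) / sum(gaps)

def synthetic_arrivals(rate, n, seed=0):
    """Generate n synthetic submission times from the fitted model."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(t)
    return out

submits = [0, 90, 150, 420, 480, 900]        # hypothetical submission times (s)
lam = fit_exponential(interarrivals(submits))
print(f"mean interarrival: {1 / lam:.0f} s")  # 180 s for this toy trace
print(synthetic_arrivals(lam, 3))
```

Swapping in a heavier-tailed model is a one-line change once the gaps are extracted from the trace.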
International Journal of Parallel Programming | 1991
Virginia Mary Lo; Sanjay V. Rajopadhye; Samik Gupta; David Keldsen; Moataz A. Mohamed; Bill Nitzberg; Jan Arne Telle; Xiaoxiong Zhong
The OREGAMI project involves the design, implementation, and testing of algorithms for mapping parallel computations to message-passing parallel architectures. OREGAMI addresses the mapping problem by exploiting regularity and by allowing the user to guide and evaluate mapping decisions made by OREGAMI's efficient combinatorial mapping algorithms. OREGAMI's approach to mapping is based on a new graph theoretic model of parallel computation called the Temporal Communication Graph. The OREGAMI software tools include three components: (1) LaRCS is a graph description language which allows the user to describe regularity in the communication topology as well as the temporal communication behavior (the pattern of message-passing over time). (2) MAPPER is our library of mapping algorithms which utilize information provided by LaRCS to perform contraction, embedding, and routing. (3) METRICS is an interactive graphics tool for display and analysis of mappings. This paper gives an overview of the OREGAMI project, the software tools, and OREGAMI's mapping algorithms.
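As a rough illustration of the contraction step, the sketch below greedily merges communicating tasks until the cluster count equals the processor count. This is a generic heuristic for illustration only, not OREGAMI's actual combinatorial algorithms:

```python
# Greedy contraction sketch (illustrative only; not OREGAMI's algorithms).
# Merges communicating tasks into clusters until one cluster per processor
# remains, keeping cluster sizes balanced with a simple cap.

def contract(edges, num_tasks, num_procs):
    """edges: (u, v) communication pairs among tasks 0..num_tasks-1.
    Returns a task -> cluster-root map with (ideally) num_procs clusters."""
    parent = list(range(num_tasks))     # union-find forest, one tree per cluster
    size = [1] * num_tasks
    cap = num_tasks // num_procs        # balance: no cluster above this size

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    n = num_tasks
    for u, v in edges:                  # each merge makes one edge intra-processor
        if n == num_procs:
            break
        ru, rv = find(u), find(v)
        if ru != rv and size[ru] + size[rv] <= cap:
            parent[ru] = rv
            size[rv] += size[ru]
            n -= 1                      # may stop early if no mergeable edge is left
    return {t: find(t) for t in range(num_tasks)}

# A ring of 8 tasks contracted onto 4 processors: adjacent tasks pair up.
ring = [(i, (i + 1) % 8) for i in range(8)]
print(contract(ring, num_tasks=8, num_procs=4))
# {0: 1, 1: 1, 2: 3, 3: 3, 4: 5, 5: 5, 6: 7, 7: 7}
```

After contraction, embedding assigns clusters to physical processors and routing chooses paths for the remaining inter-cluster messages.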
Conference on High Performance Computing (Supercomputing) | 1994
Wanqian Liu; Virginia Mary Lo; Kurt J. Windisch; Bill Nitzberg
Current processor allocation techniques for highly parallel systems have thus far been restricted to contiguous allocation strategies, for which performance suffers significantly due to the inherent problem of fragmentation. We are investigating processor allocation algorithms which lift the restriction on contiguity of processors in order to address the problem of fragmentation. Three non-contiguous processor allocation strategies, namely naive, random, and the multiple buddy strategy (MBS), are proposed and studied in this paper. Simulations compare the performance of the non-contiguous strategies with that of several well-known contiguous algorithms. We show that non-contiguous allocation algorithms perform better overall than the contiguous ones, even when message-passing contention is considered. We also present the results of experiments on an Intel Paragon XP/S-15 with 208 nodes that show non-contiguous allocation is feasible with current technologies.
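Of the three strategies, random allocation needs the least machinery. The toy sketch below makes the point that once contiguity is dropped, only total free capacity matters; the node count and interface are assumptions, not the paper's simulator:

```python
# Toy sketch of random noncontiguous allocation (illustrative; interface
# and configuration are assumptions, not the paper's Paragon setup).
import random

class RandomAllocator:
    def __init__(self, num_nodes, seed=0):
        self.free = list(range(num_nodes))
        self.rng = random.Random(seed)
        self.held = {}                        # job id -> list of node ids

    def allocate(self, job_id, n):
        if n > len(self.free):
            return False                      # only total capacity matters,
        self.rng.shuffle(self.free)           # not the shape of the free set
        self.held[job_id] = [self.free.pop() for _ in range(n)]
        return True

    def release(self, job_id):
        self.free.extend(self.held.pop(job_id))

m = RandomAllocator(num_nodes=208)            # 208 nodes, as on the XP/S-15
m.allocate("A", 100)
m.allocate("B", 50)
m.release("A")
print(m.allocate("C", 150))                   # True: 158 free nodes suffice
```

The trade-off, which the paper's simulations account for, is increased message-passing contention among the scattered nodes.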
International Parallel Processing Symposium | 1993
John Krystynak; Bill Nitzberg
Typical scientific applications require vast amounts of processing power coupled with significant I/O capacity. Highly parallel computer systems can provide processing power at low cost, but have historically lacked I/O capacity. By evaluating the performance and scalability of the Intel iPSC/860 Concurrent File System and the Connection Machine DataVault, one can get an idea of the current state of parallel I/O performance. The performance tests show that both systems are able to achieve 70% of peak I/O throughput.
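This style of measurement is easy to reproduce on any file system. Below is a minimal sketch of a write-throughput test using generic POSIX-style I/O for illustration; the paper's CFS and DataVault tests used platform-specific interfaces:

```python
# Minimal write-throughput benchmark sketch (illustrative; path, sizes,
# and single-process I/O are assumptions, not the paper's test setup).
import os
import time

def write_throughput(path, total_mb, block_kb=256):
    """Write total_mb of data in block_kb chunks; return MB/s achieved."""
    block = b"\0" * (block_kb * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                  # include time to reach the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed

rate = write_throughput("/tmp/iobench.dat", total_mb=64)
print(f"{rate:.1f} MB/s")
```

Dividing the measured rate by the device's rated peak gives the fraction-of-peak figure quoted above.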
Job Scheduling Strategies for Parallel Processing | 2002
Bill Nitzberg; Jennifer M. Schopf
The Global Grid Forum's Scheduling and Resource Management Area is actively pursuing the standards that are needed for interoperability of Grid resource management systems. This includes work in defining architectures, language standards, APIs, and protocols. In this article, we give an overview of the state of the working groups and research groups in the area as of September 2002.