Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Stephen D. Kleban is active.

Publications


Featured research published by Stephen D. Kleban.


Cluster Computing and the Grid | 2003

Fair share on high performance computing systems: what does fair really mean?

Stephen D. Kleban

We report on a performance evaluation of a Fair Share system at the ASCI Blue Mountain supercomputer cluster. We study the impacts of share allocation under Fair Share on wait times and expansion factor. We also measure the Service Ratio, a typical figure of merit for Fair Share systems, with respect to a number of job parameters. We conclude that Fair Share does little to alter important performance metrics such as expansion factor. This leads to the question of what Fair Share means on cluster machines. The essential difference between Fair Share on a uni-processor and a cluster is that the workload on a cluster is not fungible in space or time. We find that cluster machines must be highly utilized and support checkpointing in order for Fair Share to function more closely to the spirit in which it was originally developed.
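
As a concrete reference, here is a minimal Python sketch (not the ASCI schedulers' code) of the two quantities the abstract studies: the standard expansion-factor definition, and one common fair-share convention, assumed here, in which priority rises as a user's consumed fraction falls below their allocated share.

```python
# Minimal sketch, assuming standard definitions; not the ASCI schedulers' code.

def expansion_factor(wait_time: float, run_time: float) -> float:
    """Expansion factor = (wait + run) / run; 1.0 means no time spent queued."""
    return (wait_time + run_time) / run_time

def fair_share_priority(allocated_share: float, used_fraction: float) -> float:
    """Toy fair-share rule (an assumption): priority is positive while a user
    has consumed less than their allocated share of the machine."""
    return allocated_share - used_fraction

# Hypothetical jobs as (wait_seconds, run_seconds) pairs.
for wait, run in [(3600, 7200), (600, 600)]:
    print(f"wait={wait}s run={run}s EF={expansion_factor(wait, run):.2f}")
```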


Conference on High Performance Computing (Supercomputing) | 2003

Hierarchical Dynamics, Interarrival Times, and Performance

Stephen D. Kleban

We report on a model of the distribution of job submission interarrival times in supercomputers. Interarrival times are modeled as the consequence of a complicated set of decisions among users, the queuing algorithm, and other policies. This cascading hierarchy of decision-making processes leads to a particular kind of heavy-tailed distribution. Specifically, hierarchically constrained systems suggest that fatter tails are due to more levels coming into play in the overall decision-making process. The key contribution of this paper is that heavier tails, resulting from more complex decision-making processes (that is, more hierarchical levels), lead to worse overall performance, even when the average interarrival time is the same. Finally, we offer suggestions for how to overcome these issues and discuss the tradeoffs involved.
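
To make the heavy-tail claim tangible, here is a small sketch with Pareto assumed as a stand-in for the paper's hierarchically generated distribution: two interarrival streams with the same mean, where the heavy-tailed one produces far longer gaps and hence burstier arrivals. The shape parameter and the one-job-per-minute mean are hypothetical.

```python
# Sketch: same mean interarrival time, very different tails. Pareto is an
# assumed stand-in for the paper's hierarchically generated distribution.
import random
import statistics

def exponential_stream(n: int, mean: float) -> list[float]:
    return [random.expovariate(1.0 / mean) for _ in range(n)]

def pareto_stream(n: int, mean: float, alpha: float = 1.5) -> list[float]:
    x_m = mean * (alpha - 1) / alpha  # scale chosen so both streams share a mean
    return [x_m * random.paretovariate(alpha) for _ in range(n)]

n, mean = 100_000, 60.0  # hypothetical workload: one job per minute on average
for name, gaps in [("exponential", exponential_stream(n, mean)),
                   ("pareto (heavy-tailed)", pareto_stream(n, mean))]:
    print(f"{name}: mean={statistics.mean(gaps):.1f}s, max={max(gaps):.0f}s")
```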


International Parallel and Distributed Processing Symposium | 2004

Computation-at-risk: assessing job portfolio management risk on clusters

Stephen D. Kleban

In this paper we introduce the concept of computation-at-risk (CaR), a methodology for quantifying the computational risk and reward of running a particular portfolio of jobs on a cluster under a specific queue policy. Modeled after value-at-risk (VaR) from the financial community, CaR introduces the new element of computational risk into the management of a computational cluster. Specifically, administrators of clusters and other large-scale computing systems must deal with a wide range of job sizes, often spanning eight orders of magnitude in the number of cycles. Such a job portfolio carries implicit risks and rewards to performance, both for certain types of jobs and for the facility overall. In this paper we quantify the risk and reward in terms of makespan and expansion factor. We assess the risk/reward profile for two categories of job portfolios, one with respect to queue settings and the other in terms of job sizes. These assessments provide a means for evaluating which queue policies or job sizes have the best risk/reward characteristics in terms of performance. We found that looser constraints on queue policy, in the form of run-time limits, were beneficial from a risk/reward and CaR perspective. This information can be used by administrators to modify queue policy and by users to tailor the size of their jobs.
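
The VaR analogy suggests reading CaR off a performance distribution as an upper quantile. The sketch below is a hedged illustration under that assumption, not the paper's definition; the makespan samples and the 95% confidence convention are invented for the example.

```python
# Hedged sketch: CaR read off as an upper quantile of a makespan distribution,
# mirroring VaR's loss quantile. The convention and data are assumptions.

def computation_at_risk(makespans: list[float], confidence: float = 0.95) -> float:
    """Return the makespan not exceeded in `confidence` of the observed runs."""
    ordered = sorted(makespans)
    idx = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[idx]

# Hypothetical makespans (hours) for one job portfolio across simulated runs.
runs = [12, 14, 13, 15, 40, 13, 12, 60, 14, 13]
print("CaR at 95% confidence:", computation_at_risk(runs), "hours")
```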


International Conference on Cluster Computing | 2004

Computation-at-risk: employing the grid for computational risk management

Stephen D. Kleban

This work expands upon our earlier work on the concept of computation-at-risk (CaR). In particular, CaR refers to the risk that certain computations may not get done in a timely manner. We examine a number of CaR distributions on several large clusters. The important contribution of this work is that it shows that CaR-reducing strategies exist and that, by employing such strategies, a facility can significantly reduce the risk of inefficient resource utilization. Grids are shown to be one means of employing a CaR-reducing strategy. For example, we show that a CaR-reducing strategy applied to a common queue can have a dramatic effect on the wait times for jobs on a grid of clusters. In particular, we define a CaR Sharpe rule that provides a decision rule for determining the best machine in a grid on which to place a new job.
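
The abstract does not spell out the CaR Sharpe rule, so the following sketch is a hypothetical analogue of the financial Sharpe ratio: expected time saved against a baseline, divided by the spread of predicted completion times, with the job placed on the machine scoring highest. The baseline and all numbers are assumptions.

```python
# Hypothetical Sharpe-style placement rule; NOT the paper's exact formula.
import statistics

BASELINE_HOURS = 15.0  # assumed reference completion time ("risk-free" benchmark)

def car_sharpe_score(predicted_completions: list[float]) -> float:
    """Excess reward (hours saved vs. baseline) per unit of risk (std. dev.)."""
    mu = statistics.mean(predicted_completions)
    sigma = statistics.stdev(predicted_completions)
    return (BASELINE_HOURS - mu) / sigma if sigma > 0 else float("inf")

# Invented per-machine predictions of completion time (hours) for one new job.
grid = {"clusterA": [10, 11, 10, 12], "clusterB": [8, 9, 30, 8]}
best = max(grid, key=lambda machine: car_sharpe_score(grid[machine]))
print("place job on:", best)  # clusterA: better mean and far lower variance
```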


Designing for User Experiences | 2003

GMS: preserving multiple expert voices in scientific knowledge management

Adria Hope Liszka; William A. Stubblefield; Stephen D. Kleban

Computer archives of scientific and engineering knowledge must ensure the accuracy, completeness, and validity of their contents. Unfortunately, designers of these sites often overlook the social and cognitive context of scientific activity in favor of highly distilled collections of theoretical findings and technical data, divorcing scientific information from its human origins. Contextual aspects of knowledge seldom find their way into journals and other scientific forums, yet they often reveal the broader strategies behind the development and application of that knowledge. In implementing a GMS (Glass-Metal Seals) knowledge-management system, we found such contextual aspects as the structure of expert communities, the patterns of communication across disciplines, and the informal representations, sketches, and stories experts use in casual discussion to be essential to our efforts. Preserving these "extra-technical" features in the system's content and organization gives users an implicit experience of the subtle interpretations, viewpoints, and strategies that define engineering expertise.


International Parallel and Distributed Processing Symposium | 2002

ASCI queuing systems: overview and comparisons

Stephen D. Kleban

This paper describes research into, and a performance comparison of, the Accelerated Strategic Computing Initiative (ASCI) queuing algorithms using a newly developed simulator. The goal of this research is to develop models of the queuing systems used at the Sandia, Los Alamos, and Lawrence Livermore National Laboratories (SNL, LANL, LLNL) and to study the strengths and weaknesses of the queuing algorithms. We present results of a number of simulation runs using actual job log data as well as generated job data, analyzing performance metrics relating to overall facility utilization, responsiveness at the individual job level, and predictability of run time. We find that the algorithms generally perform similarly to one another, and we are able to identify some manifestations of queuing policy in the performance metrics.
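
For flavor, here is a minimal sketch of the simplest policy such a simulator might include, strict first-come-first-served; the actual ASCI simulator and the SNL/LANL/LLNL algorithms are far richer. The cluster size and job log below are hypothetical.

```python
# Minimal strict-FCFS sketch; the actual ASCI simulator models much more.
import heapq

def simulate_fcfs(jobs: list[tuple[float, float, int]], total_nodes: int):
    """jobs: (submit_time, run_time, nodes). Returns per-job expansion factors."""
    running: list[tuple[float, int]] = []  # min-heap of (finish_time, nodes)
    free, now, results = total_nodes, 0.0, []
    for submit, run, nodes in sorted(jobs):
        now = max(now, submit)
        while free < nodes:  # block until enough nodes have been released
            finish, n = heapq.heappop(running)
            now, free = max(now, finish), free + n
        start = now
        free -= nodes
        heapq.heappush(running, (start + run, nodes))
        results.append((start - submit + run) / run)  # expansion factor
    return results

# Hypothetical job log on a hypothetical 128-node cluster.
jobs = [(0.0, 100.0, 64), (10.0, 50.0, 96), (20.0, 200.0, 32)]
print([round(ef, 2) for ef in simulate_fcfs(jobs, 128)])  # [1.0, 2.8, 1.4]
```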


Annual Simulation Symposium | 2002

An architecture and implementation to support large-scale data access in scientific simulation environments

Victor P. Holmes; Stephen D. Kleban; David J. Miller; Constantine Pavlakos; Clark A. Poore; Ruthe L. Vandewart; Charles P. Crowley

At Sandia National Laboratories, a Data Services system has been developed to provide web-based access to high-performance computing clusters that host a set of post-processing applications for very large-scale data manipulation and visualization. A three-tier architecture provides a meta-framework for a collection of smaller frameworks, each of which satisfies a particular aspect of the overall system, including frameworks for a common data model, distributed resource management, component-based software on the cluster, and security. A prototype implementation has been completed which demonstrates the use of all of these frameworks in an integrated environment to provide end users the ability to manage and understand simulation results for very large, complex problems.
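
A purely schematic sketch of the three-tier shape described above; the class and method names are hypothetical, not Sandia's APIs. Each tier stands in for one of the frameworks the abstract mentions (web access, distributed resource management, component-based cluster software), with the data-model and security frameworks elided to comments.

```python
# Schematic three-tier sketch; class and method names are hypothetical.

class ClusterComponent:
    """Tier 3: a component-based post-processing service on the cluster."""
    def visualize(self, dataset_id: str) -> str:
        return f"rendered {dataset_id}"

class ResourceManager:
    """Tier 2: distributed resource management brokering cluster components."""
    def __init__(self, components: list[ClusterComponent]):
        self.components = components
    def dispatch(self, dataset_id: str) -> str:
        return self.components[0].visualize(dataset_id)  # trivial placement

class DataServicesWeb:
    """Tier 1: the web-facing entry point users see."""
    def __init__(self, manager: ResourceManager):
        self.manager = manager
    def handle_request(self, user: str, dataset_id: str) -> str:
        # The real system also applies the security framework and resolves
        # dataset_id through the common data model; both are elided here.
        return self.manager.dispatch(dataset_id)

web = DataServicesWeb(ResourceManager([ClusterComponent()]))
print(web.handle_request("analyst", "sim-run-042"))
```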


Journal of Intelligent Manufacturing | 1996

Expert system support for environmental assessment of manufacturing products and facilities

Stephen D. Kleban; George F. Luger; Randall D. Watkins

The goal of environmentally conscious design for manufacturing is to select materials and processes that minimize environmental impact. This paper describes a general and uniform way to analyze the environmental impact of manufacturing based on the product decomposition, the materials used in the manufacturing processes, and the particular view of the environment. To accomplish this task, we developed a computer program, called EcoSys™, that assists manufacturing engineers and environmental reviewers in assessing the environmental consequences of their manufacturing decisions.
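
The analysis style described, decomposition plus materials plus an environmental view, maps naturally onto a tree walk. The sketch below is illustrative only; the impact numbers, view names, and data layout are invented and far simpler than EcoSys's knowledge base.

```python
# Illustrative only: impact numbers, view names, and layout are invented.

IMPACT_DB = {
    "lead_solder":      {"toxicity": 9, "energy": 2},
    "aluminum_housing": {"toxicity": 1, "energy": 7},
}

def assess(component: dict, view: str) -> int:
    """Sum material impacts over a product decomposition under one view."""
    score = sum(IMPACT_DB[m][view] for m in component.get("materials", []))
    return score + sum(assess(part, view) for part in component.get("parts", []))

widget = {"materials": ["aluminum_housing"],
          "parts": [{"materials": ["lead_solder"]}]}
print("toxicity view:", assess(widget, "toxicity"))  # 1 + 9 = 10
```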


Cluster Computing and the Grid | 2004

With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems

Stephen D. Kleban; J. R. Johnston; J. A. Ang; Scott H. Clearwater

In this paper we describe a simulation study to improve performance on a large, highly utilized cluster at Sandia National Laboratories. The unique characteristic of this cluster is that there are very few constraints on job size. In particular, run-time is limited only by system times, which occur about every two weeks. The major contribution of this paper is that we quantify the difference in makespan between running a single long job and its equivalent as many shorter jobs. We find that running longer jobs is beneficial to the facility as a whole when cycle-weighted makespans are considered, and that running shorter jobs has an overall beneficial effect on the makespan of the jobs taken unweighted, and for most users.
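
A minimal sketch of the central comparison, under the assumption that each extra job in a chain pays one additional queue wait; the two-hour requeue wait and the job split are hypothetical, and "cycle-weighted" is modeled as a cycles-weighted mean of per-job makespans.

```python
# Sketch of the comparison; the 2-hour requeue wait per piece is hypothetical.

def chained_makespan(total_hours: float, pieces: int,
                     requeue_wait: float = 2.0) -> float:
    """Completion time when one workload runs as `pieces` chained jobs,
    assuming each piece pays one additional queue wait."""
    return total_hours + pieces * requeue_wait

def cycle_weighted_makespan(jobs: list[tuple[float, float]]) -> float:
    """jobs: (makespan_hours, cycles). Mean makespan weighted by cycles used."""
    total_cycles = sum(cycles for _, cycles in jobs)
    return sum(m * cycles for m, cycles in jobs) / total_cycles

print(chained_makespan(336.0, pieces=1))   # 338.0: two weeks as one long job
print(chained_makespan(336.0, pieces=14))  # 364.0: fourteen one-day jobs
```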


Hawaii International Conference on System Sciences | 2001

Collaborative evaluation of early design decisions and product manufacturability

Stephen D. Kleban; William A. Stubblefield; K. W. Mitchiner; John L. Mitchiner; M. Arms

In manufacturing, the conceptual design and detailed design stages are typically regarded as sequential and distinct. Decisions in conceptual design are often made with little information as to how they will affect detailed design or manufacturing process specification. Many possibilities and unknowns exist in conceptual design, where ideas about product shape and functionality change rapidly. Few, if any, tools exist to aid in this difficult, amorphous stage, in contrast to the many CAD and analysis tools for detailed design, where much more is known about the final product. The paper discusses the Materials Process Design Environment (MPDE), a collaborative problem-solving environment (CPSE) developed so that geographically dispersed designers in both the conceptual and detailed stages can work together and understand the impacts of their design decisions on functionality, cost, and manufacturability.

Collaboration


Dive into Stephen D. Kleban's collaborations.

Top Co-Authors

Adria Hope Liszka, Sandia National Laboratories
Asmeret Bier Naugle, Sandia National Laboratories
Clark A. Poore, Sandia National Laboratories
Constantine Pavlakos, Sandia National Laboratories
Curtis M. Johnson, Sandia National Laboratories
David J. Miller, Sandia National Laboratories
Eric D. Vugrin, Sandia National Laboratories