
Publications


Featured research published by Jon B. Weissman.


Journal of Parallel and Distributed Computing | 1994

Metasystems: An Approach Combining Parallel Processing and Heterogeneous Distributed Computing Systems

Andrew S. Grimshaw; Jon B. Weissman; Emily A. West; Edmond C. Loyot

A metasystem is a single computing resource composed of a heterogeneous group of autonomous computers linked together by a network. The interconnection network needed to construct large metasystems will soon be in place. To fully exploit these new systems, software that is easy to use, supports large degrees of parallelism, and hides the complexity of the underlying physical architecture must be developed. In this paper we describe our metasystem vision, our approach to constructing a metasystem testbed, and early experimental results. Our approach combines features from earlier work on both parallel processing systems and heterogeneous distributed computing systems. Using the testbed, we have found that data coercion costs are not a serious obstacle to high performance, but that load imbalance induced by differing processor capabilities can limit performance. We then present a mechanism to overcome load imbalance that utilizes user-provided callbacks.


International Conference on Parallel Processing | 2004

A genetic algorithm based approach for scheduling decomposable data grid applications

Seonho Kim; Jon B. Weissman

Data grid technology promises to give geographically distributed scientists access to physically distributed resources such as compute resources, networks, storage, and, most importantly, data collections for large-scale data-intensive problems. Because of the massive size and distributed nature of these datasets, scheduling data grid applications must consider communication and computation simultaneously to achieve high performance. In many data grid applications, data can be decomposed into multiple independent sub-datasets and distributed for parallel execution and analysis. We exploit this property and propose a novel genetic algorithm (GA) based approach that automatically maps decomposed data onto communication and computation resources. The proposed GA-based scheduler takes advantage of the parallelism of decomposable data grid applications to achieve the desired performance level. Simulation results show that the proposed approach is a competitive choice for scheduling large data grid applications in terms of both scheduling overhead and relative solution quality compared to other algorithms.
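The core idea of a GA-based scheduler can be illustrated with a minimal sketch. The encoding (one site index per sub-dataset), cost model, and GA parameters below are invented for illustration and are not taken from the paper:

```python
import random

def makespan(assignment, cost):
    """Toy cost model: cost[i][j] is the time for site j to process
    sub-dataset i; the makespan is the heaviest site's total load."""
    load = {}
    for i, site in enumerate(assignment):
        load[site] = load.get(site, 0.0) + cost[i][site]
    return max(load.values())

def ga_schedule(cost, n_sites, pop_size=30, generations=100):
    """Evolve assignments of sub-datasets to sites toward a low makespan."""
    n_tasks = len(cost)
    pop = [[random.randrange(n_sites) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: makespan(a, cost))
        survivors = pop[:pop_size // 2]              # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, n_tasks)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < 0.1:                # point mutation
                child[random.randrange(n_tasks)] = random.randrange(n_sites)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda a: makespan(a, cost))
```

Because survivors are carried over each generation, the best schedule found never degrades; fitness evaluation dominates the scheduling overhead, which is one of the trade-offs the paper measures.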


IEEE Transactions on Parallel and Distributed Systems | 2007

Adaptive Reputation-Based Scheduling on Unreliable Distributed Infrastructures

Jason D. Sonnek; Abhishek Chandra; Jon B. Weissman

This paper addresses the inherent unreliability and instability of worker nodes in large-scale donation-based distributed infrastructures such as peer-to-peer and grid systems. We present adaptive scheduling techniques that can mitigate this uncertainty and significantly outperform current approaches. In this work, we consider nodes that execute tasks via donated computational resources and may behave erratically or maliciously. We present a model in which reliability is not a binary property but a statistical one, based on a node's prior performance and behavior. We use this model to construct several reputation-based scheduling algorithms that employ estimated reliability ratings of worker nodes for efficient task allocation. Our scheduling algorithms are designed to adapt to changing system conditions, as well as nonstationary node reliability. Through simulation, we demonstrate that our algorithms can significantly improve throughput while maintaining a very high success rate of task completion. Our results suggest that reputation-based scheduling can handle a wide variety of worker populations, including nonstationary behavior, with overhead that scales well with system size. We also show that our adaptation mechanism gives the application designer fine-grained control over the desired performance metrics.
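A minimal sketch of reputation-based allocation, assuming a simple success-ratio reliability estimate and replication until a target success probability is reached; the update rule, prior, and selection policy are illustrative, not the paper's algorithms:

```python
class Worker:
    """Track a worker's reputation as the fraction of past tasks it
    completed correctly, with a small Laplace-style prior so a fresh
    worker is neither fully trusted nor fully distrusted."""
    def __init__(self, name):
        self.name = name
        self.successes = 1
        self.attempts = 2

    @property
    def reliability(self):
        return self.successes / self.attempts

    def record(self, ok):
        self.attempts += 1
        if ok:
            self.successes += 1

def pick_replicas(workers, target=0.99):
    """Greedily add the most reliable workers until the estimated
    probability that at least one replica succeeds reaches `target`."""
    chosen, p_all_fail = [], 1.0
    for w in sorted(workers, key=lambda w: w.reliability, reverse=True):
        chosen.append(w)
        p_all_fail *= 1.0 - w.reliability
        if 1.0 - p_all_fail >= target:
            break
    return chosen
```

Reliable workers then need fewer redundant replicas per task, which is how a statistical reliability model can raise throughput over treating every node as equally untrustworthy.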


ACM Transactions on Computer Systems | 1996

Portable run-time support for dynamic object-oriented parallel processing

Andrew S. Grimshaw; Jon B. Weissman; W. Timothy Strayer

Mentat is an object-oriented parallel processing system designed to simplify the task of writing portable parallel programs for parallel machines and workstation networks. The Mentat compiler and run-time system work together to automatically manage the communication and synchronization between objects. The run-time system marshals member function arguments, schedules objects on processors, and dynamically constructs and executes large-grain data dependence graphs. In this article we present the Mentat run-time system. We focus on three aspects: the software architecture, including the interface to the compiler and the structure and interaction of the principal components of the run-time system; the run-time overhead on a component-by-component basis for two platforms, a Sun SparcStation 2 and an Intel Paragon; and an analysis of the minimum granularity required for application programs to overcome the run-time overhead.


IEEE Transactions on Parallel and Distributed Systems | 2007

A Robust Spanning Tree Topology for Data Collection and Dissemination in Distributed Environments

Darin England; Bharadwaj Veeravalli; Jon B. Weissman

Large-scale distributed applications are subject to frequent disruptions due to resource contention and failure. Such disruptions are inherently unpredictable and, therefore, robustness is a desirable property for the distributed operating environment. In this work, we describe and evaluate a robust topology for applications that operate on a spanning tree overlay network. Unlike previous work that is adaptive or reactive in nature, we take a proactive approach to robustness. The topology itself is able to simultaneously withstand disturbances and exhibit good performance. We present both centralized and distributed algorithms to construct the topology, and then demonstrate its effectiveness through analysis and simulation of two classes of distributed applications: data collection in sensor networks and data dissemination in divisible load scheduling. The results show that our robust spanning trees achieve a desirable trade-off for two opposing metrics where traditional forms of spanning trees do not. In particular, the trees generated by our algorithms exhibit both resilience to data loss and low power consumption for sensor networks. When used as the overlay network for divisible load scheduling, they display both robustness to link congestion and low values for the makespan of the schedule.
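One way to picture the trade-off between the two opposing metrics is a Prim-style construction whose attachment score blends link cost with tree depth: shallower trees lose fewer descendants when a node or link fails. The scoring function below is a hypothetical proxy invented for this sketch, not the paper's algorithm:

```python
import heapq

def robust_spanning_tree(n, edges, root=0, alpha=0.5):
    """Grow a spanning tree from `root`, scoring each candidate
    attachment by (1 - alpha) * link_cost + alpha * depth, so a larger
    `alpha` favours shallower, more failure-resilient trees.
    `edges` maps (u, v) pairs to link costs. Returns {node: parent}."""
    adj = {u: [] for u in range(n)}
    for (u, v), c in edges.items():
        adj[u].append((v, c))
        adj[v].append((u, c))
    parent, depth = {root: None}, {root: 0}
    visited = set()
    heap = [(0.0, root, None)]
    while heap and len(visited) < n:
        _, u, par = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if par is not None:
            parent[u] = par
            depth[u] = depth[par] + 1
        for v, c in adj[u]:
            if v not in visited:
                score = (1 - alpha) * c + alpha * (depth[u] + 1)
                heapq.heappush(heap, (score, v, u))
    return parent
```

At `alpha = 0` this reduces to Prim's minimum spanning tree (low cost, potentially deep chains); at `alpha = 1` it approaches a breadth-first star (robust but possibly expensive links); intermediate values trade one metric against the other.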


High Performance Distributed Computing | 1996

A federated model for scheduling in wide-area systems

Jon B. Weissman; Andrew Grimshaw

A model for scheduling in wide area systems is described. The model is federated and utilizes a collection of local site schedulers that control the use of their resources. The wide area scheduler consults the local site schedulers to obtain candidate machine schedules. A set of issues and challenges inherent to wide area scheduling are also described and the proposed model is shown to address many of these problems. A distributed algorithm for wide area scheduling is presented and relies upon information made available about the resource needs of user jobs. The wide area scheduler will be implemented in Legion, a wide area computing system developed at the University of Virginia.
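The federated model can be sketched as a wide-area scheduler soliciting candidate schedules from autonomous local site schedulers and picking the best offer. The site interface and cost estimate below are hypothetical simplifications, not Legion's actual implementation:

```python
class SiteScheduler:
    """A local site scheduler that retains control of its resources:
    it may decline a job or offer an estimated completion time."""
    def __init__(self, name, free_cpus, speed):
        self.name = name
        self.free_cpus = free_cpus
        self.speed = speed          # relative per-CPU speed (invented unit)

    def candidate(self, job_cpus, job_work):
        """Offer an estimated run time, or None if the job cannot fit."""
        if job_cpus > self.free_cpus:
            return None
        return job_work / (job_cpus * self.speed)

def wide_area_schedule(sites, job_cpus, job_work):
    """Consult every site for a candidate schedule and return the
    (estimated time, site name) pair with the earliest completion,
    or None if no site can run the job."""
    offers = [(s.candidate(job_cpus, job_work), s.name) for s in sites]
    offers = [(t, name) for t, name in offers if t is not None]
    return min(offers) if offers else None
```

Note that the wide-area layer never schedules onto a site's resources directly; it only ranks the offers the sites choose to make, which is the autonomy property the federated model is built around.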


High Performance Distributed Computing | 2011

Exploring MapReduce efficiency with highly-distributed data

Michael Cardosa; Chenyu Wang; Anshuman Nangia; Abhishek Chandra; Jon B. Weissman

MapReduce is a highly popular paradigm for high-performance computing over large data sets in large-scale platforms. However, when the source data is widely distributed and the computing platform is also distributed, e.g., when data is collected in separate data center locations, the most efficient architecture for running Hadoop jobs over the entire data set becomes non-trivial. In this paper, we show that the traditional single-cluster MapReduce setup may not be suitable for situations in which data and compute resources are widely distributed. Further, we provide recommendations for alternative (and even hierarchical) distributed MapReduce setup configurations, depending on the workload and data set.


Cluster Computing | 1998

Scheduling parallel applications in distributed networks

Jon B. Weissman; Xin Zhao

Prophet is a run-time scheduling system designed to support the efficient execution of parallel applications written in the Mentat programming language (Grimshaw, 1993). Prior results demonstrated that SPMD applications could be scheduled automatically in an Ethernet-based local-area workstation network with good performance (Weissman and Grimshaw, 1994 and 1995). This paper describes our recent efforts to extend Prophet along several dimensions: improved overhead control, greater resource sharing, greater resource heterogeneity, wide-area scheduling, and new application types. We show that both SPMD and task parallel applications can be scheduled effectively in a shared heterogeneous LAN environment containing Ethernet and ATM networks by exploiting the application structure and dynamic run-time information.


High Performance Distributed Computing | 2001

Dynamic replica management in the service grid

Byoung-Dai Lee; Jon B. Weissman

As the Internet is evolving away from providing simple connectivity towards providing more sophisticated services, it is difficult to provide efficient delivery of high-demand services to end users, due to the dynamic sharing of the network and connected servers. To address this problem, we propose the service grid architecture that incorporates dynamic replication and deletion of services.
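A threshold-based sketch of dynamic replication and deletion, assuming a fixed per-replica capacity and invented watermark values; the paper's actual placement and deletion policies are more sophisticated:

```python
class ServiceReplicator:
    """Scale the number of replicas of a service up or down based on
    observed request load. Capacity and watermark values are
    hypothetical parameters for this sketch."""
    def __init__(self, capacity_per_replica, low_water=0.3, high_water=0.8):
        self.capacity = capacity_per_replica
        self.low, self.high = low_water, high_water
        self.replicas = 1          # always keep at least one instance

    def adjust(self, request_rate):
        """Return the new replica count for the observed request rate."""
        utilization = request_rate / (self.replicas * self.capacity)
        if utilization > self.high:
            self.replicas += 1                      # demand is high: replicate
        elif utilization < self.low and self.replicas > 1:
            self.replicas -= 1                      # demand dropped: delete
        return self.replicas
```

The gap between the two watermarks provides hysteresis, so brief fluctuations in demand do not trigger constant replication and deletion.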


Cluster Computing and the Grid | 2008

Orchestrating Data-Centric Workflows

Adam Barker; Jon B. Weissman; J. van Hemert

When orchestrating data-centric workflows, as are commonly found in the sciences, centralised servers can become a bottleneck to the performance of a workflow: output from service invocations is normally transferred via a centralised orchestration engine, when it should be passed directly to where it is needed by the next service in the workflow. To address this performance bottleneck, this paper presents a lightweight hybrid workflow architecture and concrete API based on a centralised control flow, distributed data flow model. Our architecture maintains the robustness and simplicity of centralised orchestration, but facilitates choreography by allowing services to exchange data directly with one another, reducing the data that needs to be transferred through a centralised server. Furthermore, our architecture is standards-compliant, flexible, and non-disruptive; service definitions do not have to be altered prior to enactment.
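The centralised control flow, distributed data flow idea can be sketched as an orchestrator that invokes services in sequence but passes only references to intermediate results, so payloads move directly between services. The `Service` class and reference scheme below are invented for illustration and are not the paper's API:

```python
class Service:
    """A service that holds its outputs locally and returns only a
    reference; the orchestrator never handles the payload itself."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.store = {}            # data held locally at the service

    def invoke(self, ref, upstream):
        # Fetch the input directly from the upstream service that
        # holds it, bypassing the orchestrator for the data transfer.
        data = upstream.store[ref] if upstream else ref
        result = self.fn(data)
        out_ref = f"{self.name}:{len(self.store)}"
        self.store[out_ref] = result
        return out_ref             # only the reference goes back

def orchestrate(services, initial_input):
    """Centralised control flow: the orchestrator decides the order of
    invocation, but each service pulls its input from its predecessor."""
    ref, upstream = initial_input, None
    for svc in services:
        ref = svc.invoke(ref, upstream)
        upstream = svc
    return upstream.store[ref]     # the final result is fetched once
```

The orchestrator's traffic is reduced to small references and one final fetch, which is the source of the performance gain over routing every intermediate result through the engine.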

Collaboration


Dive into Jon B. Weissman's collaboration.

Top Co-Authors


Seonho Kim

University of Minnesota


Adam Barker

University of St Andrews
