Publication


Featured research published by Daniel P. Spooner.


Future Generation Computer Systems | 2005

Grid load balancing using intelligent agents

Junwei Cao; Daniel P. Spooner; Stephen A. Jarvis; Graham R. Nudd

Scalable management and scheduling of dynamic grid resources requires new technologies to build the next generation of intelligent grid environments. This work demonstrates that AI techniques can be utilised to achieve effective workload and resource management. A combination of intelligent agents and multi-agent approaches is applied to both local grid resource scheduling and global grid load balancing. Each agent is a representative of a local grid resource and utilises predictive application performance data with iterative heuristic algorithms to engineer local load balancing across multiple hosts. At a higher level, agents cooperate with each other to balance workload using a peer-to-peer service advertisement and discovery mechanism.
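The paper itself does not include source code; as a minimal sketch of the idea described above, the following fragment (all class and method names are hypothetical) shows how an agent might combine a greedy local placement heuristic with a simple peer advertisement step to decide whether work stays on its own hosts or moves to a lighter-loaded peer. The real system uses PACE performance predictions and an iterative heuristic rather than the one-line greedy rule used here.

```python
# Hypothetical sketch of an agent that (a) balances load across its local
# hosts with a greedy heuristic and (b) advertises its load to peers.
# Names and data structures are illustrative, not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    load: float = 0.0          # predicted busy time (seconds)


@dataclass
class Agent:
    name: str
    hosts: list
    peers: list = field(default_factory=list)   # other Agent instances

    def local_schedule(self, predicted_cost: float) -> str:
        """Greedy heuristic: place the task on the least-loaded host."""
        host = min(self.hosts, key=lambda h: h.load)
        host.load += predicted_cost
        return host.name

    def advertise(self) -> dict:
        """Service advertisement: publish this agent's current total load."""
        return {"agent": self.name,
                "total_load": sum(h.load for h in self.hosts)}

    def submit(self, predicted_cost: float) -> str:
        """Global balancing: keep the task locally unless a peer is lighter."""
        candidates = [self] + self.peers
        target = min(candidates,
                     key=lambda a: a.advertise()["total_load"])
        return f"{target.name}:{target.local_schedule(predicted_cost)}"


# Example: two agents, the busier one forwards work to its peer.
a = Agent("site-A", [Host("a1", 5.0), Host("a2", 6.0)])
b = Agent("site-B", [Host("b1", 1.0)])
a.peers, b.peers = [b], [a]
print(a.submit(2.0))   # expected to land on site-B:b1
```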


International Parallel and Distributed Processing Symposium | 2003

Agent-based grid load balancing using performance-driven task scheduling

Junwei Cao; Daniel P. Spooner; Stephen A. Jarvis; Subhash Saini; Graham R. Nudd

Load balancing is a key concern when developing parallel and distributed computing applications. The emergence of computational grids extends this problem, where issues of cross-domain and large-scale scheduling must also be considered. In this paper an agent-based grid management infrastructure is coupled with a performance-driven task scheduler that has been developed for local grid load balancing. Each grid scheduler utilises predictive application performance data and an iterative heuristic algorithm to engineer local load balancing across multiple processing nodes. At a higher level, a hierarchy of homogeneous agents is used to represent multiple grid resources. Agents cooperate with each other to balance workload in the global grid environment using service advertisement and discovery mechanisms. A case study is included with corresponding experimental results to demonstrate that both local schedulers and agents contribute to overall grid load balancing, which significantly improves grid application execution performance and resource utilisation.


International Parallel and Distributed Processing Symposium | 2003

Performance prediction and its use in parallel and distributed computing systems

Stephen A. Jarvis; Daniel P. Spooner; Hélène Niuklan Lim Choi Keung; Graham R. Nudd

A performance prediction framework is described in which predictive data generated by the PACE toolkit is stored and published through a Globus MDS-based performance information service. Distributing this data allows additional performance-based middleware tools to be built; the paper describes two such tools, a local-level scheduler and a system for wide-area task management. Experimental evidence shows that by integrating these performance tools for local- and wide-area management, considerable improvements can be made to task scheduling, resource utilisation and load balancing on heterogeneous distributed computing systems.
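The MDS-based information service is not reproduced here; the sketch below only mirrors the publish/query pattern with an in-memory dictionary keyed by (application, resource), plus the kind of lookup a wide-area task manager might perform. The application and resource names are invented for the example.

```python
# Illustrative stand-in for a performance information service: predictions
# are published under an (application, resource) key and queried by
# scheduling tools. The real system publishes PACE data via Globus MDS;
# this dictionary-based version only mirrors the publish/query pattern.
class PerformanceInfoService:
    def __init__(self):
        self._predictions = {}              # (app, resource) -> seconds

    def publish(self, app: str, resource: str, predicted_seconds: float):
        self._predictions[(app, resource)] = predicted_seconds

    def query(self, app: str):
        """Return all known (resource, predicted runtime) pairs for an app."""
        return {r: t for (a, r), t in self._predictions.items() if a == app}


def fastest_resource(service: PerformanceInfoService, app: str) -> str:
    """A wide-area manager might simply pick the smallest predicted runtime."""
    predictions = service.query(app)
    return min(predictions, key=predictions.get)


svc = PerformanceInfoService()
svc.publish("sweep3d", "cluster-A", 120.0)
svc.publish("sweep3d", "cluster-B", 95.0)
print(fastest_resource(svc, "sweep3d"))    # cluster-B
```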


Cluster Computing and the Grid | 2002

Agent-Based Resource Management for Grid Computing

Junwei Cao; Daniel P. Spooner; James D. Turner; Stephen A. Jarvis; Darren J. Kerbyson; Subhash Saini; Graham R. Nudd

It is envisaged that the grid infrastructure will be a large-scale distributed software system that will provide high-end computational and storage capabilities to differentiated users. A number of distributed computing technologies are being applied to grid development work, including CORBA and Jini. In this work, we introduce an A4 (Agile Architecture and Autonomous Agents) methodology, which can be used for resource management for grid computing. An initial system implementation utilises the performance prediction techniques of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on local grid resources. At the meta-level, a hierarchy of identical agents is used to provide an abstraction of the system architecture. Each agent is able to cooperate with other agents to provide service advertisement and discovery to schedule applications that need to utilise grid resources. A performance monitor and advisor (PMA) is under development to optimise the performance of agent behaviours.


The Computer Journal | 2005

Performance-Aware Workflow Management for Grid Computing

Daniel P. Spooner; Junwei Cao; Stephen A. Jarvis; Ligang He; Graham R. Nudd

Grid middleware development has advanced rapidly over the past few years to support component-based programming models and service-oriented architectures. This is most evident with the forthcoming release of the Globus toolkit (GT4), which represents a convergence of concepts (and standards) from both the grid and web-services communities. Grid applications are increasingly modular, composed of workflow descriptions that feature both resource and application dynamism. Understanding the performance implications of scheduling grid workflows is critical in providing effective resource management and reliable service quality to users. This paper describes a series of extensions to an existing performance-aware grid management system (TITAN). These extensions provide additional support for workflow prediction and scheduling using a multi-domain performance management infrastructure.
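TITAN's workflow scheduler is not shown in the abstract; as an illustration of the general idea, the sketch below walks a small workflow in dependency order and assigns each task to the resource with the earliest predicted finish time. The workflow, resources and predicted runtimes are assumptions made up for the example.

```python
# Minimal list scheduler over a workflow DAG: tasks are taken in dependency
# order and each is placed on the resource with the earliest predicted
# finish time. Predictions, task names and resources are hypothetical.
from graphlib import TopologicalSorter

# workflow: task -> set of predecessor tasks
workflow = {"extract": set(), "transform": {"extract"}, "render": {"transform"}}

# predicted runtime (seconds) of each task on each resource
predicted = {
    "extract":   {"cluster-A": 10, "cluster-B": 14},
    "transform": {"cluster-A": 30, "cluster-B": 22},
    "render":    {"cluster-A": 12, "cluster-B": 12},
}

resource_free = {"cluster-A": 0.0, "cluster-B": 0.0}   # when each resource frees up
task_finish = {}                                        # task -> finish time
schedule = {}

for task in TopologicalSorter(workflow).static_order():
    ready = max((task_finish[p] for p in workflow[task]), default=0.0)
    # pick the resource giving the earliest predicted finish for this task
    best = min(predicted[task],
               key=lambda r: max(ready, resource_free[r]) + predicted[task][r])
    start = max(ready, resource_free[best])
    finish = start + predicted[task][best]
    resource_free[best] = finish
    task_finish[task] = finish
    schedule[task] = (best, start, finish)

print(schedule)
```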


IEEE Transactions on Parallel and Distributed Systems | 2006

Allocating non-real-time and soft real-time jobs in multiclusters

Ligang He; Stephen A. Jarvis; Daniel P. Spooner; Hong Jiang; Donna N. Dillenberger; Graham R. Nudd

This paper addresses workload allocation techniques for two types of sequential jobs that might be found in multicluster systems, namely, non-real-time jobs and soft real-time jobs. Two workload allocation strategies, the optimized mean response time (ORT) and the optimized mean miss rate (OMR), are developed by establishing and numerically solving two optimization equation sets. The ORT strategy achieves an optimized mean response time for non-real-time jobs, while the OMR strategy obtains an optimized mean miss rate for soft real-time jobs over multiple clusters. Both strategies take into account average system behaviors (such as the mean arrival rate of jobs) in calculating the workload proportions for individual clusters and the workload allocation is updated dynamically when the change in the mean arrival rate reaches a certain threshold. The effectiveness of both strategies is demonstrated through theoretical analysis. These strategies are also evaluated through extensive experimental studies and the results show that when compared with traditional strategies, the proposed workload allocation schemes significantly improve the performance of job scheduling in multiclusters, both in terms of the mean response time (for non-real-time jobs) and the mean miss rate (for soft real-time jobs).
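The ORT/OMR optimisation equation sets are not given in the abstract; purely to illustrate the idea of computing per-cluster workload proportions numerically, the sketch below assumes each cluster behaves like an M/M/1 queue (mean response time 1/(mu - lambda)) and searches for the split of an overall arrival rate between two clusters that minimises the weighted mean response time. The same search-over-proportions shape would apply to the miss-rate case, with a different objective.

```python
# Toy version of response-time-optimised workload allocation: split an
# overall job arrival rate between two clusters so that the weighted mean
# response time is minimised. Assumes (for illustration only) that each
# cluster behaves like an M/M/1 queue with mean response time 1/(mu - lam).
def mean_response_time(total_rate, split, mu_a, mu_b):
    lam_a, lam_b = total_rate * split, total_rate * (1 - split)
    if lam_a >= mu_a or lam_b >= mu_b:        # unstable allocation
        return float("inf")
    resp_a = 1.0 / (mu_a - lam_a)
    resp_b = 1.0 / (mu_b - lam_b)
    # weight each cluster's response time by its share of the workload
    return split * resp_a + (1 - split) * resp_b


def best_split(total_rate, mu_a, mu_b, steps=1000):
    """Coarse numerical search for the proportion sent to cluster A."""
    candidates = (i / steps for i in range(steps + 1))
    return min(candidates,
               key=lambda s: mean_response_time(total_rate, s, mu_a, mu_b))


# Cluster A is twice as fast as cluster B, so it should take more of the load.
split = best_split(total_rate=8.0, mu_a=10.0, mu_b=5.0)
print(f"send {split:.2%} of jobs to cluster A")
```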


Dependable Systems and Networks | 2006

Improving the Fault Resilience of Overlay Multicast for Media Streaming

Guang Tan; Stephen A. Jarvis; Daniel P. Spooner

A key technical challenge for overlay multicast is that the highly dynamic multicast members can make data delivery unreliable. In this paper, we address this issue in the context of live media streaming by exploring 1) how to construct a stable multicast tree that minimizes the negative impact of frequent member departures on an existing overlay and 2) how to efficiently recover from packet errors caused by end-system or network failures. For the first problem, we identify two layout schemes for the tree nodes, namely, the bandwidth-ordered tree and the time-ordered tree, which represent two typical approaches to improving tree reliability, and conduct a stochastic analysis on their properties regarding reliability and tree depth. Based on the findings, we propose a distributed reliability-oriented switching tree (ROST) algorithm that minimizes the failure correlation among tree nodes. Compared with some commonly used distributed algorithms, the ROST algorithm significantly improves tree reliability and reduces average service delay, while incurring only a small protocol overhead; furthermore, it features a mechanism that prevents cheating or malicious behaviors in the exchange of bandwidth/time information. For the second problem, we develop a simple cooperative error recovery (CER) protocol that helps recover from packet errors efficiently. Recognizing that a single recovery source is usually incapable of providing the timely delivery of the lost data, the protocol recovers from data outages using the residual bandwidths from multiple sources, which are identified using a minimum-loss-correlation algorithm. Extensive simulations demonstrate the effectiveness of the proposed schemes.
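Neither ROST nor CER is specified in enough detail above to reproduce; the sketch below only illustrates the two tree layout schemes being compared, sorting peers by upload bandwidth or by session age before filling the tree greedily, breadth-first. The peer values and the fixed fan-out are invented for the example.

```python
# Illustration of the two tree layouts compared in the paper: peers are
# sorted either by upload bandwidth or by time already spent in the session,
# then placed greedily so that the "best" peers sit nearest the source.
# Peer values and the fixed fan-out are made up for this example.
from collections import deque

peers = [
    {"id": "p1", "bandwidth": 4, "age": 30},
    {"id": "p2", "bandwidth": 1, "age": 900},
    {"id": "p3", "bandwidth": 3, "age": 600},
    {"id": "p4", "bandwidth": 2, "age": 120},
]


def build_tree(peers, key, fanout=2):
    """Place peers breadth-first after sorting them by the given key (desc)."""
    ordered = sorted(peers, key=lambda p: p[key], reverse=True)
    tree = {"source": []}
    slots = deque(["source"])                  # parents with spare capacity
    for peer in ordered:
        parent = slots[0]
        tree[parent].append(peer["id"])
        tree[peer["id"]] = []
        if len(tree[parent]) == fanout:        # parent full, move on
            slots.popleft()
        slots.append(peer["id"])
    return tree


print("bandwidth-ordered:", build_tree(peers, "bandwidth"))
print("time-ordered:     ", build_tree(peers, "age"))
```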


International Parallel and Distributed Processing Symposium | 2004

Optimising static workload allocation in multiclusters

Ligang He; Stephen A. Jarvis; Daniel P. Spooner; Graham R. Nudd

Workload allocation and job dispatching are two fundamental components of static job scheduling for distributed systems. We address static workload allocation techniques for two types of job stream in multicluster systems, namely non-real-time job streams and soft real-time job streams, which request different qualities of service. Two workload allocation strategies (called ORT and OMR) are developed by establishing and numerically solving two optimisation equation sets. The ORT strategy achieves the optimised mean response time for the non-real-time job stream, while the OMR strategy gains the optimised mean miss rate for the soft real-time job stream over multiple clusters (these strategies can also be applied in a single-cluster system). The effectiveness of both strategies is demonstrated through theoretical analysis. The proposed workload allocation schemes are combined with two job dispatching strategies (weighted random and weighted round-robin) to generate new static job scheduling algorithms for multicluster environments. These algorithms are evaluated through extensive experimental studies, and the results show that, compared with static approaches without the optimisation techniques, the proposed workload allocation schemes significantly improve the performance of static job scheduling in multiclusters, in terms of both the mean response time (for non-real-time jobs) and the mean miss rate (for soft real-time jobs).
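A hedged sketch of the weighted round-robin half of this pipeline is given below: the allocation proportions are simply taken as inputs (in the paper they would come from the ORT or OMR optimisation), and jobs are dealt out so that each cluster's long-run share matches its proportion. Cluster names and proportions are illustrative.

```python
# Sketch of weighted round-robin dispatching: given the workload proportions
# chosen by an allocation strategy, jobs are dealt out so that over time each
# cluster receives its intended share. Proportions here are assumed inputs.
from itertools import count


def weighted_round_robin(proportions):
    """Yield cluster names so that long-run frequencies match `proportions`."""
    credits = {name: 0.0 for name in proportions}
    for _ in count():
        for name, share in proportions.items():
            credits[name] += share             # accumulate each cluster's share
        # dispatch to the cluster that is currently owed the most work
        target = max(credits, key=credits.get)
        credits[target] -= 1.0
        yield target


dispatcher = weighted_round_robin({"cluster-A": 0.6, "cluster-B": 0.4})
stream = [next(dispatcher) for _ in range(10)]
print(stream)                                  # roughly six A's and four B's
```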


International Parallel and Distributed Processing Symposium | 2002

Performance prediction technology for agent-based resource management in grid environments

Junwei Cao; Stephen A. Jarvis; Daniel P. Spooner; James D. Turner; Darren J. Kerbyson; Graham R. Nudd

This paper introduces a resource management infrastructure for grid computing environments. The technique couples application performance prediction with a hierarchical multi-agent system. An initial system implementation utilises the performance prediction capabilities of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on local grid resources. The validation results show that a high level of accuracy can be obtained, that cross-platform comparisons can be easily undertaken, and that the estimates can be evaluated rapidly. A hierarchy of homogeneous agents is used to provide a scalable and adaptable abstraction of the grid system architecture. An agent is a representative of a local grid resource and is considered to be both a service provider and a service requestor. Agents are organised into a hierarchy and cooperate to provide service advertisement and discovery. A performance monitor and advisor has been developed to optimise the performance of the agent system. A case study with corresponding experimental results is included to demonstrate the efficiency of the resource management and scheduling system.


The Journal of Supercomputing | 2005

An Investigation into the Application of Different Performance Prediction Methods to Distributed Enterprise Applications

David A. Bacigalupo; Stephen A. Jarvis; Ligang He; Daniel P. Spooner; Donna N. Dillenberger; Graham R. Nudd

Response time predictions for workload on new server architectures can enhance Service Level Agreement–based resource management. This paper evaluates two performance prediction methods using a distributed enterprise application benchmark. The historical method makes predictions by extrapolating from previously gathered performance data, while the layered queuing method makes predictions by solving layered queuing networks. The methods are evaluated in terms of: the systems that can be modelled; the metrics that can be predicted; the ease with which the models can be created and the level of expertise required; the overheads of recalibrating a model; and the delay when evaluating a prediction. The paper also investigates how a prediction-enhanced resource management algorithm can be tuned so as to compensate for predictive inaccuracy and balance the costs of SLA violations and server usage.
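Neither prediction method is described here in enough detail to reproduce; purely as a sketch of the "historical" flavour of prediction, the code below fits a least-squares line to previously observed (workload, response time) samples and extrapolates to a heavier workload. The sample data and the linear model are assumptions for illustration only.

```python
# Toy "historical" predictor: fit a least-squares line to past
# (concurrent users, mean response time) observations and extrapolate.
# The samples and the linear model are illustrative assumptions only.
def fit_line(samples):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


# Historical observations: (concurrent users, mean response time in ms)
history = [(10, 120.0), (20, 180.0), (40, 310.0), (80, 575.0)]
slope, intercept = fit_line(history)

predicted = slope * 120 + intercept            # extrapolate to 120 users
print(f"predicted mean response time at 120 users: {predicted:.0f} ms")
```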

Collaboration


Dive into Daniel P. Spooner's collaborations.

Top Co-Authors

Ligang He

University of Warwick


Guang Tan

Chinese Academy of Sciences


Darren J. Kerbyson

Pacific Northwest National Laboratory
