Darren J. Kerbyson
Pacific Northwest National Laboratory
Publications
Featured research published by Darren J. Kerbyson.
Conference on High Performance Computing (Supercomputing) | 2001
Darren J. Kerbyson; Henry J. Alme; Adolfy Hoisie; Fabrizio Petrini; Harvey J. Wasserman; Michael L. Gittings
In this work we present a predictive analytical model that encompasses the performance and scaling characteristics of an important ASCI application. SAGE (SAIC’s Adaptive Grid Eulerian hydrocode) is a multidimensional hydrodynamics code with adaptive mesh refinement. The model is validated against measurements on several systems including ASCI Blue Mountain, ASCI White, and a Compaq AlphaServer ES45 system, showing high accuracy. It is parametric: basic machine performance numbers (latency, MFLOPS rate, bandwidth) and application characteristics (problem size, decomposition method, etc.) serve as input. The model is applied to add insight into the performance of current systems, to reveal bottlenecks, and to illustrate where tuning efforts can be effective. We also use the model to predict performance on future systems.
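To illustrate the shape of such a parametric model (not the actual SAGE model, whose terms are far more detailed), a toy predictor can combine the machine numbers and application characteristics the abstract lists. All parameter names and the formula below are illustrative assumptions:

```python
# Toy parametric performance model: runtime = parallel compute + communication,
# driven by basic machine numbers (latency, MFLOPS rate, bandwidth) and
# application characteristics. Purely illustrative, not the paper's model.

def predicted_runtime(flops_per_cell, cells, procs,
                      mflops_rate, latency_s, bandwidth_bs,
                      msgs_per_step, bytes_per_msg, steps):
    """Predict total runtime in seconds for a simple decomposed code."""
    # Computation: total work divided across processors, rate in MFLOP/s.
    compute = (flops_per_cell * cells / procs) / (mflops_rate * 1e6) * steps
    # Communication: per-message latency plus transfer time, every step.
    comm = msgs_per_step * (latency_s + bytes_per_msg / bandwidth_bs) * steps
    return compute + comm
```

Doubling the processor count halves the compute term while leaving the communication term fixed, which is how such a model exposes scaling bottlenecks.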
Image and Vision Computing | 1999
Tim J. Atherton; Darren J. Kerbyson
The Circle Hough Transform (CHT) has become a common method for circle detection in numerous image processing applications. Various modifications to the basic CHT operation have been suggested which include: the inclusion of edge orientation, simultaneous consideration of a range of circle radii, use of a complex accumulator array with the phase proportional to the log of radius, and the implementation of the CHT as filter operations. However, there has also been much work recently on the definition and use of invariance filters for object detection including circles. The contribution of the work presented here is to show that a specific combination of modifications to the CHT is formally equivalent to applying a scale invariant kernel operator. This work brings together these two themes in image processing which have hitherto been quite separate. Performance results for applying various forms of CHT filters incorporating some or all of the available modifications, along with results from the invariance kernel, are included. These are in terms of an analysis of the peak width in the output detection array (with and without the presence of noise), and also an analysis of the peak position in terms of increasing noise levels. The results support the equivalence between the specific form of the CHT developed in this work and the invariance kernel.
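The basic CHT operation the paper builds on can be sketched as a voting procedure: each edge point votes for every candidate centre at the given radius. This is a minimal single-radius sketch without the paper's modifications (edge orientation, multi-radius accumulation, phase coding):

```python
import numpy as np

def circle_hough(edge_points, radius, shape):
    """Vote in a 2-D accumulator for circle centres of one known radius.

    edge_points: iterable of (y, x) edge pixel coordinates
    radius: circle radius to detect, in pixels
    shape: (height, width) of the accumulator array
    """
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    for (y, x) in edge_points:
        # Each edge point votes for all centres at distance `radius` from it.
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        np.add.at(acc, (cy[ok], cx[ok]), 1)  # unbuffered accumulation
    return acc
```

The peak of the accumulator then marks the detected centre; the paper's analysis of peak width and position refers to exactly this output array.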
IEEE International Conference on High Performance Computing, Data and Analytics | 2000
Graham R. Nudd; Darren J. Kerbyson; Efstathios Papaefstathiou; S. C. Perry; John Stuart Harper; Daniel V. Wilcox
This paper describes a methodology that provides detailed predictive performance information throughout the software design and implementation cycles. It is structured around a hierarchy of performance models that describe the computing system in terms of its software, parallelization, and hardware components. The methodology is illustrated with an implementation, the performance analysis and characterization environment (PACE) system, which provides information concerning execution time, scalability, and resource use. A principal aim of the work is to provide a capability for rapid calculation of relevant performance numbers without sacrificing accuracy. The predictive nature of the approach provides both pre- and post-implementation analyses and allows implementation alternatives to be explored prior to the commitment of an application to a system. Because of the relatively fast analysis times, these techniques can be used at runtime to assist in application steering and scheduling with reference to dynamically changing systems and metacomputing.
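The layered-model idea can be made concrete with a small sketch: software components are modelled as subtasks, each evaluated against a hardware model, and the application model composes them. The class and field names below are hypothetical, not the PACE API:

```python
# Hypothetical sketch of a layered performance model in the spirit of the
# paper's hierarchy (application / parallelization / hardware layers).

class HardwareModel:
    def __init__(self, mflops, latency_s, bandwidth_bs):
        self.mflops = mflops            # compute rate, MFLOP/s
        self.latency_s = latency_s      # per-message latency, seconds
        self.bandwidth_bs = bandwidth_bs  # bandwidth, bytes/s

class Subtask:
    def __init__(self, flops, bytes_sent, messages):
        self.flops, self.bytes_sent, self.messages = flops, bytes_sent, messages

    def predict(self, hw, procs):
        # Evaluate this software component against the hardware layer.
        compute = self.flops / procs / (hw.mflops * 1e6)
        comm = self.messages * hw.latency_s + self.bytes_sent / hw.bandwidth_bs
        return compute + comm

def application_time(subtasks, hw, procs):
    # Application layer: sequential composition of subtask predictions.
    return sum(s.predict(hw, procs) for s in subtasks)
```

Because evaluation is just arithmetic over the layers, predictions are cheap enough to recompute at runtime, which is what makes the steering and scheduling use case in the abstract plausible.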
IEEE International Conference on High Performance Computing, Data and Analytics | 2008
Kevin J. Barker; Kei Davis; Adolfy Hoisie; Darren J. Kerbyson; Michael Lang; Scott Pakin; José Carlos Sancho
Roadrunner is a 1.38 Pflop/s-peak (double precision) hybrid-architecture supercomputer developed by LANL and IBM. It contains 12,240 IBM PowerXCell 8i processors and 12,240 AMD Opteron cores in 3,060 compute nodes. Roadrunner is the first supercomputer to run Linpack at a sustained speed in excess of 1 Pflop/s. In this paper we present a detailed architectural description of Roadrunner and a detailed performance analysis of the system. A case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included. The performance of Sweep3D is compared to that of the code on a previous implementation of the Cell Broadband Engine architecture (the Cell BE) and on multi-core processors. Using validated performance models combined with Roadrunner-specific microbenchmarks we identify performance issues in the early pre-delivery system and infer how well the final Roadrunner configuration will perform once the system software stack has matured.
Conference on High Performance Computing (Supercomputing) | 2005
Kevin J. Barker; Alan F. Benner; Raymond R. Hoare; Adolfy Hoisie; Darren J. Kerbyson; Dan Li; Rami G. Melhem; Ramakrishnan Rajamony; Eugen Schenfeld; Shuyi Shao; Craig B. Stunkel; Peter A. Walker
The interconnect plays a key role in both the cost and performance of large-scale HPC systems. The cost of future high-bandwidth electronic interconnects mushrooms due to expensive optical transceivers needed between electronic switches. We describe a potentially cheaper and more power-efficient approach to building high-performance interconnects. Through empirical analysis of HPC applications, we find that the bulk of inter-processor communication (barring collectives) is bounded in degree and changes very slowly or never. Thus we propose a two-network interconnect: an Optical Circuit Switching (OCS) network handling long-lived bulk data transfers, using optical switches; and a secondary lower-bandwidth Electronic Packet Switching (EPS) network. An OCS could be significantly cheaper, as it uses fewer optical transceivers than an electronic network. Collectives and transient communication packets traverse the electronic network. We present compiler techniques and dynamic run-time policies for using this two-network interconnect. Simulation results show that our approach provides high performance at low cost.
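The routing split the abstract describes can be sketched as a simple classification policy: long-lived, bulky point-to-point flows go to the circuit-switched network, while collectives and transient traffic stay on the packet network. The thresholds and flow fields below are illustrative assumptions, not the paper's policies:

```python
# Toy policy routing each flow to the Optical Circuit Switching (OCS) or
# Electronic Packet Switching (EPS) network, following the observation that
# most non-collective traffic is bounded in degree and changes slowly.

def route_flow(flow):
    """flow: dict with keys 'is_collective', 'bytes', 'expected_repeats'."""
    if flow["is_collective"]:
        return "EPS"  # collectives always traverse the packet network
    bulky = flow["bytes"] >= 1 << 20            # >= 1 MiB per transfer
    persistent = flow["expected_repeats"] >= 10  # long-lived partner pair
    # Only flows worth the circuit set-up cost use the optical network.
    return "OCS" if (bulky and persistent) else "EPS"
```

A compiler or runtime would fill in the persistence estimate; the point of the two-network design is that this classification rarely changes, so circuits can stay up for many iterations.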
Scientific Programming | 2002
Junwei Cao; Stephen A. Jarvis; Subhash Saini; Darren J. Kerbyson; Graham R. Nudd
Resource management is an important component of a grid computing infrastructure. The scalability and adaptability of such systems are two key challenges that must be addressed. In this work an agent-based resource management system, ARMS, is implemented for grid computing. ARMS utilises the performance prediction techniques of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on a local grid resource. At the meta-level, a hierarchy of homogeneous agents is used to provide a scalable and adaptable abstraction of the system architecture. Each agent is able to cooperate with other agents and thereby provide service advertisement and discovery for the scheduling of applications that need to utilise grid resources. A case study with corresponding experimental results is included to demonstrate the efficiency of the resource management and scheduling system.
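The advertisement-and-discovery pattern among a hierarchy of homogeneous agents can be sketched as follows; the class layout and method names are hypothetical, loosely in the spirit of ARMS rather than its actual implementation:

```python
# Minimal sketch of hierarchical service advertisement and discovery:
# each agent advertises its subtree's services upward, and discovery
# searches the local subtree before escalating to the parent.

class Agent:
    def __init__(self, name, services=()):
        self.name = name
        self.services = set(services)  # services of the local grid resource
        self.children = []
        self.parent = None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def advertised(self):
        # An agent advertises its own services plus those of its subtree.
        s = set(self.services)
        for c in self.children:
            s |= c.advertised()
        return s

    def discover(self, service, _from=None):
        # Serve locally, then search children, then escalate to the parent.
        if service in self.services:
            return self.name
        for c in self.children:
            if c is not _from and service in c.advertised():
                return c.discover(service, _from=self)
        if self.parent is not None and self.parent is not _from:
            return self.parent.discover(service, _from=self)
        return None
```

Because every agent runs the same logic, the hierarchy can grow or reorganize without any agent playing a special role, which is the scalability and adaptability claim of the abstract in miniature.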
Cluster Computing and the Grid | 2001
Junwei Cao; Darren J. Kerbyson; Graham R. Nudd
Resource management is an important part of the infrastructure of a grid computing environment. Scalability and adaptability are two key challenges in the implementation of such complex software systems. We introduce a new model for resource management in a metacomputing environment using a hierarchy of homogeneous agents that provides service discovery. The performance of the agent system can be improved using different combinations of optimisation strategies. A modelling and simulation environment has been developed in this work that enables the performance of the system to be investigated. A simplified model of the resource management infrastructure is given as a case study and simulation results are included that show the impact of the choice of performance optimisation strategies on the overall system performance.
IEEE International Conference on High Performance Computing, Data and Analytics | 2005
Darren J. Kerbyson; Philip W. Jones
In this paper we describe a performance model of the Parallel Ocean Program (POP). In particular, the latest version of POP (v2.0) is considered, which has similarities and differences to the earlier version (v1.4.3) as commonly used in climate simulations. The performance model encapsulates an understanding of POP’s data decomposition, processing flow, and scaling characteristics. The model is parametrized in many of the main input parameters to POP as well as characteristics of a processing system such as network latency and bandwidth. The performance model has been validated to date on a medium-sized (128 processor) AlphaServer ES40 system with the QsNet-1 interconnection network, and also on a larger scale (2048 processor) Blue Gene/L system. The accuracy of the performance model is high when using two standard benchmark configurations, one of which represents a realistic configuration similar to that used in Community Climate System Model coupled climate simulations. The performance model is also used to explore the performance of POP after possible code optimizations and with different task-to-processor assignment strategies, whose performance cannot currently be measured.
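One term such a model contains is the cost of the boundary (halo) exchange arising from the ocean grid's block decomposition. The sketch below is illustrative of that kind of term only; the formula, parameters, and the square processor grid assumption are not POP's actual model:

```python
import math

# Illustrative cost of one halo exchange for a 2-D block decomposition of an
# nx-by-ny grid: four neighbour messages per block, each paying latency plus
# transfer time. Purely a sketch, not the paper's validated model.

def halo_exchange_time(nx, ny, procs, bytes_per_cell, latency_s, bandwidth_bs):
    px = py = math.isqrt(procs)       # assume a square processor grid
    bx, by = nx // px, ny // py       # local block dimensions
    # Two exchanges of a bx-cell edge, two of a by-cell edge.
    msg_bytes = [bx * bytes_per_cell] * 2 + [by * bytes_per_cell] * 2
    return sum(latency_s + b / bandwidth_bs for b in msg_bytes)
```

The scaling behaviour such a term captures is that message sizes shrink as the processor count grows while the latency component does not, which is why latency appears as a key system characteristic in the abstract.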
Conference on High Performance Computing (Supercomputing) | 2006
José Carlos Sancho; Kevin J. Barker; Darren J. Kerbyson; Kei Davis
The design and implementation of a high performance communication network are critical factors in determining the performance and cost-effectiveness of a large-scale computing system. The major issues center on the trade-off between the network cost and the impact of latency and bandwidth on application performance. One promising technique for extracting maximum application performance given limited network resources is based on overlapping computation with communication, which partially or entirely hides communication delays. While this approach is not new, there are few studies that quantify the potential benefit of such overlapping for large-scale production scientific codes. We address this with an empirical method combined with a network model to quantify the potential overlap in several codes and examine the possible performance benefit. Our results demonstrate, for the codes examined, that a high potential tolerance to network latency and bandwidth exists because of a high degree of potential overlap. Moreover, our results indicate that there is often no need to use fine-grained communication mechanisms to achieve this benefit, since the major source of potential overlap is found in independent work, that is, computation that does not depend on pending messages. This allows for a potentially significant relaxation of network requirements without a consequent degradation of application performance.
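The bound being quantified can be stated in a few lines: if some amount of computation is independent of in-flight messages, up to that much communication time can be hidden behind it. The simple model below is an illustration of this reasoning, not the paper's empirical method:

```python
# Sketch of the overlap bound: communication can hide behind the portion of
# computation that does not depend on pending messages. Illustrative model.

def overlapped_runtime(compute_s, comm_s, independent_s):
    """Runtime when communication overlaps the independent computation."""
    hidden = min(comm_s, independent_s)  # communication hidden behind work
    exposed = comm_s - hidden            # remainder stays on the critical path
    return compute_s + exposed

def overlap_benefit(compute_s, comm_s, independent_s):
    # Baseline: blocking communication is fully exposed.
    blocking = compute_s + comm_s
    return blocking - overlapped_runtime(compute_s, comm_s, independent_s)
```

When the independent fraction exceeds the communication time, communication vanishes from the critical path entirely, which is why the codes studied can tolerate substantially relaxed latency and bandwidth.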
Cluster Computing and the Grid | 2002
Junwei Cao; Daniel P. Spooner; James D. Turner; Stephen A. Jarvis; Darren J. Kerbyson; Subhash Saini; Graham R. Nudd
It is envisaged that the grid infrastructure will be a large-scale distributed software system that will provide high-end computational and storage capabilities to differentiated users. A number of distributed computing technologies are being applied to grid development work, including CORBA and Jini. In this work, we introduce an A4 (Agile Architecture and Autonomous Agents) methodology, which can be used for resource management for grid computing. An initial system implementation utilises the performance prediction techniques of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on local grid resources. At the meta-level, a hierarchy of identical agents is used to provide an abstraction of the system architecture. Each agent is able to cooperate with other agents to provide service advertisement and discovery to schedule applications that need to utilise grid resources. A performance monitor and advisor (PMA) is in development to optimize the performance of agent behaviours.