Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Patrick H. Worley is active.

Publication


Featured research published by Patrick H. Worley.


Journal of Climate | 2011

The Community Climate System Model Version 4

Peter R. Gent; Gokhan Danabasoglu; Leo J. Donner; Marika M. Holland; Elizabeth C. Hunke; Steven R. Jayne; David M. Lawrence; Richard Neale; Philip J. Rasch; Mariana Vertenstein; Patrick H. Worley; Zong-Liang Yang; Minghua Zhang

The fourth version of the Community Climate System Model (CCSM4) was recently completed and released to the climate community. This paper describes developments to all CCSM components, and documents fully coupled preindustrial control runs compared to the previous version, CCSM3. Using the standard atmosphere and land resolution of 1° results in the sea surface temperature biases in the major upwelling regions being comparable to the 1.4°-resolution CCSM3. Two changes to the deep convection scheme in the atmosphere component result in CCSM4 producing El Niño–Southern Oscillation variability with a much more realistic frequency distribution than in CCSM3, although the amplitude is too large compared to observations. These changes also improve the Madden–Julian oscillation and the frequency distribution of tropical precipitation. A new overflow parameterization in the ocean component leads to an improved simulation of the Gulf Stream path and the North Atlantic Ocean meridional overturning circulati...


IEEE International Conference on High Performance Computing Data and Analytics | 2012

CAM-SE: A scalable spectral element dynamical core for the Community Atmosphere Model

John M. Dennis; Jim Edwards; Katherine J. Evans; Oksana Guba; Peter H. Lauritzen; Arthur A. Mirin; Amik St-Cyr; Mark A. Taylor; Patrick H. Worley

The Community Atmosphere Model (CAM) version 5 includes a spectral element dynamical core option from NCAR’s High-Order Method Modeling Environment. It is a continuous Galerkin spectral finite-element method designed for fully unstructured quadrilateral meshes. The current configurations in CAM are based on the cubed-sphere grid. The main motivation for including a spectral element dynamical core is to improve the scalability of CAM by allowing quasi-uniform grids for the sphere that do not require polar filters. In addition, the approach provides other state-of-the-art capabilities such as improved conservation properties. Spectral elements are used for the horizontal discretization, while most other aspects of the dynamical core are a hybrid of well-tested techniques from CAM’s finite volume and global spectral dynamical core options. Here we first give an overview of the spectral element dynamical core as used in CAM. We then give scalability and performance results from CAM running with three different dynamical core options within the Community Earth System Model, using a pre-industrial time-slice configuration. We focus on high-resolution simulations, using 1/4 degree, 1/8 degree, and T341 spectral truncation horizontal grids.
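
For a sense of the grid sizes involved: the cubed-sphere grid has 6 faces of ne x ne elements, each carrying an np x np tensor grid of Gauss-Lobatto-Legendre points, with points on shared element edges and corners counted once. A minimal counting sketch; the ne values below are illustrative choices for roughly 1-degree and 1/4-degree grids, not figures quoted in the paper.

```python
def cam_se_grid_points(ne: int, np_: int) -> int:
    """Unique GLL grid points on a cubed-sphere spectral element grid.

    6 faces x ne*ne elements, each with np_ x np_ GLL points; points on
    shared element edges and corners are counted once, leaving
    6*(ne*(np_-1))**2 + 2 unique points on the sphere.
    """
    return 6 * (ne * (np_ - 1)) ** 2 + 2

# Illustrative configurations (roughly 1 degree and 1/4 degree):
for ne in (30, 120):
    print(f"ne={ne}, np=4 -> {cam_se_grid_points(ne, 4):,} grid points")
```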


Conference on High Performance Computing (Supercomputing) | 2003

Early Evaluation of the Cray X1

Thomas H. Dunigan; Mark R. Fahey; James B. White; Patrick H. Worley

Oak Ridge National Laboratory installed a 32-processor Cray X1 in March 2003 and will have a 256-processor system installed by October 2003. In this paper we describe our initial evaluation of the X1 architecture, focusing on microbenchmarks, kernels, and application codes that highlight the performance characteristics of the X1 architecture and indicate how to use the system most efficiently.
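
The microbenchmarks referred to here isolate one hardware characteristic at a time, such as sustainable memory bandwidth. A hypothetical STREAM-style triad sketch (not one of the paper's benchmark codes) shows the shape of such a measurement:

```python
import time
import numpy as np

def triad_bandwidth(n: int = 50_000_000, reps: int = 5) -> float:
    """STREAM-style triad a = b + s*c; returns the best observed rate in GB/s.

    Counts the canonical 3 arrays (2 reads, 1 write) of 8-byte doubles;
    NumPy's temporary for s*c adds extra traffic, so this is a lower bound.
    """
    b, c = np.random.rand(n), np.random.rand(n)
    a = np.empty_like(b)
    best = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()
        a[:] = b + 2.5 * c
        dt = time.perf_counter() - t0
        best = max(best, 3 * n * 8 / dt / 1e9)
    return best

if __name__ == "__main__":
    print(f"triad bandwidth ~ {triad_bandwidth():.1f} GB/s")
```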


Parallel Computing | 1995

Design and performance of a scalable parallel community climate model

John B. Drake; Ian T. Foster; John Michalakes; Brian R. Toonen; Patrick H. Worley

We describe the design of a parallel global atmospheric circulation model, PCCM2. This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to exploit distributed-memory multicomputers. PCCM2 incorporates parallel spectral transform, semi-Lagrangian transport, and load balancing algorithms. We present detailed performance results on the IBM SP2 and Intel Paragon. These results provide insights into the scalability of the individual parallel algorithms and of the parallel model as a whole.
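
The load balancing mentioned above addresses the fact that grid columns have variable cost (for example, daylit columns do more physics work than nighttime columns). The following is a hypothetical sketch of the general idea, not PCCM2's actual algorithm: greedily assign the most expensive columns to the currently least-loaded process.

```python
import numpy as np

def balance_columns(costs, nproc):
    """Greedy load balancing: assign each grid column (with a variable,
    e.g. day/night-dependent, cost) to the least-loaded process so far."""
    order = np.argsort(costs)[::-1]          # most expensive columns first
    load = np.zeros(nproc)
    owner = np.empty(len(costs), dtype=int)
    for col in order:
        p = int(np.argmin(load))
        owner[col] = p
        load[p] += costs[col]
    return owner, load

# Toy example: 1024 columns, half "daylit" (cost 2.0), half "dark" (cost 1.0).
costs = np.where(np.arange(1024) % 2 == 0, 2.0, 1.0)
owner, load = balance_columns(costs, nproc=16)
print("max/min process load:", load.max(), load.min())
```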


IEEE International Conference on High Performance Computing Data and Analytics | 2008

Early evaluation of IBM BlueGene/P

Sadaf R. Alam; Richard Frederick Barrett; Michael H Bast; Mark R. Fahey; Jeffery A. Kuehn; Collin McCurdy; James H. Rogers; Philip C. Roth; Ramanan Sankaran; Jeffrey S. Vetter; Patrick H. Worley; Weikuan Yu

BlueGene/P (BG/P) is the second generation BlueGene architecture from IBM, succeeding BlueGene/L (BG/L). BG/P is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz with a double precision, dual pipe floating point unit per core. These chips are connected with multiple interconnection networks including a 3-D torus, a global collective network, and a global barrier network. The design is intended to provide a highly scalable, physically dense system with relatively low power requirements per flop. In this paper, we report on our examination of BG/P, presented in the context of a set of important scientific applications, and as compared to other major large scale supercomputers in use today. Our investigation confirms that BG/P has good scalability with an expected lower performance per processor when compared to the Cray XT4's Opteron. We also find that BG/P uses very low power per floating point operation for certain kernels, yet it has less of a power advantage when considering science driven metrics for mission applications.
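
As a back-of-the-envelope check on the quoted node configuration, per-node peak performance follows from cores x clock x flops per cycle. The 4 flops/core/cycle figure below assumes each of the two floating-point pipes retires a fused multiply-add every cycle; that assumption is flagged in the code.

```python
cores_per_node  = 4        # quad-core PowerPC 450
clock_hz        = 850e6    # 850 MHz
flops_per_cycle = 4        # assumption: dual-pipe FPU, one FMA (2 flops) per pipe

peak_gflops = cores_per_node * clock_hz * flops_per_cycle / 1e9
print(f"peak ~ {peak_gflops:.1f} GFlop/s per node")   # ~13.6 GFlop/s
```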


Concurrency and Computation: Practice and Experience | 2005

Practical performance portability in the Parallel Ocean Program (POP)

Philip W. Jones; Patrick H. Worley; Yoshikatsu Yoshida; James B. White; John M. Levesque

The design of the Parallel Ocean Program (POP) is described with an emphasis on portability. Performance of POP is presented on a wide variety of computational architectures, including vector architectures and commodity clusters. Analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability. A new design of the POP model, including a cache blocking and land point elimination scheme, is described with some preliminary performance results. Published in 2005 by John Wiley & Sons, Ltd.
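
The cache blocking and land point elimination scheme can be sketched independently of POP's actual implementation: tile the horizontal grid into small blocks, discard blocks containing only land, and work only on the surviving ocean blocks. A hypothetical illustration (block size and land mask are made up):

```python
import numpy as np

def ocean_blocks(mask, bx, by):
    """Tile a 2D land/ocean mask (True = ocean) into by-by-bx blocks and
    keep only blocks that contain at least one ocean point."""
    ny, nx = mask.shape
    blocks = []
    for j0 in range(0, ny, by):
        for i0 in range(0, nx, bx):
            tile = mask[j0:j0 + by, i0:i0 + bx]
            if tile.any():                      # land-only blocks are eliminated
                blocks.append((j0, i0))
    return blocks

# Toy example: a 64x128 grid with a block-wide land strip down the middle.
mask = np.ones((64, 128), dtype=bool)
mask[:, 64:80] = False
kept = ocean_blocks(mask, bx=16, by=16)
print(f"{len(kept)} of {(64 // 16) * (128 // 16)} blocks retained")
```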


SIAM Journal on Scientific Computing | 1997

Parallel Algorithms for the Spectral Transform Method

Ian T. Foster; Patrick H. Worley

The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but we also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.
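
One family of algorithms compared in this line of work is the transpose method: keep each stage of a multidimensional transform entirely local and redistribute (transpose) the data between stages. The sketch below emulates that data movement serially, with NumPy array slabs standing in for per-processor data; the slab decomposition is illustrative, not taken from the paper.

```python
import numpy as np

def transpose_fft2(field, nproc):
    """Emulate a transpose-based parallel 2D FFT.

    Stage 1: each 'processor' owns a slab of rows and FFTs along x.
    Transpose: redistribute so each processor owns a slab of columns.
    Stage 2: FFT along the (now local) y direction.
    """
    rows = np.array_split(field, nproc, axis=0)              # row slabs
    stage1 = [np.fft.fft(slab, axis=1) for slab in rows]     # local FFT in x
    gathered = np.vstack(stage1)
    cols = np.array_split(gathered, nproc, axis=1)           # "transpose": column slabs
    stage2 = [np.fft.fft(slab, axis=0) for slab in cols]     # local FFT in y
    return np.hstack(stage2)

field = np.random.rand(64, 64)
assert np.allclose(transpose_fft2(field, nproc=4), np.fft.fft2(field))
```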


Conference on High Performance Computing (Supercomputing) | 2007

Cray XT4: an early evaluation for petascale scientific simulation

Sadaf R. Alam; Jeffery A. Kuehn; Richard Frederick Barrett; Jeffrey M. Larkin; Mark R. Fahey; Ramanan Sankaran; Patrick H. Worley

The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.
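
Interconnect behavior in such evaluations is typically probed with a two-rank ping-pong micro-benchmark. A minimal mpi4py sketch of that pattern (not the benchmark suite used in the paper; the message size is an arbitrary example):

```python
# Run with: mpirun -np 2 python pingpong.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 1 << 20                       # 1 MiB messages (illustrative)
buf = np.zeros(nbytes, dtype=np.uint8)
reps = 100

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=1)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
elapsed = MPI.Wtime() - t0

if rank == 0:
    # Each rep moves nbytes in each direction; report aggregate bandwidth
    # and the average one-way message time.
    print(f"~{2 * reps * nbytes / elapsed / 1e9:.2f} GB/s, "
          f"{elapsed / (2 * reps) * 1e6:.1f} us per one-way message")
```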


Conference on High Performance Computing (Supercomputing) | 2005

Leading Computational Methods on Scalar and Vector HEC Platforms

Leonid Oliker; Jonathan Carter; Michael F. Wehner; Andrew Canning; Stephane Ethier; Arthur A. Mirin; David Parks; Patrick H. Worley; Shigemune Kitawaki; Yoshinori Tsuda

The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on conventional supercomputers has become a major concern in high performance computing, requiring significantly larger systems and application scalability than implied by peak performance in order to achieve desired performance. The latest generation of custom-built parallel vector systems have the potential to address this issue for numerical algorithms with sufficient regularity in their computational structure. In this work we explore applications drawn from four areas: atmospheric modeling (CAM), magnetic fusion (GTC), plasma physics (LBMHD3D), and material science (PARATEC). We compare performance of the vector-based Cray X1, Earth Simulator (ES), and newly-released NEC SX-8 and Cray X1E, with performance of three leading commodity-based superscalar platforms utilizing the IBM Power3, Intel Itanium2, and AMD Opteron processors. Our research team was the first international group to conduct a performance evaluation study at the Earth Simulator Center; remote ES access is not available. Our work builds on our previous efforts [16, 17] and makes several significant contributions: the first reported vector performance results for CAM simulations utilizing a finite-volume dynamical core on a high-resolution atmospheric grid; a new data-decomposition scheme for GTC that (for the first time) enables a breakthrough of the Teraflop barrier; the introduction of a new three-dimensional Lattice Boltzmann magneto-hydrodynamic implementation used to study the onset evolution of plasma turbulence that achieves over 26 Tflop/s on 4800 ES processors; and the largest PARATEC cell size atomistic simulation to date. Overall, results show that the vector architectures attain unprecedented aggregate performance across our application suite, demonstrating the tremendous potential of modern parallel vector systems.


Conference on High Performance Computing (Supercomputing) | 2002

Asserting Performance Expectations

Jeffrey S. Vetter; Patrick H. Worley

Traditional techniques for performance analysis provide a means for extracting and analyzing raw performance information from applications. Users then compare this raw data to their performance expectations for application constructs. This comparison can be tedious for the scale of today's architectures and software systems. To address this situation, we present a methodology and prototype that allows users to assert performance expectations explicitly in their source code using performance assertions. As the application executes, each performance assertion in the application collects data implicitly to verify the assertion. By allowing the user to specify a performance expectation with individual code segments, the runtime system can jettison raw data for measurements that pass their expectation, while reacting to failures with a variety of responses. We present several compelling uses of performance assertions with our operational prototype, including raising a performance exception, validating a performance model, and adapting an algorithm empirically at runtime.
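
The mechanism translates naturally into a small runtime check: state the expectation up front, measure while the segment runs, discard the data if the expectation holds, and react (here, by raising an exception) if it fails. The sketch below is a hypothetical wall-time-only version; the assertions described in the paper are richer (hardware counters, model expressions), and all names here are made up.

```python
import time
from contextlib import contextmanager

class PerformanceAssertionError(RuntimeError):
    """Raised when a code segment violates its stated performance expectation."""

@contextmanager
def expect_seconds(label, max_seconds):
    """Assert that the wrapped block finishes within max_seconds of wall time."""
    t0 = time.perf_counter()
    yield
    elapsed = time.perf_counter() - t0
    if elapsed > max_seconds:
        # React to the failure; passing measurements are simply discarded.
        raise PerformanceAssertionError(
            f"{label}: {elapsed:.3f}s exceeded expectation of {max_seconds:.3f}s")

# Usage: assert that a code segment stays under 50 ms of wall time.
with expect_seconds("solver iteration", 0.050):
    sum(i * i for i in range(200_000))   # stand-in for the real work
```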

Collaboration


Dive into Patrick H. Worley's collaborations.

Top Co-Authors

Eduardo F. D'Azevedo (Oak Ridge National Laboratory)
John B. Drake (Oak Ridge National Laboratory)
Jeffrey S. Vetter (Oak Ridge National Laboratory)
Mark R. Fahey (Oak Ridge National Laboratory)
S. Ku (Princeton Plasma Physics Laboratory)
Arthur A. Mirin (Lawrence Livermore National Laboratory)
Ian T. Foster (Argonne National Laboratory)
James B. White (Oak Ridge National Laboratory)
Katherine J. Evans (Oak Ridge National Laboratory)
Mark A. Taylor (Sandia National Laboratories)