Rory Kelly
National Center for Atmospheric Research
Publications
Featured research published by Rory Kelly.
Computing in Science and Engineering | 2010
Rory Kelly
Much success has been achieved using GPUs to accelerate existing applications that are highly data parallel, or that are dominated by small, intense computational kernels. What are the prospects for porting existing large scientific models that do not fit this mold? We take an expensive routine from the CAM atmosphere model and port it to a GPU using CUDA. We use the experience gained as a guide in thinking about porting the full application to an accelerator-based system. We consider the best path forward for getting large scientific models running on accelerator-based systems, and identify cases where porting may be feasible and cases where a complete redesign may be the best option.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2011
Davide Del Vento; David L. Hart; Thomas Engel; Rory Kelly; Richard A. Valent; Siddhartha S. Ghosh; Si Liu
NCAR's Bluefire supercomputer is instrumented with a set of low-overhead processes that continually monitor the floating point counters of its 3,840 batch-compute cores. We extract performance numbers for each batch job by correlating the data from corresponding nodes. From experience and heuristics for good performance, we use this data, in part, to identify poorly performing jobs and then work with the users to improve their jobs' efficiency. Often, the solution involves simple steps such as spawning an adequate number of processes or threads, binding the processes or threads to cores, using large memory pages, or using adequate compiler optimization. These efforts typically result in performance improvements and a wall-clock runtime reduction of 10% to 20%. With more involved changes to codes and scripts, some users have obtained performance improvements of 40% to 90%. We discuss our instrumentation, some successful cases, and its general applicability to other systems.
Monthly Weather Review | 2017
Peter H. Lauritzen; Mark A. Taylor; James R. Overfelt; Paul A. Ullrich; Steve Goldhaber; Rory Kelly
An algorithm to consistently couple a conservative semi-Lagrangian finite-volume transport scheme with a spectral element (SE) dynamical core is presented. The semi-Lagrangian finite-volume scheme is the Conservative Semi-Lagrangian Multitracer (CSLAM), and the SE dynamical core is the National Center for Atmospheric Research (NCAR)’s Community Atmosphere Model–Spectral Elements (CAM-SE). The primary motivation for coupling CSLAM with CAM-SE is to accelerate tracer transport for multitracer applications. The coupling algorithm result is an inherently mass-conservative, shape-preserving, and consistent (for a constant mixing ratio, the CSLAM solution reduces to the SE solution for air mass) transport that is efficient and accurate. This is achieved by first deriving formulas for diagnosing SE airmass flux through the CSLAM control volume faces. Thereafter, the upstream Lagrangian CSLAM areas are iteratively perturbed to match the diagnosed SE airmass flux, resulting in an equivalent upstream Lagran...
Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure | 2015
Rory Kelly; Si Liu; Siddhartha S. Ghosh; Davide Del Vento; David L. Hart; Dan Nagle; B. J. Smith; Richard A. Valent
Scientists and engineers using supercomputer clusters should be able to focus on their scientific and technical work instead of worrying about operating their user environment. However, creating a convenient and effective user environment on modern supercomputers becomes more and more challenging due to the complexity of these large-scale systems. In this report, we discuss important design issues and goals for a user environment that must support multiple compiler suites, various applications, and diverse libraries on heterogeneous computing architectures. We present our implementation on the latest high-performance computing system, Yellowstone, which is a powerful dedicated resource for earth system science deployed by the National Center for Atmospheric Research. Our newly designed user environment is built upon a hierarchical module structure, customized wrapper scripts, pre-defined system modules, Lmod modules implementation, and several creative tools. The resulting implementation realizes many great features including streamlined control, versioning, user customization, automated documentation, etc., and accommodates both novice and experienced users. The design and implementation also minimize the effort of the administrator and support team in managing users' environments. The smooth application and positive feedback from our users demonstrate that our design and implementation on the Yellowstone system have been well accepted and have facilitated thousands of users all over the world.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2011
Rory Kelly; Davide Del Vento; Siddhartha S. Ghosh; Richard A. Valent; Si Liu
The NCAR-Wyoming Supercomputing Center (NWSC) will begin operating in June 2012, and will house NCAR's next-generation HPC system. The NWSC will support a broad spectrum of Earth Science research drawn from a user community with diverse requirements for computing, storage, and data analysis resources. To ensure that the NWSC satisfies the needs of this community, the procurement benchmarking process was driven by science requirements from the start. We will discuss the science objectives for NWSC, translating scientific goals into technical requirements for a machine, and assembling a benchmark suite from community science models and synthetic tests to measure the technical capabilities of the proposed HPC systems. We will also talk about the benchmark analysis process, extending the benchmark suite as a testing tool over the life of the machine, and the applicability of the NWSC benchmarking suite to other HPC centers.
High Performance Computer Architecture | 2016
Siddhartha S. Ghosh; Davide Del Vento; Rory Kelly; Irfan Elahi; Nathan Rini; Benjamin Matthews; Storm Knights; Thomas Engel; Ben Jamroz; Shawn Strande
We measured InfiniBand traffic in our full fat-tree fabric and measured the performance impact of trimming the fabric on our major application kernels. Based on traffic pattern analysis and application performance impact, we infer that a 2:1 trimmed fat tree is a cost-effective alternative to a full fat tree for this specific set of applications. The methodology we used may be useful for others who are performing design trade-offs for HPC systems. We also propose that switch hardware vendors design director-class switches with trimmed fat-tree options that optimize per-port costs.
Archive | 2016
Shawn Strande; Irfan Elahi; Benjamin Matthews; Siddhartha S. Ghosh; Davide Del Vento; Rory Kelly; Stormy Knight; Nathan Rini; Shawn Needham
Archive | 2009
Jose Garcia; Rory Kelly
Archive | 2009
Rory Kelly; Humberto Garcia
Archive | 2009
Humberto Garcia; F. Bollig; Rory Kelly; Benjamin Mayer; G. Erlebacher