
Publications


Featured research published by Ricky A. Kendall.


Computer Physics Communications | 2000

High performance computational chemistry: An overview of NWChem, a distributed parallel application

Ricky A. Kendall; Edoardo Aprà; David E. Bernholdt; Eric J. Bylaska; Michel Dupuis; George I. Fann; Robert J. Harrison; Jialin Ju; Jeffrey A. Nichols; Jarek Nieplocha; T.P. Straatsma; Theresa L. Windus; Adrian T. Wong

NWChem is the software package for computational chemistry on massively parallel computing systems developed by the High Performance Computational Chemistry Group for the Environmental Molecular Sciences Laboratory. The software provides a variety of modules for quantum mechanical and classical mechanical simulation. This article describes the design and some implementation details of the overall NWChem architecture. The architecture facilitates rapid development and portability of fully distributed application modules. We also delineate some of the functionality within NWChem and show the performance of a few of its modules.


Journal of Chemical Physics | 2003

The Parallel Implementation of a Full Configuration Interaction Program

Zhengting Gan; Yuri Alexeev; Mark S. Gordon; Ricky A. Kendall

Both replicated-data and distributed-data parallel full configuration interaction (FCI) implementations are described. The implementation of the FCI algorithm is organized in a hybrid strings-integral driven approach. Redundant communication is avoided, and network performance is further optimized by an improved distributed data interface library. Examples show linear scalability of the distributed-data code on both PC and workstation clusters. The new parallel implementation greatly extends the range of hardware on which parallel FCI calculations can be performed. The timing data on the workstation cluster show great potential for using the new parallel FCI algorithm to expand the range of complete active space self-consistent field applications.


International Conference on Parallel Processing | 2005

Optimizing collective communications on SMP clusters

Meng-Shiou Wu; Ricky A. Kendall; Kyle Wright

We describe a generic programming model for designing collective communications on SMP clusters. The programming model uses shared memory for collective communications and overlaps inter-node and intra-node communications, both of which are normally platform-specific approaches. Several collective communications are designed based on this model and tested on three SMP clusters with different configurations. The results show that the developed collective communications can, with proper tuning, provide significant performance improvements over existing generic implementations. For example, when broadcasting an 8 MB message, our implementations outperform the vendor's MPI_Bcast by 35% on an IBM SP system, 51% on a G4 cluster, and 63% on an Intel cluster, the latter two using MPICH's MPI_Bcast. For all-gather operations with 8 MB messages, our implementation outperforms the vendor's MPI_Allgather by 75% on the IBM SP, 60% on the Intel cluster, and 48% on the G4 cluster.
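The two-level idea behind mixed-mode collectives can be illustrated with a small sketch. This is not the paper's code; ranks, node leaders, and transfers are simulated with plain Python lists, where one assignment stands in for an inter-node point-to-point message and another for an intra-node shared-memory copy.

```python
# Illustrative sketch (not the paper's implementation) of a two-level
# "mixed mode" broadcast: point-to-point between node leaders, then a
# shared-memory copy within each SMP node.

def mixed_mode_bcast(buffers, ranks_per_node, root=0):
    """Broadcast buffers[root] to every rank.

    buffers        -- list indexed by global rank; only buffers[root] is set
    ranks_per_node -- number of ranks sharing one SMP node
    """
    nranks = len(buffers)
    leaders = list(range(0, nranks, ranks_per_node))  # lowest rank on each node

    # Stage 1: inter-node broadcast among node leaders (a binomial tree in a
    # real implementation; a simple loop here for clarity).
    for leader in leaders:
        if leader != root:
            buffers[leader] = buffers[root]  # stands in for a point-to-point send

    # Stage 2: intra-node distribution; on real hardware each rank copies the
    # data out of a shared-memory segment instead of receiving a message.
    for leader in leaders:
        for r in range(leader, min(leader + ranks_per_node, nranks)):
            buffers[r] = buffers[leader]
    return buffers

msg = [None] * 8
msg[0] = b"payload"
print(mixed_mode_bcast(msg, ranks_per_node=4))
```

In the real algorithm the two stages also overlap (the next segment's inter-node transfer proceeds while the current one is copied intra-node), which is where the reported speedups come from.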


Computer Physics Communications | 2002

The distributed data SCF

Yuri Alexeev; Ricky A. Kendall; Mark S. Gordon

This paper describes a distributed data parallel SCF algorithm. The distinguishing features of this algorithm are: (a) columns of density and Fock matrices are distributed evenly among processors, (b) pair-wise dynamic load balancing is developed to achieve excellent load balance, (c) network communication time is minimized via careful analysis of data flow in the SCF algorithm. The developed performance models and benchmarking results illustrate good performance of the distributed data SCF algorithm.
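Feature (a) above, the even column distribution, can be sketched in a few lines. The function below is a hypothetical illustration (not the paper's code) of how a matrix with `ncols` columns might be divided evenly among `nprocs` processors, spreading any remainder one extra column at a time.

```python
# Hypothetical sketch of an even column distribution for the density and
# Fock matrices: each processor owns one contiguous block of columns.

def column_owner_ranges(ncols, nprocs):
    """Return the (start, end) column range owned by each processor,
    handing out the remainder one extra column per processor."""
    base, extra = divmod(ncols, nprocs)
    ranges, start = [], 0
    for p in range(nprocs):
        width = base + (1 if p < extra else 0)
        ranges.append((start, start + width))
        start += width
    return ranges

# 10 columns over 4 processors -> block widths 3, 3, 2, 2
print(column_owner_ranges(10, 4))
```

The dynamic load balancing in (b) would then reassign work pair-wise at run time when some column blocks turn out to be more expensive than others.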


Conference on High Performance Computing (Supercomputing) | 2005

Performance Modeling and Tuning Strategies of Mixed Mode Collective Communications

Meng-Shiou Wu; Ricky A. Kendall; Kyle Wright; Zhao Zhang

On SMP clusters, mixed-mode collective MPI communications, which use shared-memory communication within SMP nodes and point-to-point communication between SMP nodes, are more efficient than conventional implementations. In a previous study, we proposed several new methods that made mixed-mode collective communications significantly faster than purely point-to-point ones. However, optimal performance required tuning many parameters, which was done by testing every possible setting and was very time consuming. In this study, we propose a new performance model that accounts for the special characteristics of mixed-mode collective communications. The model's predictions are good enough to rule out most settings without testing them by execution. It considers both shared-memory and point-to-point communications, whereas existing performance models only consider the point-to-point ones. Based on this model, we develop tuning strategies that reduce the overall tuning time to only 10% of the previous tuning time.
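The flavor of model-guided tuning can be shown with a toy cost model. Everything here is invented for illustration (the paper's actual model is more detailed): each segment transfer costs a latency term `alpha` plus a bandwidth term `beta` per byte, a two-stage pipeline (inter-node, then intra-node) overlaps segments, and the tuner picks the segment count with the lowest predicted time instead of timing every candidate.

```python
# Toy performance model (parameters invented for illustration): predict the
# time of a pipelined, segmented collective and pick the best segment count.

def pipelined_time(msg_bytes, nseg, alpha, beta, stages=2):
    """Predicted time for sending msg_bytes through `stages` pipeline
    stages in nseg equal segments: more segments improve overlap but
    pay the per-segment latency alpha more often."""
    seg_cost = alpha + beta * msg_bytes / nseg
    return (nseg + stages - 1) * seg_cost

def best_nseg(msg_bytes, alpha, beta, candidates, stages=2):
    """Model-guided tuning: choose the candidate with the lowest
    predicted time, with no benchmarking runs at all."""
    return min(candidates,
               key=lambda n: pipelined_time(msg_bytes, n, alpha, beta, stages))

# 8 MB message, made-up alpha = 10 us latency, beta = 1 ns/byte bandwidth.
print(best_nseg(8 * 2**20, 1e-5, 1e-9, [1, 2, 4, 8, 16, 32, 64]))
```

The trade-off the model captures is real: too few segments lose pipeline overlap, too many pay the latency term repeatedly, so the predicted optimum sits in between.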


International Parallel and Distributed Processing Symposium | 2007

Integrating Performance Tools with Large-Scale Scientific Software

Meng-Shiou Wu; Jonathan L. Bentz; Fang Peng; Masha Sosonkina; Mark S. Gordon; Ricky A. Kendall

Modern performance tools provide methods for easy integration into an application for performance evaluation. For a large-scale scientific software package that has been under development for decades, with developers around the world, several obstacles must be overcome in order to utilize modern performance tools and explore performance bottlenecks. In this paper, we present our experience in integrating performance tools with one popular computational chemistry package. We discuss the difficulties we encountered and the mechanisms developed to integrate performance tools into this code. With the performance tools integrated, we present initial performance evaluation results and discuss the remaining challenges in conducting performance evaluation for large-scale scientific packages.


International Workshop on OpenMP | 2004

Parallelization of general matrix multiply routines using OpenMP

Jonathan L. Bentz; Ricky A. Kendall

An application programming interface (API) is developed to facilitate, via OpenMP, the parallelization of the double-precision general matrix multiply routine called from within GAMESS [1] during the execution of the coupled-cluster module for calculating physical properties of molecules. Results are reported using the ATLAS library and the Intel MKL on an Intel machine, and using ESSL and the ATLAS library on an IBM SP.
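The parallelization strategy (splitting the outer loop of the matrix multiply among threads, as an OpenMP parallel-for does for DGEMM's row loop) can be sketched in Python. This is a rough analogue invented for illustration, not the paper's OpenMP/Fortran code.

```python
# Illustrative analogue of an OpenMP parallel-for over DGEMM's outer loop:
# the rows of C = A * B are split round-robin among worker threads.

from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, rows):
    """Compute only the requested rows of A*B (pure-Python inner product)."""
    ncols, inner = len(B[0]), len(B)
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(ncols)]
        for i in rows
    ]

def parallel_matmul(A, B, nworkers=4):
    rows = list(range(len(A)))
    chunks = [rows[i::nworkers] for i in range(nworkers)]  # round-robin split
    with ThreadPoolExecutor(nworkers) as pool:
        parts = list(pool.map(lambda ch: matmul_rows(A, B, ch), chunks))
    # Stitch the per-worker row chunks back into their original order.
    C = [None] * len(A)
    for chunk, part in zip(chunks, parts):
        for i, row in zip(chunk, part):
            C[i] = row
    return C

print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In the real API the workers call a tuned serial DGEMM (ATLAS, MKL, or ESSL) on each block rather than computing inner products by hand.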


Parallel Computing | 2004

OpenMP on distributed memory via global arrays

Lei Huang; Barbara M. Chapman; Ricky A. Kendall

This chapter discusses a strategy for implementing OpenMP on distributed memory systems that relies on a source-to-source translation from OpenMP to Global Arrays. Global Arrays is a library with routines for managing data that is declared as shared in a user program. It provides a higher level of control over the mapping of such data to the target machine and enables precise specification of the required accesses. The chapter introduces the features of Global Arrays, outlines the translation and its challenges, and considers how user-level support can improve this process. Early experiments are discussed, along with ideas for future work on performance and on other approaches to providing OpenMP on clusters and other distributed memory platforms.


International Conference on Computational Science | 2004

Efficient Translation of OpenMP to Distributed Memory

Lei Huang; Barbara M. Chapman; Zhenying Liu; Ricky A. Kendall

The shared memory paradigm provides many benefits to the parallel programmer, particularly for applications that are hard to parallelize. Unfortunately, there are currently no efficient implementations of OpenMP for distributed memory platforms, and this greatly diminishes its usefulness for real-world parallel application development. In this paper we introduce a basic strategy for implementing OpenMP on distributed memory systems via a translation to Global Arrays. Global Arrays is a library of routines that provides many of the same features as OpenMP yet targets distributed memory platforms. Since it enables a reasonable translation strategy and also allows precise control over the movement of data within the resulting code, we believe it has the potential to provide higher performance than the traditional translation of OpenMP to distributed memory via software distributed shared memory.
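The translation target can be pictured with a toy model. The class below is invented for illustration and is not the Global Arrays API: it only mimics the core idea that shared data lives in a global array accessed through explicit get/put calls on index ranges, which is what an OpenMP shared-array access gets rewritten into.

```python
# Toy model (invented for illustration; not the real Global Arrays API) of
# the translation target: ranks reach shared data only through explicit
# block-range get/put operations.

class ToyGlobalArray:
    """A 1-D globally shared array with block-range get/put access."""
    def __init__(self, n, fill=0.0):
        self._data = [fill] * n

    def get(self, lo, hi):
        """Fetch a local copy of elements [lo, hi)."""
        return self._data[lo:hi]

    def put(self, lo, values):
        """Write a locally computed block back to [lo, lo + len(values))."""
        self._data[lo:lo + len(values)] = values

# An OpenMP worksharing loop assigning a[i] = 2*i would be translated so
# that each rank computes its own block locally and puts it back:
ga = ToyGlobalArray(8)
for lo, hi in [(0, 4), (4, 8)]:          # two simulated ranks, one block each
    ga.put(lo, [2.0 * i for i in range(lo, hi)])
print(ga.get(0, 8))
```

The translator's hard problems, which the paper discusses, are deciding which accesses need a get/put at all and keeping the blocks consistent across synchronization points.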


Archive | 2002

Computational Chemistry for Nuclear Waste Characterization and Processing: Relativistic Quantum Chemistry of Actinides

Robert J. Harrison; David E. Bernholdt; Bruce E. Bursten; Wibe A. de Jong; David A. Dixon; Kenneth G. Dyall; Walter V. Ermler; George I. Fann; P. J. Hay; Nina Ismail Buchner; Ricky A. Kendall; Jun Li; Maria M. Marino; Colin J. Marsden; Richard L. Martin; Michael Minkoff; Jeffrey A. Nichols; Jarek Nieplocha; Russell M. Pitzer; Lawrence R. Pratt; Hans Georg Schreckenbach; Michael Seth; Ron Shepard; Rick Stevens; Jeffrey L. Tilson; Albert F. Wagner; Qi Wang; Theresa L. Windus; Adrian T. Wong; Zhiyong Zhang

Over the course of three years we have conducted calculations on molecular structures containing actinides, lanthanides, and other heavy elements. Our calculations were performed at the relativistically correct, all-electron, four-component level (DHF, MP2, and CCSD(T)), using density functional theory (DFT) with relativistic effective core potentials (RECPs), and with various other methodologies. We studied ground- and excited-state structures, energetics, vibrational frequencies, and NMR, excitation, and ionization spectra. In addition, a considerable amount of code and methodology was developed during the GC3 period, enabling the extensive research described in this final report and providing researchers worldwide with new computational chemistry tools. In this section we give a brief overview of our activities and accomplishments, grouped by research institution. A more extensive overview can be found in the appendices containing the full yearly reports.

Collaboration

Ricky A. Kendall's top co-authors:

Douglas B. Kothe, Oak Ridge National Laboratory
Adrian T. Wong, Lawrence Berkeley National Laboratory
David E. Bernholdt, Oak Ridge National Laboratory
George I. Fann, Oak Ridge National Laboratory
Srinivas Aluru, Georgia Institute of Technology