Hubertus J. J. van Dam
Pacific Northwest National Laboratory
Publication
Featured research published by Hubertus J. J. van Dam.
Physical Chemistry Chemical Physics | 2010
Wibe A. de Jong; Eric J. Bylaska; Niranjan Govind; Curtis L. Janssen; Karol Kowalski; Thomas J. J. Müller; Ida M. B. Nielsen; Hubertus J. J. van Dam; Valera Veryazov; Roland Lindh
Parallel hardware has become readily available to the computational chemistry research community. This perspective will review the current state of parallel computational chemistry software utilizing high-performance parallel computing platforms. Hardware and software trends and their effect on quantum chemistry methodologies, algorithms, and software development will also be discussed.
Journal of Chemical Theory and Computation | 2013
Kiran Bhaskaran-Nair; Wenjing Ma; Sriram Krishnamoorthy; Oreste Villa; Hubertus J. J. van Dam; Edoardo Aprà; Karol Kowalski
A novel parallel algorithm for noniterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [Bhaskaran-Nair, K.; Brabec, J.; Aprà, E.; van Dam, H. J. J.; Pittner, J.; Kowalski, K. J. Chem. Phys. 2012, 137, 094112] with the possibility of accelerating numerical calculations using graphics processing units (GPUs), is presented. We discuss the performance of this approach applied to the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithm is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.
Journal of Chemical Theory and Computation | 2012
Jiří Brabec; Jiří Pittner; Hubertus J. J. van Dam; Edoardo Aprà; Karol Kowalski
A novel algorithm for implementing a general type of multireference coupled-cluster (MRCC) theory based on the Jeziorski-Monkhorst exponential ansatz [Jeziorski, B.; Monkhorst, H. J. Phys. Rev. A 1981, 24, 1668] is introduced. The proposed algorithm utilizes processor groups to calculate the equations for the MRCC amplitudes. In the basic formulation, each processor group constructs the equations related to a specific subset of references. By flexible choice of processor groups and of the subset of reference-specific sufficiency conditions designated to a given group, one can ensure optimum utilization of available computing resources. The performance of this algorithm is illustrated on the examples of the Brillouin-Wigner and Mukherjee MRCC methods with singles and doubles (BW-MRCCSD and Mk-MRCCSD). A significant improvement in scalability and in reduction of time to solution is reported with respect to a recently reported parallel implementation of the BW-MRCCSD formalism [Brabec, J.; van Dam, H. J. J.; Kowalski, K.; Pittner, J. Chem. Phys. Lett. 2011, 514, 347].
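The idea of giving each processor group its own subset of references can be illustrated with a minimal sketch. It assumes processes are split into near-equal groups and references are dealt out round-robin; the function name `assign_references` and the round-robin policy are hypothetical choices for illustration, not the distribution scheme used in the paper.

```python
from typing import Dict, List


def assign_references(n_refs: int, n_procs: int, n_groups: int) -> Dict[int, Dict[str, List[int]]]:
    """Split process ranks into groups and deal references to the groups.

    Hypothetical helper: the round-robin policy is an assumption made
    for this sketch only.
    """
    if not 1 <= n_groups <= n_procs:
        raise ValueError("need 1 <= n_groups <= n_procs")

    base, extra = divmod(n_procs, n_groups)
    layout: Dict[int, Dict[str, List[int]]] = {}
    first_rank = 0
    for group in range(n_groups):
        # The first `extra` groups receive one additional process.
        size = base + (1 if group < extra else 0)
        layout[group] = {
            "ranks": list(range(first_rank, first_rank + size)),
            # Reference indices group, group + n_groups, group + 2 * n_groups, ...
            "references": list(range(group, n_refs, n_groups)),
        }
        first_rank += size
    return layout


layout = assign_references(n_refs=6, n_procs=8, n_groups=3)
for group, info in layout.items():
    print(group, info["ranks"], info["references"])
```

Changing `n_groups` trades the number of references handled concurrently against the number of processes working on each reference, which is the tuning knob the abstract describes.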
Journal of Chemical Theory and Computation | 2013
Daniel W. Silverstein; Niranjan Govind; Hubertus J. J. van Dam; Lasse Jensen
A parallel implementation of analytical time-dependent density functional theory gradients is presented for the quantum chemistry program NWChem. The implementation is based on the Lagrangian approach developed by Furche and Ahlrichs. To validate our implementation, we first calculate the Stokes shifts for a range of organic dye molecules using a diverse set of exchange-correlation functionals (traditional density functionals, global hybrids, and range-separated hybrids) followed by simulations of the one-photon absorption and resonance Raman scattering spectrum of the phenoxyl radical, the well-studied dye molecule rhodamine 6G, and a molecular host-guest complex (TTF⊂CBPQT⁴⁺). The study of organic dye molecules illustrates that B3LYP and CAM-B3LYP generally give the best agreement with experimentally determined Stokes shifts unless the excited state is a charge transfer state. Absorption, resonance Raman, and fluorescence simulations for the phenoxyl radical indicate that explicit solvation may be required for accurate characterization. For the host-guest complex and rhodamine 6G, it is demonstrated that absorption spectra can be simulated in good agreement with experimental data for most exchange-correlation functionals. However, because one-photon absorption spectra generally lack well-resolved vibrational features, resonance Raman simulations are necessary to evaluate the accuracy of the exchange-correlation functional for describing a potential energy surface.
Journal of Chemical Physics | 2012
Jiří Brabec; Hubertus J. J. van Dam; Jiří Pittner; Karol Kowalski
The recently proposed universal state-selective (USS) corrections [K. Kowalski, J. Chem. Phys. 134, 194107 (2011)] to approximate multi-reference coupled-cluster (MRCC) energies can be commonly applied to any type of MRCC theory based on the Jeziorski-Monkhorst [B. Jeziorski and H. J. Monkhorst, Phys. Rev. A 24, 1668 (1981)] exponential ansatz. In this paper we report on the performance of a simple USS correction to the Brillouin-Wigner and Mukherjee's MRCC approaches employing single and double excitations (USS-BW-MRCCSD and USS-Mk-MRCCSD). It is shown that the USS-BW-MRCCSD correction, which employs the manifold of single and double excitations, can be related to a posteriori corrections utilized in routine BW-MRCCSD calculations. In several benchmark calculations we compare the USS-BW-MRCCSD and USS-Mk-MRCCSD results with the results obtained with the full configuration interaction method.
Journal of Chemical Physics | 2012
Kiran Bhaskaran-Nair; Jiří Brabec; Edoardo Aprà; Hubertus J. J. van Dam; Jiří Pittner; Karol Kowalski
In this paper we discuss the performance of the non-iterative state-specific multireference coupled cluster (SS-MRCC) methods accounting for the effect of triply excited cluster amplitudes. The corrections to the Brillouin-Wigner and Mukherjee's MRCC models based on the manifold of singly and doubly excited cluster amplitudes (BW-MRCCSD and Mk-MRCCSD, respectively) are tested and compared with exact full configuration interaction results for small systems (H₂O, N₂, and Be₃). For the larger systems (naphthyne isomers) the BW-MRCC and Mk-MRCC methods with iterative singles, doubles, and non-iterative triples (BW-MRCCSD(T) and Mk-MRCCSD(T)) are compared against the results obtained with single reference coupled cluster methods. We also report on the parallel performance of the non-iterative implementations based on the use of processor groups.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2010
Abhinav Vishnu; Hubertus J. J. van Dam; Wibe A. de Jong; Pavan Balaji; Shuaiwen Song
The largest supercomputers in the world today consist of hundreds of thousands of processing cores and many more other hardware components. At such scales, hardware faults are commonplace, necessitating fault-resilient software systems. While different fault-resilient models are available, most focus on allowing the computational processes to survive faults. On the other hand, we have recently started investigating fault resilience techniques for data-centric programming models such as the partitioned global address space (PGAS) models. The primary difference in data-centric models is the decoupling of computation and data locality. That is, data placement is decoupled from the executing processes, allowing us to view process failure (a physical node hosting a process is dead) separately from data failure (a physical node hosting data is dead). In this paper, we take a first step toward data-centric fault resilience by designing and implementing a fault-resilient, one-sided communication runtime framework using Global Arrays and its communication system, ARMCI. The framework consists of a fault-resilient process manager; a low-overhead and network-assisted remote-node fault detection module; non-data-moving collective communication primitives; and failure semantics and error codes for one-sided communication runtime systems. Our performance evaluation indicates that the framework incurs little overhead compared to state-of-the-art designs and provides a fundamental framework of fault resiliency for PGAS models.
Journal of Chemical Theory and Computation | 2013
Hubertus J. J. van Dam; Abhinav Vishnu; Wibe A. de Jong
High performance computing platforms are expected to deliver 10^18 floating-point operations per second by the year 2022 through the deployment of millions of cores. Even if every core is highly reliable, the sheer number of them will mean that the mean time between failures will become so short that most application runs will suffer at least one fault. In particular, soft errors caused by intermittent incorrect behavior of the hardware are a concern as they lead to silent data corruption. In this paper we investigate the impact of soft errors on optimization algorithms using Hartree-Fock as a particular example. Optimization algorithms iteratively reduce the error in the initial guess to reach the intended solution. Therefore, they may intuitively appear to be resilient to soft errors. Our results show that this is true for soft errors of small magnitudes but not for large errors. We suggest error detection and correction mechanisms for different classes of data structures. The results obtained with these mechanisms indicate that we can correct more than 95% of the soft errors at moderate increases in the computational cost.
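The distinction between small and large soft errors can be made concrete with a short sketch that flips a single bit in an IEEE 754 double. Flipping a low mantissa bit perturbs the value at the level of rounding error, while flipping a high exponent bit changes it by hundreds of orders of magnitude. The bound check `out_of_bounds` is a hypothetical detector added for illustration; it is not one of the mechanisms proposed in the paper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def relative_error(exact: float, corrupted: float) -> float:
    """Relative deviation of `corrupted` from a nonzero `exact` value."""
    return abs(corrupted - exact) / abs(exact)


def out_of_bounds(value: float, bound: float) -> bool:
    """Hypothetical detector: flag values whose magnitude exceeds `bound`.

    Written so that NaN and infinity are flagged as well.
    """
    return not (abs(value) <= bound)


exact = 0.5
small = flip_bit(exact, 0)   # lowest mantissa bit: tiny perturbation
large = flip_bit(exact, 62)  # highest exponent bit: enormous perturbation

print("small error:", relative_error(exact, small))
print("large value:", large)
print("flagged:", out_of_bounds(small, 10.0), out_of_bounds(large, 10.0))
```

A simple bound check of this kind only catches the large errors. The small ones pass undetected, which matches the regime where, according to the abstract, an iterative optimization is resilient anyway.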
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Jeffrey A. Daily; Abhinav Vishnu; Bruce J. Palmer; Hubertus J. J. van Dam; Darren J. Kerbyson
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically matching interface such as MPI-RMA, as well as not-so-intuitive interfaces such as MPI two-sided with a combination of multi-threading and dynamic process management. With an in-depth exploration of these alternatives and their shortcomings, we propose a novel design which is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks (PR) approach and other approaches within the Communication Runtime for Exascale, which is a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speedup on 1008 processors over the other MPI-based designs.
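The split into compute ranks and progress ranks can be sketched without MPI as a pure partition of rank numbers. The sketch assumes ranks are numbered contiguously per node and that the last rank on each node is set aside as the progress rank; both the numbering and the choice of the last rank are illustrative assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_ranks(n_ranks: int, ranks_per_node: int) -> Tuple[List[int], List[int]]:
    """Partition ranks into (compute_ranks, progress_ranks).

    Assumption for this sketch: the last rank on every node is reserved
    to service remote get/put requests on behalf of its node.
    """
    if ranks_per_node < 2:
        raise ValueError("need at least one compute and one progress rank per node")
    if n_ranks % ranks_per_node != 0:
        raise ValueError("n_ranks must be a multiple of ranks_per_node")

    compute: List[int] = []
    progress: List[int] = []
    for rank in range(n_ranks):
        if rank % ranks_per_node == ranks_per_node - 1:
            progress.append(rank)
        else:
            compute.append(rank)
    return compute, progress


def progress_rank_for(rank: int, ranks_per_node: int) -> int:
    """Return the progress rank that serves the node hosting `rank`."""
    node = rank // ranks_per_node
    return node * ranks_per_node + ranks_per_node - 1


compute, progress = split_ranks(n_ranks=8, ranks_per_node=4)
print(compute, progress, progress_rank_for(5, 4))
```

In such a layout the application only ever sees the compute ranks, while a remote memory request targeting data on a node is sent as an ordinary two-sided message to that node's progress rank.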
Journal of Chemical Theory and Computation | 2011
Hubertus J. J. van Dam; Abhinav Vishnu; Wibe A. de Jong
In the past couple of decades, the massive computational power provided by the most modern supercomputers has resulted in simulation of higher-order computational chemistry methods, previously considered intractable. As the system sizes continue to increase, the computational chemistry domain continues to escalate this trend using parallel computing with programming models such as Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) programming models such as Global Arrays. The ever-increasing scale of these supercomputers comes at a cost of reduced Mean Time Between Failures (MTBF), currently on the order of days and projected to be on the order of hours for upcoming extreme-scale systems. While traditional disk-based checkpointing methods are ubiquitous for storing intermediate solutions, they suffer from the high overhead of writing and recovering from checkpoints. In practice, checkpointing itself often brings the system down. Clearly, methods beyond checkpointing are imperative for handling the worsening problem of decreasing MTBF. In this paper, we address this challenge by designing and implementing an efficient fault-tolerant version of the Coupled Cluster (CC) method with NWChem, using in-memory data redundancy. We present the challenges associated with our design, including an efficient data storage model, maintenance of at least one consistent data copy, and the recovery process. Our performance evaluation without faults shows that the current design exhibits a small overhead. In the presence of a simulated fault, the proposed design incurs negligible overhead in comparison to the state-of-the-art implementation without faults.
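The notion of in-memory data redundancy with at least one surviving copy can be illustrated by a toy store that keeps every block on two distinct simulated nodes. The class `RedundantStore`, the placement rule (primary node plus its right-hand neighbour), and the failure model are all assumptions made for this sketch; they do not reproduce the storage model of the paper.

```python
from typing import Dict, List, Tuple


class RedundantStore:
    """Toy in-memory store that keeps each block on two distinct nodes."""

    def __init__(self, n_nodes: int) -> None:
        if n_nodes < 2:
            raise ValueError("need at least two nodes for redundancy")
        self.n_nodes = n_nodes
        # One dictionary per simulated node: block id -> data.
        self.nodes: List[Dict[int, List[float]]] = [{} for _ in range(n_nodes)]
        self.alive: List[bool] = [True] * n_nodes

    def holders(self, block: int) -> Tuple[int, int]:
        """Primary node and its neighbour, which holds the backup copy."""
        primary = block % self.n_nodes
        return primary, (primary + 1) % self.n_nodes

    def put(self, block: int, data: List[float]) -> None:
        """Write independent copies of `data` to every live holder."""
        for node in self.holders(block):
            if self.alive[node]:
                self.nodes[node][block] = list(data)

    def get(self, block: int) -> List[float]:
        """Read from the first live holder that still has the block."""
        for node in self.holders(block):
            if self.alive[node] and block in self.nodes[node]:
                return self.nodes[node][block]
        raise KeyError(f"block {block} lost: no live copy remains")

    def fail(self, node: int) -> None:
        """Simulate a node failure: its memory contents are gone."""
        self.alive[node] = False
        self.nodes[node].clear()


store = RedundantStore(n_nodes=4)
for block in range(8):
    store.put(block, [float(block)] * 3)

store.fail(1)  # a single failure leaves one copy of every block
recovered = [store.get(block) for block in range(8)]
print(recovered[1])
```

With this placement a single node failure never loses data, while the failure of two neighbouring nodes does. That is why a real scheme also needs a recovery step that restores the second copy before the next fault arrives.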