Paul R. Eller | Researchain

Publication

Featured research published by Paul R. Eller.

International Parallel and Distributed Processing Symposium | 2012

Dynamic Linear Solver Selection for Transient Simulations Using Machine Learning on Distributed Systems

Paul R. Eller; Jing-Ru C. Cheng; Robert S. Maier

Many transient simulations spend a significant portion of their overall runtime solving linear systems. A wide variety of preconditioned linear solvers have been developed to solve different types of linear systems quickly and accurately, each with options to customize the preconditioned solver for a given linear system. As a transient simulation progresses, it may produce significantly different linear systems: special events can make the linear systems more difficult to solve, while the model moving closer to a state of equilibrium can make them easier to solve. Machine learning algorithms make it possible to dynamically select the preconditioned linear solver for each linear system produced by a simulation. We can generate databases by computing attributes of each linear system, physical attributes of the transient simulation, and computational attributes, along with the running times of a set of preconditioned solvers on each linear system. Machine learning algorithms can then use these databases to generate classifiers capable of dynamically selecting a preconditioned solver for each linear system given a set of attributes. This allows each transient simulation to be computed quickly and accurately using different preconditioned solvers throughout the simulation, and it offers the potential for speedups compared with using a single preconditioned solver for the entire transient simulation.
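
The database-then-classifier workflow described above can be illustrated with a small sketch. It is not the authors' implementation: the attribute names (`n`, `nnz_per_row`, `diag_dominance`), the solver labels, and all timings are hypothetical, and a 1-nearest-neighbor rule stands in for whatever classifier is trained in practice. Each training record is labeled with its fastest solver, and a new linear system is assigned the label of the most similar record.

```python
import math

# Hypothetical training database: attributes of each linear system plus the
# measured running time (seconds) of each candidate preconditioned solver.
TRAINING = [
    ({"n": 1e4, "nnz_per_row": 5.0, "diag_dominance": 2.0},
     {"cg+jacobi": 0.8, "cg+ilu": 1.1, "gmres+amg": 1.5}),
    ({"n": 1e6, "nnz_per_row": 7.0, "diag_dominance": 0.6},
     {"cg+jacobi": 40.0, "cg+ilu": 22.0, "gmres+amg": 9.0}),
    ({"n": 2e5, "nnz_per_row": 9.0, "diag_dominance": 1.1},
     {"cg+jacobi": 6.0, "cg+ilu": 3.5, "gmres+amg": 4.2}),
]


def featurize(attrs):
    """Turn raw attributes into a feature vector; log-scale n so size does not dominate."""
    return (math.log10(attrs["n"]), attrs["nnz_per_row"], attrs["diag_dominance"])


def fastest_solver(timings):
    """Training label: the solver with the smallest measured running time."""
    return min(timings, key=timings.get)


def build_classifier(records):
    """Return a 1-nearest-neighbor classifier mapping attributes to a solver name."""
    points = [(featurize(attrs), fastest_solver(times)) for attrs, times in records]

    def classify(attrs):
        x = featurize(attrs)
        _, label = min(points, key=lambda point: math.dist(point[0], x))
        return label

    return classify


if __name__ == "__main__":
    select_solver = build_classifier(TRAINING)
    # Each time step, the simulation would compute these attributes for its new
    # linear system and then solve it with the predicted solver.
    step_attrs = {"n": 9e5, "nnz_per_row": 7.0, "diag_dominance": 0.7}
    print(select_solver(step_attrs))  # prints gmres+amg
```

Distance-based classifiers are sensitive to feature scaling; here only the matrix size is log-scaled, and a realistic attribute set would probably need every feature normalized.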

IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Acceleration of 2-D Finite Difference Time Domain Acoustic Wave Simulation Using GPUs

Paul R. Eller; Jing-Ru C. Cheng; Donald G. Albert

A Two-Dimensional Finite Difference Time Domain (2D-FDTD) simulation is used to find the source location of an acoustic wave in an urban area using a time-reversal technique. This method could allow soldiers on the battlefield to locate the source of an acoustic wave produced by gunfire or other sources. The simulation has been shown to locate the source of the acoustic waves accurately, but it required hours to compute a solution. For practical use, the simulation must run quickly enough for soldiers to find the location of their attacker before the attacker can leave the area, so the code must be accelerated to produce a solution in a reasonable amount of time. The simulation requires many independent computations for each element of a large 2D grid. Graphics Processing Units (GPUs) perform best on highly parallel, computationally intensive problems, which makes this simulation well suited to GPUs for significantly reducing the running time. GPUs also allow the solution to be obtained locally (with the soldiers) rather than at a centralized high performance computing center. This work develops a GPU version of the 2D-FDTD code and experiments with a variety of optimizations to produce an accurate solution as quickly as possible. GPU-only and CPU-GPU versions are developed, with the CPU-GPU version showing slightly better performance. Thread block parameters must be selected carefully to load data from memory as quickly as possible. Speedups of over 11 times are achieved, providing progress toward a solution that could allow people on the battlefield to locate the source of gunfire and other projectiles in close to real time.
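
To show why this grid computation maps well onto GPUs, here is a minimal serial sketch of one time step of a standard 2D acoustic FDTD scheme, with pressure at cell centers and velocities on a staggered grid. It is an assumption that the paper's code uses this particular formulation; boundary handling is reduced to rigid walls and the time-reversal driver is omitted (in a time-reversal run, recorded signals would be re-injected in reverse order using the same update). Within each half-update every grid point is computed independently, which is the work a CUDA kernel would assign to individual threads.

```python
import math


def fdtd_step(p, vx, vy, dt, h, rho, c):
    """Advance a 2D acoustic FDTD solution by one time step on a staggered grid.

    p  : pressure at cell centers, nx x ny
    vx : x-velocity on faces between cells (i, j) and (i+1, j), (nx-1) x ny
    vy : y-velocity on faces between cells (i, j) and (i, j+1), nx x (ny-1)
    h  : grid spacing, assumed equal in x and y
    Velocities on the outer boundary are taken as zero (rigid walls, an assumption).
    """
    nx, ny = len(p), len(p[0])
    # Velocity update: each face needs only its two neighboring pressures, so
    # every face is an independent computation (one GPU thread per face).
    a = dt / (rho * h)
    for i in range(nx - 1):
        for j in range(ny):
            vx[i][j] -= a * (p[i + 1][j] - p[i][j])
    for i in range(nx):
        for j in range(ny - 1):
            vy[i][j] -= a * (p[i][j + 1] - p[i][j])
    # Pressure update from the velocity divergence; again independent per cell.
    k = rho * c * c * dt / h
    for i in range(nx):
        for j in range(ny):
            div = (vx[i][j] if i < nx - 1 else 0.0) - (vx[i - 1][j] if i > 0 else 0.0)
            div += (vy[i][j] if j < ny - 1 else 0.0) - (vy[i][j - 1] if j > 0 else 0.0)
            p[i][j] -= k * div


def make_fields(nx, ny):
    """Allocate zeroed pressure and staggered velocity arrays."""
    p = [[0.0] * ny for _ in range(nx)]
    vx = [[0.0] * ny for _ in range(nx - 1)]
    vy = [[0.0] * (ny - 1) for _ in range(nx)]
    return p, vx, vy


if __name__ == "__main__":
    c, rho, h = 343.0, 1.2, 0.5          # speed of sound (m/s), air density (kg/m^3), spacing (m)
    dt = 0.5 * h / (c * math.sqrt(2.0))  # half of the 2D stability (CFL) limit
    p, vx, vy = make_fields(21, 21)
    p[10][10] = 1.0                      # impulsive source, e.g. a gunshot
    for _ in range(10):
        fdtd_step(p, vx, vy, dt, h, rho, c)
    print(f"pressure at source after 10 steps: {p[10][10]:.4f}")
```

Because the two loop nests only read values produced by the previous half-step, a GPU version can launch one kernel per half-update without any intra-kernel synchronization.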

IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Scalable non-blocking preconditioned conjugate gradient methods

Paul R. Eller; William Gropp

The preconditioned conjugate gradient method (PCG) is a popular method for solving linear systems at scale. PCG requires frequent blocking allreduce collective operations that can limit performance at scale. We investigate PCG variations designed to reduce communication costs by decreasing the number of allreduces and by overlapping communication with computation using a non-blocking allreduce. These variations include two methods we have developed, non-blocking PCG and 2-step pipelined PCG, as well as the pipelined PCG method of Ghysels and Vanroose. Performance models of communication and computation costs show the expected performance of these methods. Weak and strong scaling experiments on up to 128k cores show that scalable PCG methods can outperform standard PCG at scale. We observe that the fastest method varies with the work per core, suggesting that a suite of scalable solvers is needed to obtain the best performance. Experiments with multiple preconditioners and linear systems show the robustness of these methods.
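
The communication bottleneck is visible in the structure of standard PCG. The serial sketch below (a Jacobi preconditioner and a 1D Poisson test matrix, both chosen only for illustration) counts the global reductions inside each iteration: one for the step length and one for the next search direction and the convergence check. The non-blocking and pipelined variants studied in the paper rearrange the iteration so that these reductions are fewer or overlap with the matrix-vector product and preconditioner; they are not shown here.

```python
import math


def dot(u, v):
    # In a distributed solver each rank sums its local entries and an
    # MPI allreduce combines the partial sums: a global synchronization point.
    return sum(a * b for a, b in zip(u, v))


def poisson_1d(x):
    """Matrix-vector product with the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def pcg(matvec, b, diag, rtol=1e-10, max_iter=1000):
    """Standard PCG with a Jacobi preconditioner.

    Returns (x, iterations, allreduces), where allreduces counts the blocking
    reductions performed inside the iteration loop.
    """
    x = [0.0] * len(b)
    r = list(b)                                # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # apply Jacobi preconditioner
    p = list(z)
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    allreduces = 0
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        pap = dot(p, ap)          # allreduce 1: alpha is needed before continuing
        allreduces += 1
        alpha = rz / pap
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        # allreduce 2: r.z and r.r can share one collective in a real code.
        rz_new, rr = dot(r, z), dot(r, r)
        allreduces += 1
        if math.sqrt(rr) <= rtol * bnorm:
            return x, it, allreduces
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x, max_iter, allreduces


if __name__ == "__main__":
    n = 50
    b = [1.0] * n
    x, iters, reductions = pcg(poisson_1d, b, diag=[2.0] * n)
    print(iters, reductions)
```

Each iteration must wait for both reductions before it can proceed, which is exactly the latency that overlapping with a non-blocking allreduce aims to hide.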

International Conference on Conceptual Structures | 2012

Dynamic Linear Solver Selection for Transient Simulations Using Multi-label Classifiers

Paul R. Eller; Jing-Ru C. Cheng; Robert S. Maier

Many transient simulations spend a significant portion of their overall runtime solving linear systems. A wide variety of preconditioned linear solvers have been developed to solve different types of linear systems quickly and accurately, each with options to customize the preconditioned solver for a given linear system. Transient simulations may produce significantly different linear systems as the simulation progresses, because special events can make the linear systems more difficult to solve, or the model can move closer to a state of equilibrium with easier-to-solve linear systems. Machine learning algorithms make it possible to dynamically select the preconditioned linear solver for each linear system produced by a simulation. We test both single-label and multi-label classifiers, demonstrating that multi-label classifiers achieve the best performance because they associate multiple fast linear solvers with each tested linear system. For more difficult simulations, these classifiers produce significant speedups, while for less difficult simulations they achieve performance similar to the fastest single preconditioned linear solvers. We test classifiers generated using limited attribute sets, demonstrating that we can minimize overhead while still obtaining fast, accurate simulations.
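
The advantage of multi-label training can be seen in how the target labels are defined. In the sketch below, a single-label target accepts only the fastest solver, while a multi-label target accepts every solver whose time is within a tolerance of the fastest. The 10% tolerance, the solver names, and the timings are assumptions for illustration; the abstract does not specify how the label sets were built.

```python
def single_label(timings):
    """Single-label target: only the fastest solver counts as correct."""
    return {min(timings, key=timings.get)}


def multi_label(timings, tolerance=0.10):
    """Multi-label target: every solver within `tolerance` of the fastest time."""
    best = min(timings.values())
    return {name for name, t in timings.items() if t <= (1.0 + tolerance) * best}


def accuracy(predictions, all_timings, labeler):
    """Fraction of predicted solvers that fall inside the target label set."""
    hits = sum(pred in labeler(times) for pred, times in zip(predictions, all_timings))
    return hits / len(predictions)


# Hypothetical timings (seconds) for two linear systems.
TIMINGS = [
    {"cg+jacobi": 1.00, "cg+ilu": 1.05, "gmres+amg": 2.0},
    {"cg+jacobi": 5.00, "cg+ilu": 2.00, "gmres+amg": 2.1},
]
PREDICTIONS = ["cg+ilu", "gmres+amg"]  # near-fastest, but not the fastest

if __name__ == "__main__":
    print(accuracy(PREDICTIONS, TIMINGS, single_label))  # 0.0
    print(accuracy(PREDICTIONS, TIMINGS, multi_label))   # 1.0
```

Picking a solver that is only marginally slower than the fastest costs little runtime, so a multi-label target avoids penalizing the classifier for choices that are nearly as good.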

Spring Simulation Multiconference | 2010

Development and acceleration of parallel chemical transport models

Paul R. Eller; Kumaresh Singh; Adrian Sandu

Improving chemical transport models for atmospheric simulations depends on future developments in mathematical methods and parallelization methods. Better mathematical methods allow simulations to model realistic processes more accurately and/or to run in less time. Parallelization methods allow simulations to run in less time, allowing scientists to use more accurate or more detailed simulations (higher resolution grids, smaller time steps). The STEM chemical transport model provides a large-scale end-to-end application for experimenting with running chemical integration methods and transport methods on GPUs. GPUs provide high computational power at relatively low cost. The CUDA programming environment simplifies GPU development by providing access to powerful functions for executing parallel code. This work demonstrates significant speedups from accelerating a large-scale end-to-end application on GPUs, achieved by implementing all relevant kernels on the GPU using CUDA. Nevertheless, further improvements to GPUs are needed to allow these applications to fully exploit the power of GPUs.
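
Chemical transport models are commonly advanced with operator splitting, alternating a transport step and a chemistry step; the sketch below assumes that structure rather than reproducing STEM. It uses a one-dimensional first-order upwind advection scheme and a single hypothetical reaction A -> B integrated with backward Euler. The chemistry step touches each grid cell independently, which is why chemical integration is a natural candidate for GPU kernels.

```python
def advect_upwind(conc, courant):
    """One first-order upwind advection step for wind in +x with periodic boundaries."""
    # conc[i - 1] wraps to the last cell when i == 0, giving periodic boundaries.
    return [conc[i] - courant * (conc[i] - conc[i - 1]) for i in range(len(conc))]


def chemistry_cell(a, b, rate, dt):
    """Backward-Euler step of the toy reaction A -> B in a single grid cell."""
    a_new = a / (1.0 + rate * dt)
    return a_new, b + (a - a_new)


def split_step(A, B, courant, rate, dt):
    """Operator splitting: transport every species, then react every cell."""
    A = advect_upwind(A, courant)
    B = advect_upwind(B, courant)
    # Each cell's chemistry is independent of every other cell, which is the
    # kind of work that maps to one GPU thread per grid cell.
    cells = [chemistry_cell(a, b, rate, dt) for a, b in zip(A, B)]
    return [a for a, _ in cells], [b for _, b in cells]


if __name__ == "__main__":
    A = [0.0] * 20
    A[5] = 1.0                      # initial puff of species A
    B = [0.0] * 20
    for _ in range(10):
        A, B = split_step(A, B, courant=0.5, rate=0.2, dt=1.0)
    print(f"total A = {sum(A):.4f}, total B = {sum(B):.4f}")
```

Both operators conserve total mass in this toy setup, so `sum(A) + sum(B)` stays constant; a check like this is a quick way to validate a ported GPU kernel against the CPU version.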

International Conference on Conceptual Structures | 2010

Improving parallel performance of large-scale watershed simulations

Paul R. Eller; Jing-Ru C. Cheng; Hung V. Nguyen; Robert S. Maier

A comprehensive, physics-based watershed model with multispatial domains and multitemporal scales has been developed and used. This paper discusses interfacing the watershed model with PETSc and evaluating the model performance for a variety of PETSc preconditioners. Both wall-clock time and scalability are compared based on performance on the Cray XT4 machine, along with tests verifying that all solutions are accurate. The results show that the PETSc Conjugate Gradient solver and preconditioners outperform the simple Conjugate Gradient solver and Jacobi preconditioner originally used by the watershed model. Tests show that the Hypre BoomerAMG preconditioner provides the most significant speedup for the watershed model.
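
PETSc selects the Krylov method and preconditioner at run time through options, which makes comparisons like this one straightforward to script. The sketch below builds option lists for CG with several preconditioners using standard PETSc option names (`-ksp_type`, `-ksp_rtol`, `-pc_type`, `-sub_pc_type`, `-pc_hypre_type`, `-ksp_converged_reason`, and `-log_view` in current PETSc releases). It assumes a PETSc build configured with hypre and an application that calls `KSPSetFromOptions`; the executable name `./watershed_model`, the process count, and the block-Jacobi/ILU entry are hypothetical, since the abstract names only Jacobi and BoomerAMG.

```python
import shlex

# PETSc run-time options for each preconditioner under comparison.
# "jacobi" mirrors the model's original CG + Jacobi combination.
PRECONDITIONERS = {
    "jacobi": ["-pc_type", "jacobi"],
    "bjacobi_ilu": ["-pc_type", "bjacobi", "-sub_pc_type", "ilu"],
    "boomeramg": ["-pc_type", "hypre", "-pc_hypre_type", "boomeramg"],
}


def petsc_options(pc_name, rtol=1e-8):
    """Options selecting PETSc's CG solver with the named preconditioner."""
    return (["-ksp_type", "cg", "-ksp_rtol", f"{rtol:g}"]
            + PRECONDITIONERS[pc_name]
            + ["-ksp_converged_reason", "-log_view"])


if __name__ == "__main__":
    for name in PRECONDITIONERS:
        # "./watershed_model" is a hypothetical executable; it must call
        # KSPSetFromOptions for these options to take effect.
        cmd = ["mpiexec", "-n", "64", "./watershed_model", *petsc_options(name)]
        print(shlex.join(cmd))
```

Keeping the solver choice in options rather than in code means the same binary can be timed with every preconditioner, with `-log_view` reporting where the time goes.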

International Conference on Computational Science | 2009

Improving GEOS-Chem Model Tropospheric Ozone through Assimilation of Pseudo Tropospheric Emission Spectrometer Profile Retrievals

Kumaresh Singh; Paul R. Eller; Adrian Sandu; Kevin West Bowman; Dylan B. A. Jones; Meemong Lee

4D-variational or adjoint-based data assimilation provides a powerful means of integrating observations with models to estimate an optimal atmospheric state and to characterize the sensitivity of that state to the processes controlling it. In this paper we present an improved summer 2006 distribution of global tropospheric ozone, obtained by assimilating pseudo profile retrievals from the Tropospheric Emission Spectrometer (TES) into the GEOS-Chem global chemical transport model using a recently developed adjoint model of GEOS-Chem v7. We are the first to construct an adjoint of the linearized ozone parameterization (linoz) scheme, which can be very important for quantifying the amount of tropospheric ozone due to upper boundary exchanges. Tests conducted at various geographical levels show that the mismatch between adjoint values and their finite difference approximations could be up to 87% if the linoz module adjoint is not used, leading to divergence of the quasi-Newton optimization algorithm (L-BFGS) during data assimilation. We also present performance improvements in this adjoint model in terms of memory usage and speed. With the parallelization of each science process adjoint subroutine and a sub-optimal combination of checkpoints and recalculations, the improved adjoint model is as efficient as the forward GEOS-Chem model.
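
The adjoint-versus-finite-difference comparison mentioned above is a standard gradient check. The sketch below applies it to a hypothetical two-variable nonlinear model with identity observation operator and identity error covariances, far simpler than GEOS-Chem: the adjoint sweep runs backward through the stored forward trajectory, and its gradient is compared with central finite differences of the 4D-Var cost function. A gradient computed this way is what an optimizer such as L-BFGS would consume.

```python
# Toy nonlinear model x_{k+1} = M x_k + EPS * x_k**2 (elementwise); all values hypothetical.
M = [[0.9, 0.2], [-0.1, 0.95]]
EPS = 0.05


def model_step(x):
    return [sum(M[i][j] * x[j] for j in range(2)) + EPS * x[i] ** 2 for i in range(2)]


def adjoint_step(x, v):
    """Apply the transposed tangent-linear model at state x: (M + 2*EPS*diag(x))^T v."""
    return [sum(M[j][i] * v[j] for j in range(2)) + 2.0 * EPS * x[i] * v[i] for i in range(2)]


def forward(x0, steps):
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(model_step(traj[-1]))
    return traj


def cost(x0, xb, obs):
    """J = 1/2 |x0 - xb|^2 + 1/2 sum_k |x_k - y_k|^2 (identity B, R, and H assumed)."""
    traj = forward(x0, len(obs))
    j = 0.5 * sum((a - b) ** 2 for a, b in zip(x0, xb))
    for xk, yk in zip(traj[1:], obs):
        j += 0.5 * sum((a - b) ** 2 for a, b in zip(xk, yk))
    return j


def adjoint_gradient(x0, xb, obs):
    """Gradient of J via one forward sweep and one backward (adjoint) sweep."""
    n = len(obs)
    traj = forward(x0, n)
    lam = [a - b for a, b in zip(traj[n], obs[n - 1])]
    for k in range(n - 1, 0, -1):
        back = adjoint_step(traj[k], lam)
        lam = [g + (a - b) for g, a, b in zip(back, traj[k], obs[k - 1])]
    g0 = adjoint_step(traj[0], lam)
    return [g + (a - b) for g, a, b in zip(g0, x0, xb)]


def fd_gradient(x0, xb, obs, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(x0)):
        xp, xm = list(x0), list(x0)
        xp[i] += h
        xm[i] -= h
        grad.append((cost(xp, xb, obs) - cost(xm, xb, obs)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    x0, xb = [1.0, 0.5], [0.8, 0.6]
    obs = [[1.1, 0.3], [0.9, 0.4], [1.0, 0.2], [0.7, 0.5]]
    adj, fd = adjoint_gradient(x0, xb, obs), fd_gradient(x0, xb, obs)
    rel = max(abs(a - f) for a, f in zip(adj, fd)) / max(abs(f) for f in fd)
    print(f"max relative mismatch: {rel:.2e}")
```

Deleting the `2.0 * EPS * x[i] * v[i]` term from `adjoint_step`, that is, leaving part of the model out of the adjoint, makes the two gradients disagree noticeably; this is the same kind of inconsistency the abstract reports when the linoz adjoint is missing.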

Geoscientific Model Development | 2009