Dinesh K. Kaushik | Researchain

Publication

Featured research published by Dinesh K. Kaushik.

parallel computing | 2001

High-performance parallel implicit CFD

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating-point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.

ieee international conference on high performance computing data and analytics | 2013

Multiphysics simulations: Challenges and opportunities

David E. Keyes; Lois Curfman McInnes; Carol S. Woodward; William Gropp; Eric Myra; Michael Pernice; John B. Bell; Jed Brown; Alain Clo; Jeffrey M. Connors; Emil M. Constantinescu; Donald Estep; Kate Evans; Charbel Farhat; Ammar Hakim; Glenn E. Hammond; Glen A. Hansen; Judith C. Hill; Tobin Isaac; Kirk E. Jordan; Dinesh K. Kaushik; Efthimios Kaxiras; Alice Koniges; Kihwan Lee; Aaron Lott; Qiming Lu; John Harold Magerlein; Reed M. Maxwell; Michael McCourt; Miriam Mehl

We consider multiphysics applications from algorithmic and architectural perspectives, where “algorithmic” includes both mathematical analysis and computational complexity, and “architectural” includes both software and hardware environments. Many diverse multiphysics applications can be reduced, en route to their computational simulation, to a common algebraic coupling paradigm. Mathematical analysis of multiphysics coupling in this form is not always practical for realistic applications, but model problems representative of applications discussed herein can provide insight. A variety of software frameworks for multiphysics applications have been constructed and refined within disciplinary communities and executed on leading-edge computer systems. We examine several of these, expose some commonalities among them, and attempt to extrapolate best practices to future systems. From our study, we summarize challenges and forecast opportunities.

Parallel Computational Fluid Dynamics 1999: Towards Teraflops, Optimization and Novel Formulations | 2000

Towards Realistic Performance Bounds for Implicit CFD Codes

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

Traditionally, numerical analysts have evaluated the performance of algorithms by counting the number of floating-point operations. On the algorithmic side, tremendous strides have been made; many algorithms now require only a few floating-point operations per mesh point. However, on the hardware side, memory system performance is improving at a rate that is much slower than that of processor performance. The result is a mismatch in capabilities: algorithm design has minimized the work per data item, but hardware design depends on executing an increasingly large number of operations per data item. The importance of memory bandwidth to the overall performance is suggested by the available results. These show that the STREAM results are a much better indicator of performance than the peak numbers. The chapter illustrates the performance limitations caused by insufficient available memory bandwidth with a discussion of sparse matrix-vector multiply, a critical operation in many iterative methods used in implicit CFD codes. It also focuses on the per-processor performance of compute nodes used in parallel computers. Experiments have shown that PETSc-FUN3D has good scalability. In fact, since good per-processor performance reduces the fraction of time spent computing as opposed to communication, achieving the best per-processor performance is a critical prerequisite to demonstrating uninflated parallel performance.
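The bandwidth argument for sparse matrix-vector multiply can be illustrated with a minimal sketch. It assumes a square matrix in compressed sparse row (CSR) storage with 8-byte values and 4-byte indices, and it assumes that every array is moved through memory exactly once (an optimistic assumption about cache reuse of the input vector). The function names and the traffic model are illustrative choices, not the performance model used in the chapter.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Each stored nonzero costs one multiply and one add (2 flops)
        # but requires loading a value and a column index from memory.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def bandwidth_bound_flops(nnz, n_rows, bandwidth_bytes_per_s,
                          value_bytes=8, index_bytes=4):
    """Upper bound on the flop rate if memory traffic is the only limit.

    Assumed traffic (hypothetical model): values and column indices read
    once, row pointers read once, x read once, y written once.
    """
    bytes_moved = (nnz * (value_bytes + index_bytes)
                   + (n_rows + 1) * index_bytes
                   + 2 * n_rows * value_bytes)
    flops = 2 * nnz
    return flops * bandwidth_bytes_per_s / bytes_moved


# Small 3x3 example: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Under this model the achievable flop rate is proportional to the sustained memory bandwidth rather than to the processor's peak rate, which is consistent with the abstract's observation that STREAM results predict performance better than peak numbers.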

conference on high performance computing (supercomputing) | 2000

Performance Modeling and Tuning of an Unstructured Mesh CFD Application

William Gropp; Dinesh K. Kaushik; David E. Keyes; Barry F. Smith

This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and ported to several large-scale machines, including the ASCI Red and Blue Pacific machines, the SGI Origin, the Cray T3E, and Beowulf clusters. The code achieves a respectable level of performance for sparse problems, typical of scientific and engineering codes based on partial differential equations, and scales well up to thousands of processors. Since the gap between CPU speed and memory access rate is widening, the code is analyzed from a memory-centric perspective (in contrast to traditional flop-orientation) to understand its sequential and parallel performance. Performance tuning is approached on three fronts: data layouts to enhance locality of reference, algorithmic parameters, and parallel programming model. This effort was guided partly by some simple performance models developed for the sparse matrix-vector product operation.

ieee international conference on high performance computing data and analytics | 2009

Enabling high-fidelity neutron transport simulations on petascale architectures

Dinesh K. Kaushik; M. A. Smith; Allan B. Wollaber; Barry F. Smith; Andrew R. Siegel; W. S. Yang

The UNIC code is being developed as part of the DOE's Nuclear Energy Advanced Modeling and Simulation (NEAMS) program. UNIC is an unstructured, deterministic neutron transport code that allows a highly detailed description of a nuclear reactor. The primary goal of our simulation efforts is to reduce the uncertainties and biases in reactor design calculations by progressively replacing existing multilevel averaging (homogenization) techniques with more direct solution methods based on first principles. Since the neutron transport equation is seven dimensional (three in space, two in angle, one in energy, and one in time), these simulations are among the most memory- and computation-intensive in all of computational science. In order to model the complex physics of a reactor core, billions of spatial elements, hundreds of angles, and thousands of energy groups are necessary, leading to problem sizes with petascale degrees of freedom. Therefore, these calculations exhaust memory resources on current and even next-generation architectures. In this paper, we present UNIC simulation results for two important representative problems in reactor design and analysis: PHENIX and ZPR-6. In each case, UNIC shows good weak scalability on up to 163,840 cores of Blue Gene/P (Argonne) and 122,800 cores of XT5 (Oak Ridge). While our current per-processor performance is less than ideal, we demonstrate a clear ability to effectively utilize the leadership computing platforms. Over the coming months, we aim to improve the per-processor performance while maintaining the high parallel efficiency by employing better algorithms such as spatial p- and h-multigrid preconditioners, optimized matrix-tensor operations, and weighted partitioning for better load balancing. Combining these additional algorithmic improvements with the availability of larger parallel machines should allow us to realize our long-term goal of explicit geometry coupled multiphysics reactor simulations. In the long run, these high-fidelity simulations will be able to replace expensive mockup experiments and reduce the uncertainty in crucial reactor design and operational parameters.
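The scale implied by "billions of spatial elements, hundreds of angles, and thousands of energy groups" can be checked with a short calculation. The specific counts below are illustrative assumptions chosen to match those orders of magnitude; they are not figures reported for PHENIX or ZPR-6.

```python
# Illustrative sizes only (assumptions, not values from the paper).
spatial_elements = 2_000_000_000   # order of "billions of spatial elements"
angles = 500                       # order of "hundreds of angles"
energy_groups = 1_000              # order of "thousands of energy groups"

# One unknown per (element, angle, group) combination at a single time step.
degrees_of_freedom = spatial_elements * angles * energy_groups

# Memory for a single solution vector stored in double precision.
bytes_per_vector = degrees_of_freedom * 8
petabytes_per_vector = bytes_per_vector / 10**15

print(f"degrees of freedom: {degrees_of_freedom:.1e}")
print(f"one double-precision vector: {petabytes_per_vector:.1f} PB")
```

With these assumed counts, a single solution vector already occupies several petabytes, which illustrates why such calculations exhaust memory resources.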

Parallel Computational Fluid Dynamics 2005: Theory and Applications | 2006

Parallel adaptive solvers in compressible PETSc-FUN3D simulations

Sanjukta Bhowmick; Dinesh K. Kaushik; Lois Curfman McInnes; Boyana Norris; Padma Raghavan

The chapter presents a polyalgorithmic technique for adaptively selecting the linear solver method to match the numeric properties of linear systems as they evolve during the course of nonlinear iterations. The approach combines more robust but more costly methods when needed in particularly challenging phases of solution, with cheaper, though less powerful, methods in other phases. The chapter demonstrates that this adaptive, polyalgorithmic approach leads to improvements in overall simulation time, is easily parallelized, and is scalable in the context of this large-scale computational fluid dynamics application. This approach reduced overall execution time by using cheaper, though less powerful, linear solvers for relatively easy linear systems and then switching to more robust but more costly methods for more difficult linear systems. The results demonstrate that adaptive solvers can be implemented easily in a multiprocessor environment and are scalable. The chapter investigates adaptive solvers in problem domains and considers more adaptive approaches, including a polynomial heuristic where the trends of the indicators can be estimated by fitting a function to known data points. The chapter also combines adaptive heuristics with high-performance component infrastructure for performance monitoring and analysis.
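The idea of switching between a cheap and a robust linear solver can be sketched as a simple rule driven by an indicator. In the sketch below, the indicator (the iteration count of the previous linear solve), the thresholds, and the names `choose_solver` and `run_schedule` are all hypothetical; the chapter's actual indicators and heuristics are not specified in this abstract.

```python
def choose_solver(current, last_iterations, max_iterations,
                  switch_up=0.8, switch_down=0.3):
    """Pick the solver for the next nonlinear iteration.

    Hypothetical rule: if the cheap solver needed most of its iteration
    budget, move to the robust one; if the robust solver converged
    quickly, fall back to the cheap one.
    """
    ratio = last_iterations / max_iterations
    if current == "cheap" and ratio >= switch_up:
        return "robust"
    if current == "robust" and ratio <= switch_down:
        return "cheap"
    return current


def run_schedule(iteration_counts, max_iterations=100):
    """Replay a sequence of observed linear-solve iteration counts and
    return which solver would have been used at each nonlinear step."""
    solver = "cheap"
    used = []
    for iterations in iteration_counts:
        used.append(solver)
        solver = choose_solver(solver, iterations, max_iterations)
    return used


schedule = run_schedule([20, 90, 50, 10, 15])
print(schedule)
```

Using two thresholds rather than one avoids flipping between solvers on every step when the indicator hovers near a single cutoff.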

Nuclear Engineering and Technology | 2010

NEUTRONICS MODELING AND SIMULATION OF SHARP FOR FAST REACTOR ANALYSIS

Won Sik Yang; M. A. Smith; C. H. Lee; Allan B. Wollaber; Dinesh K. Kaushik; A. S. Mohamed

This paper presents the neutronics modeling capabilities of the fast reactor simulation system SHARP, which ANL is developing as part of the U.S. DOE’s NEAMS program. We discuss the three transport solvers (PN2ND, SN2ND, and MOCFE) implemented in the UNIC code along with the multigroup cross section generation code MC²-3. We describe the solution methods and modeling capabilities, and discuss the improvement needs for each solver, focusing on massively parallel computation. We present the performance test results against various benchmark problems and ZPR-6 and ZPPR critical experiments. We also discuss weak and strong scalability results for the SN2ND solver on the ZPR-6 critical assembly benchmarks.

international parallel and distributed processing symposium | 2001

A scientific data management system for irregular applications

Jaechun No; Rajeev Thakur; Dinesh K. Kaushik; Lori A. Freitag; Alok N. Choudhary

Many scientific applications are I/O intensive and generate large data sets, spanning hundreds or thousands of “files.” Management, storage, efficient access, and analysis of this data present an extremely challenging task. We have developed a software system, called Scientific Data Manager (SDM), that uses a combination of parallel file I/O and database support for high-performance scientific data management. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. In this paper, we describe how we designed and implemented SDM to support irregular applications. SDM can efficiently handle the reading and writing of data in an irregular mesh, as well as the distribution of index values. We describe the SDM user interface and how we have implemented it to achieve high performance. SDM makes extensive use of MPI-IO’s noncontiguous collective I/O functions. SDM also uses the concept of a history file to optimize the cost of the index distribution using the metadata stored in the database. We present performance results with two irregular applications, a CFD code called FUN3D and a Rayleigh-Taylor instability code, on the SGI Origin2000 at Argonne National Laboratory.

ieee international conference on high performance computing, data, and analytics | 2008

Improving the performance of tensor matrix vector multiplication in cumulative reaction probability based quantum chemistry codes

Dinesh K. Kaushik; William Gropp; Michael Minkoff; Barry F. Smith

Cumulative reaction probability (CRP) calculations provide a viable computational approach to estimate reaction rate coefficients. However, in order to give meaningful results these calculations should be done in many dimensions (ten to fifteen). This makes CRP codes memory intensive. For this reason, these codes use iterative methods to solve the linear systems, where a good fraction of the execution time is spent on matrix-vector multiplication. In this paper, we discuss the tensor product form of applying the system operator on a vector. This approach shows much better performance and provides huge savings in memory as compared to the explicit sparse representation of the system matrix.
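A two-dimensional sketch shows what applying an operator in tensor product form means. It assumes the operator is a single Kronecker product A ⊗ B of small dense factors and that the vector is stored in row-major order; the CRP operator in the paper is higher dimensional and its exact structure is not given in this abstract, so this is an illustration of the general identity (A ⊗ B) vec(X) = vec(A X Bᵀ) rather than the authors' implementation. All function names are hypothetical.

```python
def matmul(P, Q):
    """Dense matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def kron_matvec(A, B, x):
    """Apply (A kron B) to x without assembling the Kronecker product.

    A is m x m, B is n x n, and x has length m * n in row-major order.
    """
    m, n = len(A), len(B)
    X = [x[i * n:(i + 1) * n] for i in range(m)]   # reshape x to m x n
    Y = matmul(matmul(A, X), transpose(B))         # Y = A X B^T
    return [value for row in Y for value in row]   # flatten back


def kron_dense(A, B):
    """Explicitly assemble A kron B (for checking only)."""
    m, n = len(A), len(B)
    return [[A[i][k] * B[j][l] for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]


def dense_matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [1, 2, 3, 4]
y_tensor = kron_matvec(A, B, x)
y_dense = dense_matvec(kron_dense(A, B), x)
```

The tensor form stores only the small factors (m² + n² entries here) instead of the assembled (mn) × (mn) matrix, and the gap widens rapidly as the number of dimensions grows.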

Grid-Based Problem Solving Environments | 2007

Middleware for Dynamic Adaptation of Component Applications

Boyana Norris; Sanjukta Bhowmick; Dinesh K. Kaushik; Lois Curfman McInnes

Component- and service-based software engineering approaches have been gaining popularity in high-performance scientific computing, facilitating the creation and management of large multidisciplinary, multideveloper applications, and providing opportunities for improved performance and numerical accuracy. These software engineering approaches enable the development of middleware infrastructure for computational quality of service (CQoS), which provides performance optimizations through dynamic algorithm selection and configuration in a mostly automated fashion. The factors that affect performance are closely tied to a component’s parallel implementation, its management of parallel communication and memory, the algorithms executed, the algorithmic parameters employed, and other operational characteristics. We present the design of a component middleware CQoS architecture for automated composition and adaptation of high-performance component- or service-based applications. We describe its initial implementation and corresponding experimental results for parallel simulations involving time-dependent nonlinear partial differential equations.
