Publication


Featured research published by Dennis C. Jespersen.


Parallel Computing | 2011

High performance computing using MPI and OpenMP on multi-core parallel systems

Haoqiang Jin; Dennis C. Jespersen; Piyush Mehrotra; Rupak Biswas; Lei Huang; Barbara M. Chapman

The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems - distributed memory across nodes and shared memory with non-uniform memory access within each node - poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems - a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
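
The hybrid approach described above pairs message passing between nodes with shared-memory threading inside each node. As a minimal sketch (not taken from the paper or from the NAS Parallel Benchmarks; the array sizes and values are hypothetical), the following C fragment combines an OpenMP-threaded loop within each MPI rank with an MPI reduction across ranks.

    /* Minimal MPI+OpenMP hybrid sketch (illustrative only, not from the paper).
     * Build with an MPI wrapper and OpenMP enabled, e.g.: mpicc -fopenmp hybrid.c */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;
        /* Request thread support suitable for OpenMP regions that do not call MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Each rank owns its own (hypothetical) work array. */
        static double x[N];
        double local = 0.0, global = 0.0;

        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < N; i++) {
            x[i] = (double)(rank + i) * 1.0e-6;   /* shared-memory work inside the node */
            local += x[i];
        }

        /* Message passing across nodes: combine the per-rank partial sums. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("ranks=%d threads/rank=%d sum=%f\n",
                   nranks, omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }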


29th Aerospace Sciences Meeting | 1991

Numerical Simulation of Flow Past a Tapered Cylinder

Dennis C. Jespersen; Creon Levit

The unsteady three-dimensional low Reynolds number flow past a tapered cylinder is computed. The spanwise variation in natural shedding frequency results in interesting three-dimensional flow phenomena. The computed hot-wire and spectral data are very similar to experimental results. The computation was done on the Connection Machine, a massively parallel computer; highlights of the capabilities of the Connection Machine for computation and visualization of three-dimensional unsteady flow fields are shown.


International Parallel and Distributed Processing Symposium | 2010

Performance impact of resource contention in multicore systems

Robert Hood; Haoqiang Jin; Piyush Mehrotra; Johnny Chang; M. Jahed Djomehri; Sharad Gavali; Dennis C. Jespersen; Kenichi Taylor; Rupak Biswas

Resource sharing in commodity multicore processors can have a significant impact on the performance of production applications. In this paper we use a differential performance analysis methodology to quantify the costs of contention for resources in the memory hierarchy of several multicore processors used in high-end computers. In particular, by comparing runs that bind MPI processes to cores in different patterns, we can isolate the effects of resource sharing. We use this methodology to measure how such sharing affects the performance of four applications of interest to NASA: OVERFLOW, MITgcm, Cart3D, and NCC. We also use a subset of the HPCC benchmarks and hardware counter data to help interpret and validate our findings. We conduct our study on high-end computing platforms that use four different quad-core microprocessors: Intel Clovertown, Intel Harpertown, AMD Barcelona, and Intel Nehalem-EP. The results help further our understanding of the requirements these codes place on their production environments and also of each computer's ability to deliver performance.
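
The binding experiments above pin MPI processes to specific cores so that runs differ only in which resources are shared. The mechanism used in the paper is not stated here; one portable way to do this on Linux, sketched below, is for each rank to set its own CPU affinity mask. The rank-to-core mapping shown is hypothetical.

    /* Sketch: pin each MPI rank to one core so that neighboring ranks do or do not
     * share a socket or cache, depending on the mapping. Linux-specific; illustrative. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Hypothetical mapping: rank r runs on core r. A "spread" mapping would
         * instead skip cores so that ranks land on different sockets or caches. */
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(rank, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
            perror("sched_setaffinity");

        /* ... run the benchmark kernel here and compare timings across mappings ... */

        MPI_Finalize();
        return 0;
    }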


IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers

Subhash Saini; Dale Talcott; Dennis C. Jespersen; M. Jahed Djomehri; Haoqiang Jin; Rupak Biswas

The suitability of next-generation high-performance computing systems for petascale simulations will depend on various performance factors attributable to processor, memory, local and global network, and input/output characteristics. In this paper, we evaluate performance of new dual-core SGI Altix 4700, quad-core SGI Altix ICE 8200, and dual-core IBM POWER5+ systems. To measure performance, we used micro-benchmarks from High Performance Computing Challenge (HPCC), NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics (CFD) and one from climate modeling. We used the micro-benchmarks to develop a controlled understanding of individual system components, then analyzed and interpreted performance of the NPBs and applications. We also explored the hybrid programming model (MPI+OpenMP) using multi-zone NPBs and the CFD application OVERFLOW-2. Achievable application performance is compared across the systems. For the ICE platform, we also investigated the effect of memory bandwidth on performance by testing 1, 2, 4, and 8 cores per node.
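
One of the effects measured above is how per-core memory bandwidth drops as more cores per node become active. A rough way to observe this, independent of the HPCC micro-benchmarks used in the paper, is a STREAM-style triad loop run with a varying number of OpenMP threads; the sketch below is illustrative and the array size is arbitrary.

    /* STREAM-triad-style bandwidth probe (illustrative; not the HPCC benchmark).
     * Run with OMP_NUM_THREADS=1,2,4,8 to see how node memory bandwidth scales. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long n = 10L * 1000 * 1000;   /* each array ~80 MB, well beyond cache */
        double *a = malloc(n * sizeof *a);
        double *b = malloc(n * sizeof *b);
        double *c = malloc(n * sizeof *c);
        const double scalar = 3.0;

        #pragma omp parallel for
        for (long i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; a[i] = 0.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];    /* triad: three arrays touched per element */
        double t1 = omp_get_wtime();

        double gbytes = 3.0 * n * sizeof(double) / 1.0e9;
        printf("threads=%d  approx bandwidth = %.2f GB/s\n",
               omp_get_max_threads(), gbytes / (t1 - t0));
        free(a); free(b); free(c);
        return 0;
    }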


Scientific Programming | 2010

Acceleration of a CFD code with a GPU

Dennis C. Jespersen

The Computational Fluid Dynamics code OVERFLOW includes as one of its solver options an algorithm which is a fairly small piece of code but which accounts for a significant portion of the total computational time. This paper studies some of the issues in accelerating this piece of code by using a Graphics Processing Unit (GPU). The algorithm needs to be modified to be suitable for a GPU and attention needs to be given to 64-bit and 32-bit arithmetic. Interestingly, the work done for the GPU produced ideas for accelerating the CPU code and led to significant speedup on the CPU.
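
A central point above is that moving the solver kernel to a GPU forces attention to the choice between 64-bit and 32-bit arithmetic. The paper's kernel is not reproduced here; the small C sketch below only illustrates the kind of accuracy check involved, comparing a sum accumulated in single and in double precision (the values are arbitrary).

    /* Illustrative only: how 32-bit accumulation can drift from 64-bit results,
     * the kind of trade-off weighed when moving a solver kernel to a GPU. */
    #include <stdio.h>

    int main(void)
    {
        const int n = 10000000;
        float  sum32 = 0.0f;
        double sum64 = 0.0;
        for (int i = 1; i <= n; i++) {
            float term = 1.0f / (float)i;   /* harmonic-series terms */
            sum32 += term;                  /* single-precision accumulation */
            sum64 += (double)term;          /* double-precision accumulation */
        }
        printf("float sum  = %.7f\n", sum32);
        printf("double sum = %.15f\n", sum64);
        printf("difference = %.3e\n", sum64 - (double)sum32);
        return 0;
    }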


Applied Mathematics and Computation | 1983

Design and implementation of a multigrid code for the Euler equations

Dennis C. Jespersen

The steady-state equations of inviscid fluid flow, the Euler equations, are a nonlinear nonelliptic system of equations admitting solutions with discontinuities (for example, shocks). The efficient numerical solution of these equations poses a strenuous challenge to multigrid methods. A multigrid code has been developed for the numerical solution of the Euler equations. In this paper some of the factors that had to be taken into account in the design and development of the code are reviewed. These factors include the importance of choosing an appropriate difference scheme, the usefulness of local mode analysis as a design tool, and the crucial question of how to treat the nonlinearity. Sample calculations of transonic flow about airfoils are presented. No claim is made that the particular algorithm presented is optimal.
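
Because the Euler equations are nonlinear, the coarse grids cannot simply solve a linear correction equation. One standard way to handle this (the paper discusses the treatment of the nonlinearity, but the exact variant is not reproduced here) is the full approximation scheme, in which the coarse grid solves for a full approximation rather than a correction. For a fine-grid problem A_h(u_h) = f_h with restriction I_h^{2h} and prolongation I_{2h}^{h}, the coarse-grid problem and the resulting fine-grid update read, in LaTeX notation:

    A_{2h}(u_{2h}) = A_{2h}\left(I_h^{2h} u_h\right) + I_h^{2h}\left(f_h - A_h(u_h)\right),
    \qquad
    u_h \leftarrow u_h + I_{2h}^{h}\left(u_{2h} - I_h^{2h} u_h\right).

The added term on the right keeps the coarse problem consistent with the current fine-grid approximation, so the scheme handles the nonlinear operator directly.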


Conference on High Performance Computing (Supercomputing) | 1997

A multi-level parallelization concept for high-fidelity multi-block solvers

Ferhat F. Hatay; Dennis C. Jespersen; Guru P. Guruswamy; Yehia M. Rizk; Chansup Byun; Ken Gee

The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are portable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely used NASA multi-block CFD solvers employed in the ENSAERO and OVERFLOW advanced flow analysis packages. These engineering and scientific analysis packages comprise more than 300,000 lines of FORTRAN 77 code in more than 1300 individual subprograms. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity, with data partitioning as well as data coalescing, to obtain the desired load-balance characteristics on the available computer platforms for these legacy packages. The multi-level parallelism itself introduces no changes to the numerical results, so the original fidelity of the packages is preserved exactly. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory access. By choosing an appropriate combination of the available partitioning and coalescing options at execution time, the PENS solver runs on architectures ranging from shared-memory to distributed-memory platforms with varying degrees of parallelism. Improvements in computational load balance and speed are crucial for realistic problems in the design of aerospace vehicles. The PENS implementation on the IBM SP2 distributed-memory system at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide nodes. Multi-block CFD simulations of complete aircraft geometries achieve 85 percent of perfectly load-balanced execution using data coalescing and the two levels of parallelism. The SGI PowerChallenge, SGI Onyx2, and Cray T3E are the other platforms on which the robustness, performance behavior, and parallel scalability of the implementation were tested and tuned for actual production environments.
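
The coarse-grain level of parallelism described above assigns whole CFD blocks to processors, coalescing blocks so that every processor carries a similar number of grid points. The sketch below shows one simple greedy assignment of that kind in C; the block sizes and the heuristic are hypothetical and are not the PENS algorithm itself.

    /* Greedy sketch of coarse-grain load balancing: assign each block (zone) to the
     * MPI rank that currently has the fewest grid points. Illustrative only. */
    #include <stdio.h>

    #define NBLOCKS 8
    #define NRANKS  3

    int main(void)
    {
        /* Hypothetical block sizes (number of grid points per block). */
        long block_size[NBLOCKS] = { 90000, 40000, 75000, 20000, 60000, 55000, 30000, 10000 };
        long rank_load[NRANKS] = { 0 };
        int  owner[NBLOCKS];

        for (int b = 0; b < NBLOCKS; b++) {
            int lightest = 0;
            for (int r = 1; r < NRANKS; r++)
                if (rank_load[r] < rank_load[lightest]) lightest = r;
            owner[b] = lightest;                  /* coalesce this block onto the lightest rank */
            rank_load[lightest] += block_size[b];
        }

        for (int r = 0; r < NRANKS; r++)
            printf("rank %d carries %ld grid points\n", r, rank_load[r]);
        for (int b = 0; b < NBLOCKS; b++)
            printf("block %d -> rank %d\n", b, owner[b]);
        return 0;
    }

A production balancer would also account for fine-grain partitioning of large single blocks and for communication costs; this sketch only illustrates the coalescing idea.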


Computers & Structures | 1988

Explicit and implicit solution of the Navier–Stokes equations on a massively parallel computer

Creon Levit; Dennis C. Jespersen

We describe the design, implementation, and performance of a two-dimensional time-accurate Navier-Stokes solver for the Connection Machine (TM) model CM-2. The program uses a single processor for each grid point. Two different time-stepping methods have so far been implemented: an explicit third-order Runge-Kutta method and an implicit approximation-factorization method. The entire flow solver is written in *Lisp (Starlisp), a set of parallel extensions to Common Lisp developed by Thinking Machines Corporation. Our code is based on ARC2D, and we are attempting to emulate its functionality. Thus, we can check our Connection Machine results against those of a mature, well-vectorized Cray-2 program, both for correctness and for performance. We find the code to be correct and the performance in some cases to be up to several times that of the Cray-2.
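
For the explicit option mentioned above, a representative three-stage, third-order Runge-Kutta step for a semi-discrete system du/dt = L(u) can be written as follows. The paper's exact coefficients are not given here; this is the common Shu-Osher form, shown only as an illustration, in LaTeX notation:

    u^{(1)} = u^{n} + \Delta t\, L(u^{n}),
    u^{(2)} = \tfrac{3}{4} u^{n} + \tfrac{1}{4}\left( u^{(1)} + \Delta t\, L(u^{(1)}) \right),
    u^{n+1} = \tfrac{1}{3} u^{n} + \tfrac{2}{3}\left( u^{(2)} + \Delta t\, L(u^{(2)}) \right).

Each stage is a simple pointwise update plus a residual evaluation, which is what makes explicit schemes a natural fit for the one-processor-per-grid-point model of the Connection Machine.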


IEEE International Conference on High Performance Computing, Data, and Analytics | 2013

An early performance evaluation of many integrated core architecture based SGI rackable computing system

Subhash Saini; Haoqiang Jin; Dennis C. Jespersen; Huiyu Feng; M. Jahed Djomehri; William Arasin; Robert Hood; Piyush Mehrotra; Rupak Biswas

Intel recently introduced the Xeon Phi coprocessor based on the Many Integrated Core architecture, featuring 60 cores with a peak performance of 1.0 Tflop/s. NASA has deployed a 128-node SGI Rackable system where each node has two Intel Xeon E5-2670 8-core Sandy Bridge processors along with two Xeon Phi 5110P coprocessors. We have conducted an early performance evaluation of the Xeon Phi. We used microbenchmarks to measure the latency and bandwidth of memory and interconnect, I/O rates, and the performance of OpenMP directives and MPI functions. We also used OpenMP and MPI versions of the NAS Parallel Benchmarks along with two production CFD applications to test four programming modes: offload, processor native, coprocessor native, and symmetric (processor plus coprocessor). In this paper we present preliminary results based on our performance evaluation of various aspects of a Phi-based system.
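
Of the four programming modes listed above, offload mode is the one that changes the source code most visibly: selected loops are annotated so that their data and work move to the coprocessor. The fragment below is only a schematic example of that style using the Intel offload pragma of that era; the array and loop are hypothetical and are not taken from the benchmarks or applications in the paper.

    /* Schematic of offload-mode programming for the Xeon Phi (illustrative only).
     * Requires the Intel compiler's offload support of that era; without a
     * coprocessor present the code runs on the host. */
    #include <stdio.h>

    #define N 1000000
    static double a[N], b[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) a[i] = (double)i;

        /* Ship 'a' to the coprocessor, run the loop there with OpenMP threads,
         * and bring 'b' back to the host. */
        #pragma offload target(mic) in(a) out(b)
        {
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                b[i] = 2.0 * a[i] + 1.0;
        }

        printf("b[N-1] = %f\n", b[N - 1]);
        return 0;
    }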


Applied Numerical Mathematics | 1985

Enthalpy damping for the steady Euler equations

Dennis C. Jespersen

For inviscid steady flow problems where the enthalpy is constant at steady state, it has been proposed by Jameson, Schmidt, and Turkel to use the difference between the local enthalpy and the steady state enthalpy as a driving term to accelerate convergence of iterative schemes. This idea is analyzed here, both on the level of the partial differential equation and on the level of a particular finite difference scheme. It is shown that for the two-dimensional unsteady Euler equations, a hyperbolic system with eigenvalues on the imaginary axis, there is no enthalpy damping strategy which can move all the eigenvalues into the open left half plane. For the numerical scheme, however, the analysis shows and examples verify that enthalpy damping can be effective in accelerating convergence to steady state.
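
In schematic form, the idea analyzed above augments the residual of the semi-discrete scheme with a term driven by the deviation of the local total enthalpy H from its steady-state (free-stream) value H_\infty. Writing the time-marching scheme for the state w as dw/dt = -R(w), the damped iteration has the general shape, in LaTeX notation:

    \frac{dw}{dt} = -R(w) - \alpha\,(H - H_\infty)\, g(w), \qquad \alpha > 0,

where g(w) is a problem-dependent weighting; the specific form proposed by Jameson, Schmidt, and Turkel is what the paper analyzes and is not reproduced here. The added term vanishes at steady state, since H approaches H_\infty, so it alters the transient behavior of the iteration without changing the converged solution.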

