Publications

Featured research published by Henri Calandra.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

A Coarray Fortran Implementation to Support Data-Intensive Application Development

Deepak Eachempati; Alan Richardson; Terrence Liao; Henri Calandra; Barbara M. Chapman

In this paper, we describe our experiences in implementing and applying Coarray Fortran (CAF) for the development of data-intensive applications in the domain of Oil and Gas exploration. The successful porting of reverse time migration (RTM), a data-intensive algorithm and one of the largest uses of computational resources in seismic exploration, is described, and results are presented demonstrating that the CAF implementation provides comparable performance to an equivalent MPI version. We then discuss further language extensions for supporting scalable parallel I/O operating on the massive data sets that are typical of applications used in seismic exploration.
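The key idea the paper builds on is that Coarray Fortran expresses communication as one-sided reads and writes into remote images, rather than matched send/receive pairs. As a rough illustration only (not the paper's code, and in Python rather than Fortran), the sketch below mimics a coarray-style one-sided halo exchange, where each "image" writes its edge value directly into its neighbours' halo cells; all names and sizes are hypothetical.

```python
# Illustrative sketch: coarray-style one-sided halo exchange. In CAF,
# image p can execute a statement like
#   halo_left[p+1] = edge     ! a one-sided "put" into a neighbour image
# Here each "image" is a dict and a put is a direct write into it.

NIMG = 4          # number of images (hypothetical)
N = 6             # interior points per image (hypothetical)

# each image owns an interior array plus one-cell halos on each side
images = [{"u": [float(p * N + i) for i in range(N)],
           "halo_left": None, "halo_right": None} for p in range(NIMG)]

def exchange_halos(images):
    """One-sided puts: each image writes its edge values directly
    into its neighbours' halo cells, as a CAF put would."""
    n = len(images)
    for p, img in enumerate(images):
        if p + 1 < n:                  # put my right edge into my right neighbour
            images[p + 1]["halo_left"] = img["u"][-1]
        if p - 1 >= 0:                 # put my left edge into my left neighbour
            images[p - 1]["halo_right"] = img["u"][0]

exchange_halos(images)
print(images[1]["halo_left"], images[1]["halo_right"])   # 5.0 12.0
```

In an MPI port the same exchange becomes a pair of matched messages (or RMA puts); the paper's observation is that the one-sided formulation can match that performance while staying closer to the serial code.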


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Experiences with OpenMP, PGI, HMPP and OpenACC Directives on ISO/TTI Kernels

Sayan Ghosh; Terrence Liao; Henri Calandra; Barbara M. Chapman

GPUs are slowly becoming ubiquitous in High Performance Computing, as their ability to improve the performance per watt of compute-intensive algorithms relative to multicore CPUs has been demonstrated. The primary shortcoming of a GPU is usability: vendor-specific APIs differ substantially from existing programming languages, and optimizing applications requires deep knowledge of the device and its programming interface. Hence, a growing number of higher-level programming models now target GPUs to alleviate this problem. The ultimate goal of a high-level model is to expose an easy-to-use interface through which the user can offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level directive-based GPU programming models: PGI, CAPS HMPP, and OpenACC (implementations from both CAPS and PGI), on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the subsurface. When ported to a single GPU using these directives, we observe an average 1.5-1.8x performance improvement for both ISO and TTI kernels compared with optimized multi-threaded CPU implementations using OpenMP.
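The kernels in question are tight finite-difference loop nests, which is precisely the shape that directive models offload well. As a hedged, language-agnostic sketch (not the paper's code, shown in plain Python with hypothetical names), this is the loop nest over which a directive such as `#pragma acc parallel loop` or `!$acc parallel loop` would be placed in the C or Fortran version:

```python
# Illustrative sketch of an isotropic finite-difference time step:
# 2nd order in time and space, 2D grid, constant velocity.

def iso_step(u_prev, u_curr, c2dt2_h2):
    """One wave-equation step: u_next = 2*u_curr - u_prev + (c*dt/h)^2 * laplacian(u_curr)."""
    ny, nx = len(u_curr), len(u_curr[0])
    u_next = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):          # this loop nest is what directive
        for i in range(1, nx - 1):      # models map onto GPU threads
            lap = (u_curr[j - 1][i] + u_curr[j + 1][i] +
                   u_curr[j][i - 1] + u_curr[j][i + 1] - 4.0 * u_curr[j][i])
            u_next[j][i] = 2.0 * u_curr[j][i] - u_prev[j][i] + c2dt2_h2 * lap
    return u_next

# point source in the middle of a small grid
n = 5
u0 = [[0.0] * n for _ in range(n)]
u1 = [[0.0] * n for _ in range(n)]
u1[2][2] = 1.0
u2 = iso_step(u0, u1, 0.25)
print(u2[2][2], u2[2][1])   # 1.0 0.25
```

The real ISO/TTI kernels use higher-order stencils and variable coefficients, but the data-parallel structure, one independent update per grid point, is the same, which is why a single directive over the loop nest suffices to expose the parallelism.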


Programming Models and Applications for Multicores and Manycores | 2015

GPU technology applied to reverse time migration and seismic modeling via OpenACC

Ahmad Qawasmeh; Barbara M. Chapman; Maxime R. Hugues; Henri Calandra

GPU computing offers tremendous potential to accelerate complex scientific applications and is becoming a leading force in speeding up seismic imaging and velocity analysis techniques. Developing portable code is a challenge that can be addressed with emerging high-level directive-based programming models such as OpenACC. In this paper, we develop OpenACC implementations of both seismic modeling and Reverse Time Migration (RTM) algorithms that solve the isotropic, acoustic, and elastic wave equations. We employ OpenACC to take advantage of the computational power of two Nvidia GPUs, the M2090 and the K40, residing in IBM and Cray XC30 clusters respectively. Although we implement a hybrid OpenACC-MPI approach to parallelize seismic modeling and RTM across multiple GPUs, in this paper we focus on mapping techniques that exploit the potential of a single GPU. We observe incremental performance improvements as we explore different optimization techniques; adequate code restructuring to tap the GPU's potential proves critical. Depending on the intensity of their computations, different propagators exhibit different speedups. A performance enhancement of ~10x was obtained when the acoustic model was ported to a single GPU, compared with a 1.3x speedup for the isotropic model.
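RTM pairs the modeling kernel with an imaging condition: the source wavefield is propagated forward, the receiver wavefield backward, and the image accumulates their zero-lag cross-correlation. As an illustration only (not the paper's implementation; all names are hypothetical), the accumulation looks like this:

```python
# Illustrative sketch of the zero-lag cross-correlation imaging condition
# at the heart of RTM. src_steps and rcv_steps are time-aligned lists of
# wavefield snapshots, each flattened to a list of grid values.

def rtm_image(src_steps, rcv_steps):
    """Accumulate image[i] += src[t][i] * rcv[t][i] over all time steps t."""
    npts = len(src_steps[0])
    image = [0.0] * npts
    for s, r in zip(src_steps, rcv_steps):       # loop over time steps
        for i in range(npts):                    # per-point product: embarrassingly
            image[i] += s[i] * r[i]              # parallel, hence GPU-friendly
    return image

src = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
rcv = [[2.0, 5.0, 1.0], [3.0, 0.0, 4.0]]
print(rtm_image(src, rcv))   # [2.0, 0.0, 6.0]
```

Because both the propagation stencil and this correlation are independent per grid point, the whole RTM inner loop maps naturally onto accelerator threads, which is what makes directive-based offload viable for it.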


Proceedings of the First Workshop on Accelerator Programming using Directives | 2014

Accelerating Kirchhoff migration on GPU using directives

Rengan Xu; Maxime R. Hugues; Henri Calandra; Sunita Chandrasekaran; Barbara M. Chapman

Accelerators offer the potential to significantly improve the performance of scientific applications by offloading compute-intensive portions of programs. However, effectively tapping their full potential is difficult owing to the programmability challenges users face when mapping computational algorithms onto massively parallel architectures such as GPUs. Directive-based programming models offer programmers an option to rapidly prototype applications by annotating regions of code for offloading with hints to the compiler, which is critical for productivity in production code. In this paper, we study the effectiveness of a high-level directive-based programming model, OpenACC, for parallelizing a seismic migration application called Kirchhoff Migration on GPU architectures. Kirchhoff Migration is real-world production code in the oil and gas industry. Because of its compute-intensive nature, we focus on the computation and explore different mechanisms to effectively harness the GPU's computational capabilities and memory hierarchy. We also analyze different loop transformation techniques in different OpenACC compilers and compare their performance. Compared to one socket (10 CPU cores) on the experimental platform, one GPU achieved maximum speedups of 20.54x and 6.72x for the interpolation and extrapolation kernel functions respectively.
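For readers unfamiliar with the algorithm: Kirchhoff migration builds each image point by summing recorded trace amplitudes at the source-to-point-to-receiver travel time. The sketch below is a deliberately simplified illustration (straight rays, constant velocity, nearest-sample lookup; every name is hypothetical), not the production code the paper ports:

```python
# Illustrative sketch of the core Kirchhoff summation for one image point.
import math

def kirchhoff_point(traces, geometry, x, z, velocity, dt):
    """traces: list of sampled traces; geometry: (src_x, rcv_x) per trace;
    (x, z): image point; returns the migrated amplitude at that point."""
    total = 0.0
    for trace, (sx, rx) in zip(traces, geometry):
        # two-way travel time: source -> image point -> receiver
        t = (math.hypot(x - sx, z) + math.hypot(x - rx, z)) / velocity
        idx = int(round(t / dt))                 # nearest-sample lookup
        if 0 <= idx < len(trace):
            total += trace[idx]
    return total

# one trace with a spike where the reflection from (x=0, z=1500) arrives
v, dt = 3000.0, 0.004
trace = [0.0] * 500
t_arr = (1500.0 + 1500.0) / v                    # src and rcv both at x=0
trace[int(round(t_arr / dt))] = 1.0
print(kirchhoff_point([trace], [(0.0, 0.0)], 0.0, 1500.0, v, dt))   # 1.0
```

The loop over traces (and, in the full code, over output points) is what gets annotated for offload; the paper's interpolation and extrapolation kernels correspond to the travel-time lookup and amplitude summation stages of this pattern.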


International Journal of High Performance Computing Applications | 2017

Performance portability in reverse time migration and seismic modelling via OpenACC

Ahmad Qawasmeh; Maxime R. Hugues; Henri Calandra; Barbara M. Chapman

Heterogeneity among the computational resources within a single machine has increased significantly in high performance computing in order to exploit the tremendous potential of graphics processing units (GPUs). Portability, in terms of both code development and performance, has been a challenge due to major differences between GPU and conventional central processing unit (CPU) programming and memory models. Performance characteristics of compilers and processors also vary between machines. Emerging high-level directive-based programming models such as OpenACC have been proposed to address this challenge. In this work, we develop OpenACC implementations of both seismic modelling and reverse time migration algorithms that solve the isotropic, acoustic, and elastic wave equations. We employ OpenACC to take advantage of the computational power of two Nvidia GPUs, the M2090 and the K40, residing in IBM and Cray XC30 clusters respectively. We also explore the main aspects of hybridizing seismic modelling and reverse time migration by implementing a Message Passing Interface (MPI)+OpenACC approach, and we present various mapping techniques for developing portable code that maximizes performance regardless of compiler or platform. Depending on the intensity of their computations, different propagators exhibited different speedups against a full-socket CPU MPI implementation. A performance enhancement of ~10x was obtained when the acoustic model was ported to a single GPU, compared with a 1.7x speedup for the isotropic model. Our MPI+OpenACC implementation of reverse time migration and seismic modelling shows promising scaling when multiple GPUs are used.
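A practical detail in any MPI+OpenACC hybrid is binding each MPI rank to one GPU on its node, typically by computing a node-local rank and passing it to a call like `acc_set_device_num`. The sketch below only mimics that local-rank arithmetic in plain Python (it is not the paper's code, and the function names are hypothetical):

```python
# Illustrative sketch: assign each MPI rank a GPU id on its node, the way
# an MPI+OpenACC code would before calling acc_set_device_num(id, devtype).

def assign_devices(hostnames, ngpus_per_node):
    """Given one hostname per rank (in MPI_COMM_WORLD order),
    return the GPU id each rank would bind to."""
    counts = {}
    devices = []
    for host in hostnames:
        local_rank = counts.get(host, 0)   # position among ranks on this host
        counts[host] = local_rank + 1
        devices.append(local_rank % ngpus_per_node)
    return devices

# 6 ranks over 2 nodes with 2 GPUs each: ranks wrap around the local GPUs
print(assign_devices(["n0", "n0", "n0", "n1", "n1", "n1"], 2))
# [0, 1, 0, 0, 1, 0]
```

With one rank per GPU, the rest of the hybridization reduces to the usual MPI domain decomposition with halo exchanges between subdomains, while OpenACC handles the intra-GPU parallelism.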


Computing | 2014

Performance of CPU/GPU compiler directives on ISO/TTI kernels

Sayan Ghosh; Terrence Liao; Henri Calandra; Barbara M. Chapman

GPUs are slowly becoming ubiquitous in High Performance Computing, as their ability to improve the performance per watt of compute-intensive algorithms relative to multicore CPUs has been demonstrated. The primary shortcoming of a GPU is usability: vendor-specific APIs differ substantially from existing programming languages, and optimizing applications requires deep knowledge of the device and its programming interface. Hence, a growing number of higher-level programming models now target GPUs to alleviate this problem. The ultimate goal of a high-level model is to expose an easy-to-use interface through which the user can offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level directive-based GPU programming models: PGI, CAPS HMPP, and OpenACC (implementations from both CAPS and PGI), on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the subsurface. When ported to a single GPU using these directives, we observe an average 1.5-1.8x performance improvement for both ISO and TTI kernels compared with optimized multi-threaded CPU implementations using OpenMP.


International Parallel and Distributed Processing Symposium | 2017

One-Way Wave Equation Migration at Scale on GPUs Using Directive Based Programming

Kshitij Mehta; Maxime R. Hugues; Oscar R. Hernandez; David E. Bernholdt; Henri Calandra

One-Way Wave Equation Migration (OWEM) is a depth migration algorithm used for seismic imaging. A parallel version of this algorithm is widely implemented using MPI. Heterogeneous architectures that use GPUs have become popular in the Top500 because of their performance/power ratio. In this paper, we discuss the methodology and code transformations used to port OWEM to GPUs using OpenACC, along with the code changes needed to scale the application up to 18,400 GPUs (more than 98% of the nodes) of the Titan leadership-class supercomputer at Oak Ridge National Laboratory. For the individual OpenACC kernels, we achieved an average 3x speedup on a test dataset using one GPU as compared with an 8-core Intel Sandy Bridge CPU. The application was then run at large scale on Titan, achieving a peak of 1.2 petaflops while drawing an average of 5.5 megawatts. After porting the application to GPUs, we discuss how we dealt with other challenges of running at scale, such as the application becoming more I/O bound and prone to silent errors. We believe this work serves as valuable evidence that directive-based programming models are a viable option for scaling HPC applications to heterogeneous architectures.


International Journal of High Performance Computing Applications | 2017

Leveraging the accelerated processing units for seismic imaging: A performance and power efficiency comparison against CPUs and GPUs

Issam Said; Pierre Fortin; Jean Luc Lamotte; Henri Calandra

Oil and gas companies rely on high performance computing to run seismic imaging algorithms such as reverse time migration. Graphics processing units (GPUs) are used to accelerate reverse time migration, but these deployments suffer from limitations such as limited GPU memory capacity, frequent CPU-GPU communications that may be bottlenecked by the PCI bus transfer rate, and high power consumption. Recently, AMD launched the Accelerated Processing Unit (APU): a processor that merges a CPU and a GPU on the same die with a unified CPU-GPU memory. In this paper, we explore how efficiently the APU can be applied to reverse time migration. Using OpenCL (along with MPI and OpenMP), a CPU/APU/GPU comparative study is conducted on a single node for 3D acoustic reverse time migration, and then extended to up to 16 nodes. We show the relevance of overlapping I/O and MPI communications with computations on APU and GPU clusters, that the performance of APUs falls between that of CPUs and that of GPUs, and that the power efficiency of the APU matches or exceeds that of the GPU.
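The overlap the paper relies on, hiding snapshot I/O and communication behind the next compute step, can be sketched with a background thread. This is an illustration of the pattern only, not the paper's OpenCL/MPI code, and every name in it is hypothetical:

```python
# Illustrative sketch: overlap snapshot "I/O" with the next compute step.
import threading

written = []

def write_snapshot(step, field):
    written.append((step, list(field)))   # stands in for a real disk write

def compute_step(field):
    return [v + 1.0 for v in field]       # stands in for one propagation step

field = [0.0, 0.0]
writer = None
for step in range(3):
    if writer is not None:
        writer.join()                     # previous write finished while we computed
    writer = threading.Thread(target=write_snapshot, args=(step, field))
    writer.start()                        # I/O proceeds in the background...
    field = compute_step(field)           # ...overlapped with this compute step
writer.join()
print(len(written), field)                # 3 [3.0, 3.0]
```

Because `compute_step` returns a new array, the writer thread always snapshots a field the compute loop no longer mutates; the real code achieves the same isolation with double-buffered wavefields.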


International Conference on Computational Science | 2011

ASIODS - An Asynchronous and Smart I/O Delegation System

Maxime R. Hugues; Michael Moretti; Serge G. Petiton; Henri Calandra

In high performance computing, many large scientific and engineering problems are solved on supercomputers that combine two specialized subsystems, one dedicated to computation and the other to I/O. Many applications exhibit deterministic behavior in both their computation and their I/O. This knowledge can be used to load data in advance or to delegate data writing to dedicated nodes, exploiting the specialized parts of the supercomputer for better cache management by decoupling computation from I/O. This idea led to the design and evaluation of a first prototype of ASIODS. This paper presents the architecture of our approach and results demonstrating its capabilities: the approach reduces execution time by avoiding I/O access penalties.
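The delegation idea, compute processes hand writes to a dedicated I/O worker and continue immediately, is a producer-consumer pattern. The sketch below illustrates that pattern with a queue and a worker thread; it is not ASIODS itself (which delegates to separate I/O nodes), and all names are hypothetical:

```python
# Illustrative sketch: delegate writes to a dedicated I/O worker through a
# queue, so the compute loop never blocks on the file system.
import queue
import threading

jobs = queue.Queue()
log = []

def io_worker():
    while True:
        item = jobs.get()
        if item is None:                 # sentinel: no more writes coming
            break
        step, data = item
        log.append((step, data))         # stands in for the delegated write

worker = threading.Thread(target=io_worker)
worker.start()

for step in range(4):                    # the "compute" loop just hands off data
    jobs.put((step, step * 10))

jobs.put(None)                           # tell the worker to drain and stop
worker.join()
print(log)                               # [(0, 0), (1, 10), (2, 20), (3, 30)]
```

The deterministic I/O behavior the paper mentions is what makes the stronger version of this possible: if the write (or read) schedule is known in advance, the delegate can also prefetch input data before the compute side asks for it.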


Archive | 2014

DOE Advanced Scientific Computing Advisory Committee (ASCAC): Workforce Subcommittee Letter

Barbara M. Chapman; Henri Calandra; Silvia Crivelli; Jack Dongarra; Jeffrey Hittinger; Scott A. Lathrop; Vivek Sarkar; Eric Stahlberg; Jeffrey S. Vetter; Dean Williams

Collaboration

Henri Calandra's closest collaborators and their affiliations.

Maxime R. Hugues (Laboratoire d'Informatique Fondamentale de Lille)

Alan Richardson (Planetary Science Institute)

David E. Bernholdt (Oak Ridge National Laboratory)

Jeffrey S. Vetter (Oak Ridge National Laboratory)