Barna L. Bihari

Lawrence Livermore National Laboratory

Publication

Featured research published by Barna L. Bihari.

International Workshop on OpenMP | 2010

A case for including transactions in OpenMP

Michael Wong; Barna L. Bihari; Bronis R. de Supinski; Peng Wu; Maged M. Michael; Yan Liu; Wang Chen

Transactional Memory (TM) has received significant attention recently as a mechanism to reduce the complexity of shared memory programming. We explore the potential of TM to improve OpenMP applications. We combine a software TM (STM) system to support transactions with an OpenMP implementation to start thread teams and provide task and loop-level parallelization. We apply this system to two application scenarios that reflect realistic TM use cases. Our results with this system demonstrate that even with the relatively high overheads of STM, transactions can outperform OpenMP critical sections by 10%. Overall, our study demonstrates that extending OpenMP to include transactions would ease programming effort while allowing improved performance.

International Workshop on OpenMP | 2012

A case for including transactions in OpenMP II: hardware transactional memory

Barna L. Bihari; Michael Wong; Amy Wang; Bronis R. de Supinski; Wang Chen

We present recent results using Hardware Transactional Memory (HTM) on IBM's Blue Gene/Q system. By showing how this latest TM system can significantly reduce the complexity of shared memory programming while retaining efficiency, we continue to make our case that the OpenMP language specification should include transactional language constructs. Furthermore, we argue for its support as an advanced abstraction to support mutable shared state, thus expanding OpenMP synchronization capabilities. Our results demonstrate how TM can be used to simplify modular parallel programming in OpenMP while maintaining parallel performance. We show performance advantages in the BUSTM (Benchmark for UnStructured-mesh Transactional Memory) model using the transactional memory hardware implementation on Blue Gene/Q.

Journal of Scientific Computing | 2013

Transactional Memory for Unstructured Mesh Simulations

Barna L. Bihari

In this paper we study transactional memory (TM) as a new tool for threading codes in this new era of multi- and many-core computers. In particular, we investigate the features and study the applicability of transactional memory as an efficient and easy-to-use alternative for handling memory conflicts in unstructured mesh simulations that use shared memory. The software tool used for our preliminary analysis of this novel construct is IBM’s freely available Software Transactional Memory (STM) system. For our studies, we developed the BUSTM benchmark, which is a test code with state-of-the-art unstructured-mesh bookkeeping. The numerical algorithms are simplified yet still exhibit most of the salient features of modern unstructured mesh methods. We apply STM to two algorithm types frequently used in multi-physics codes with realistic 3-D meshes. Our computational experiments indicate a good fit between these application scenarios and the TM features.

Presented at: International Conference on Numerical Analysis and Applied Mathematics, Rhodes, Greece, Sep 19 - Sep 25, 2010 | 2010

Applicability of Transactional Memory to Modern Codes

Barna L. Bihari

In this paper we illustrate the features and study the applicability of transactional memory (TM) as an efficient and easy-to-use alternative for handling memory conflicts in multi-threaded physics simulations that use shared memory. The tool used for our preliminary analysis of this novel construct is IBM’s freely available Software Transactional Memory (STM) system. Instead of attempting to apply it to a production-grade simulation code, we developed a much simpler test code that exhibits most of the salient features of modern unstructured mesh algorithms, but without the complicated physical models. We apply STM to two frequently used algorithms in realistic multi-physics codes. Our computational experiments indicate a good fit between these application scenarios and the TM features.

International Workshop on OpenMP | 2014

Towards Transactional Memory for OpenMP

Michael Wong; Eduard Ayguadé; Justin E. Gottschlich; Victor Luchangco; Bronis R. de Supinski; Barna L. Bihari

The OpenMP specification lacks a composable shared memory concurrency mechanism: the current OpenMP concurrency mechanisms, such as OMP critical, locks, or atomics, do not support composition. In this paper, we motivate the need for transactional memory (TM) in OpenMP. The chief reason is to support composition of realistic programs, but we also consider whether TM is easier to program than locks, the use case for TM, and whether a software-only TM can outperform traditional locking through a survey of recent publications. This paper advances upon previous proposals of OpenMP TM by introducing a new construct specifically to handle irrevocable actions, which is also composable. It also proposes a pure atomic transaction construct as well as the concept of transaction safety. Further, we examine how our proposed construct integrates with current OpenMP constructs.

International Workshop on OpenMP | 2014

On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms: The Effects of Transactional Memory

Barna L. Bihari; Michael Wong; Bronis R. de Supinski; Lori A. Diachin

In this paper we analyze the effects of using different OpenMP synchronization mechanisms in iterative mesh optimization algorithms run on the IBM Blue Gene/Q system. We perform a systematic study of a threaded Laplacian mesh smoothing method on Cartesian meshes of different sizes that have been initially perturbed by a factor that is random, but within a controlled range. We consider three different run modes, two of which are OpenMP synchronization mechanisms: (hardware) transactional memory (TM), OpenMP critical, and “none”. We find that TM typically outperforms the other two modes in terms of its convergence characteristics. Because of the algorithmic simplicity and light operation count, the raw runtime performance was not our focus in this work; however, we present some results on TM scaling. We also show the TM rollback and conflict probabilities, and conclude that mesh optimization codes are good candidates for using TM when the more general “time-to-convergence” criterion is considered.
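As a rough illustration of the kind of kernel this abstract describes, the following is a minimal serial sketch of Laplacian smoothing on a randomly perturbed Cartesian mesh. It is not the authors' code: the function names, the 4-neighbour averaging stencil, the perturbation amplitude, and the stopping tolerance are all assumptions made for illustration. The in-place node update is the step that a threaded version would need to protect, for example with a transaction or a critical section.

```python
import random

def perturbed_grid(n, amplitude, seed=0):
    """Build an n x n Cartesian grid on the unit square and displace each
    interior node by a random offset of at most `amplitude` grid spacings.
    (Hypothetical helper; the paper's perturbation scheme may differ.)"""
    rng = random.Random(seed)
    h = 1.0 / (n - 1)
    grid = [[[i * h, j * h] for j in range(n)] for i in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            grid[i][j][0] += rng.uniform(-amplitude, amplitude) * h
            grid[i][j][1] += rng.uniform(-amplitude, amplitude) * h
    return grid

def smooth_sweep(grid):
    """One in-place Laplacian sweep: move every interior node to the average
    of its four grid neighbours. Returns the largest coordinate change."""
    n = len(grid)
    max_move = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            nbrs = (grid[i - 1][j], grid[i + 1][j], grid[i][j - 1], grid[i][j + 1])
            new_x = 0.25 * sum(p[0] for p in nbrs)
            new_y = 0.25 * sum(p[1] for p in nbrs)
            max_move = max(max_move,
                           abs(new_x - grid[i][j][0]),
                           abs(new_y - grid[i][j][1]))
            # In a threaded code this read-modify-write is the conflict point.
            grid[i][j] = [new_x, new_y]
    return max_move

def smooth(grid, tol=1e-10, max_sweeps=10000):
    """Sweep until no node moves more than `tol`; return the sweep count."""
    for sweep in range(1, max_sweeps + 1):
        if smooth_sweep(grid) < tol:
            return sweep
    return max_sweeps
```

Calling `smooth(perturbed_grid(9, 0.4))` returns the number of sweeps this serial variant needs, which is one simple way to think about a convergence-based measure as opposed to raw runtime per sweep.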

International Workshop on OpenMP | 2015

On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms II: User-Guided Speculative Locks

Barna L. Bihari; Hansang Bae; James Cownie; Michael Klemm; Christian Terboven; Lori A. Diachin

In this paper we continue our investigations started in [8] into the effects of using different synchronization mechanisms in OpenMP-threaded iterative mesh optimization algorithms. We port our test code to the Intel® Xeon® processor (former codename “Haswell”) by employing a user-guided locking API for OpenMP [4] that provides a general and unified user interface and runtime framework. Since the Intel® Transactional Synchronization Extensions (TSX) provide two different options for speculation — Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM) — we compare a total of four different run modes: (i) HLE, (ii) RTM, (iii) OpenMP critical, and (iv) “unsynchronized”. As we did in [8], we find that either speculative execution option always outperforms the other two modes in terms of their convergence characteristics. Even with their higher overhead, the TSX options are very competitive when it comes to runtime performance measured with the “time-to-convergence” criterion introduced in [8].

International Workshop on OpenMP | 2016

Transactional Memory for Algebraic Multigrid Smoothers

Barna L. Bihari; Ulrike Meier Yang; Michael Wong; Bronis R. de Supinski

This paper extends our early investigations in which we compared transactional memory to traditional OpenMP synchronization mechanisms [7, 8]. We study similar issues for algebraic multigrid (AMG) smoothers in hypre [16], a mature and widely used production-quality linear solver library. We compare the transactional version of the Gauss-Seidel AMG smoother to an omp critical version and the default hybrid Gauss-Seidel smoother, as well as the \(l_1\) variations of both Gauss-Seidel and Jacobi smoothers. Importantly, we present results for real-life 2-D and 3-D problems discretized by the finite element method that demonstrate the TM option can outperform the existing methods, often by orders of magnitude, in terms of the recently introduced performance measure of run time per quality.
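For readers unfamiliar with the smoothers named in this abstract, the sketch below contrasts a plain Jacobi sweep with a plain Gauss-Seidel sweep on a small 1-D Poisson matrix. It is an illustration only, not hypre code: the helper names and the test matrix are assumptions, and the hybrid and \(l_1\) variants mentioned in the abstract are not reproduced. The point is that Gauss-Seidel reuses values updated earlier in the same sweep, which is exactly the data dependence that makes a threaded version need synchronization.

```python
def poisson_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, a standard model problem
    (chosen here for illustration only)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    return A

def jacobi_sweep(A, b, x):
    """Every unknown is updated from the previous iterate only."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_sweep(A, b, x):
    """Each unknown is updated in place, so later rows see earlier updates."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x

def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))

def run(sweep, A, b, sweeps):
    """Apply `sweep` repeatedly starting from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = sweep(A, b, x)
    return residual_norm(A, b, x)
```

Comparing `run(jacobi_sweep, A, b, 50)` with `run(gauss_seidel_sweep, A, b, 50)` for `A = poisson_1d(8)` and a right-hand side of ones shows the residual dropping faster under Gauss-Seidel on this model problem.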

SIAM Journal on Numerical Analysis | 2009

A Linear Algebraic Analysis of Diffusion Synthetic Acceleration for the Boltzmann Transport Equation II: The Simple Corner Balance Method

Barna L. Bihari; Peter N. Brown

In this paper we apply the development and linear algebraic analysis of the diffusion synthetic acceleration method presented in [S. F. Ashby, P. N. Brown, M. R. Dorr, and A. C. Hindmarsh, SIAM J. Numer. Anal., 32 (1995), pp. 128-178] to a different spatial discretization. Our model equation is the monoenergetic, steady-state, linear Boltzmann transport equation in slab geometry. The discretization consists of a discrete ordinates collocation in angle and the simple corner balance method in space. By expressing diffusion synthetic acceleration in this formalism, asymptotic results are obtained that prove the effectiveness of the associated preconditioner in various limiting cases, including the asymptotic diffusion limit. These results hold for problems with nonconstant coefficients and nonuniform spatial zoning posed on finite domains with an incident flux at the boundaries. Numerical results confirm the theoretical estimates.

Archive | 2006