Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Swaroop Pophale is active.

Publication


Featured research published by Swaroop Pophale.


Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model | 2010

Introducing OpenSHMEM: SHMEM for the PGAS community

Barbara M. Chapman; Tony Curtis; Swaroop Pophale; Stephen W. Poole; Jeffery A. Kuehn; Chuck Koelbel; Lauren Smith

The OpenSHMEM community would like to announce a new effort to standardize SHMEM, a communications library that uses one-sided communication and utilizes a partitioned global address space. OpenSHMEM is an effort to bring together a variety of SHMEM and SHMEM-like implementations into an open standard using a community-driven model. By creating an open-source specification and reference implementation of OpenSHMEM, there will be a wider availability of a PGAS library model on current and future architectures. In addition, the availability of an OpenSHMEM model will enable the development of performance and validation tools. We propose an OpenSHMEM specification to help tie together a number of divergent implementations of SHMEM that are currently available. To support an existing and growing user community, we will develop the OpenSHMEM web presence, including a community wiki and training material, and face-to-face interaction, including workshops and conference participation.


International Conference on Supercomputing | 2011

SRC: OpenSHMEM library development

Swaroop Pophale

OpenSHMEM is a PGAS programming library implementing an RMA-based point-to-point and collective communication paradigm which decouples data motion from synchronization. This results in a more scalable programming model than more common two-sided paradigms such as MPI. The OpenSHMEM project arose in an effort to standardize among several implementations of the decade-old SHMEM API, which exhibited subtle differences in the API and underlying semantics, inhibiting portability between implementations. In collaboration with Oak Ridge National Laboratory, the University of Houston is preparing an API specification and a portable, scalable, observable OpenSHMEM reference implementation.
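The decoupling of data motion from synchronization described above can be illustrated with a small Python model. This is not the real OpenSHMEM C API: `ToyShmem`, `put`, and `quiet` are names invented here to mirror the one-sided put / `shmem_quiet()` pattern, in which a put is merely initiated and is only guaranteed complete after a quiet.

```python
class PE:
    """A toy processing element with an array standing in for symmetric memory."""
    def __init__(self, pe_id, size):
        self.pe_id = pe_id
        self.mem = [0] * size

class ToyShmem:
    """Illustrative model only: puts are queued as data motion and
    completed at quiet(), decoupling transfer from synchronization."""
    def __init__(self, n_pes, size):
        self.pes = [PE(i, size) for i in range(n_pes)]
        self.pending = []          # puts issued but not yet complete

    def put(self, value, index, target_pe):
        # One-sided: the target PE does not participate; completion is deferred.
        self.pending.append((target_pe, index, value))

    def quiet(self):
        # Completes all outstanding puts, like shmem_quiet().
        for target, index, value in self.pending:
            self.pes[target].mem[index] = value
        self.pending.clear()

comm = ToyShmem(n_pes=4, size=8)
comm.put(42, 0, target_pe=3)
before = comm.pes[3].mem[0]   # still 0: data motion not yet synchronized
comm.quiet()
after = comm.pes[3].mem[0]    # 42 once quiet() has completed the put
```

The point of the sketch is only the ordering guarantee: between `put` and `quiet` the target's memory may or may not be updated, which is exactly what makes the model more scalable than matched two-sided sends and receives.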


OpenSHMEM 2014 Proceedings of the First Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools - Volume 8356 | 2014

Hybrid Programming Using OpenSHMEM and OpenACC

Matthew B. Baker; Swaroop Pophale; Jean-Charles Vasnier; Haoqiang Jin; Oscar R. Hernandez

With high-performance systems exploiting multicore and accelerator-based architectures on a distributed shared memory system, heterogeneous hybrid programming models are the natural choice to exploit all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on using OpenMP directives (for shared memory programming) with MPI (for inter-node programming on a cluster), using OpenMP to spawn threads on a node and communication libraries like MPI to communicate across nodes. As accelerators are added into the mix, and hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this paper we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library for one-sided communication between nodes. We use the NAS BT Multi-Zone benchmark, converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit accelerators present within a node. We evaluate the performance of the benchmark and discuss our experiences during the development of the OpenSHMEM+OpenACC hybrid program.


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

Extending the OpenSHMEM Memory Model to Support User-Defined Spaces

Aaron Welch; Swaroop Pophale; Pavel Shamis; Oscar R. Hernandez; Stephen W. Poole; Barbara M. Chapman

OpenSHMEM is an open standard for SHMEM libraries. With the standardisation process complete, the community is looking towards extending the API for increasing programmer flexibility and extreme scalability. According to the current OpenSHMEM specification (revision 1.1), allocation of symmetric memory is collective across all PEs executing the application. For better work sharing and memory utilisation, we are proposing the concepts of teams and spaces for OpenSHMEM that together allow allocation of memory only across user-specified teams. Through our implementation we show that by using teams we can confine memory allocation and usage to only the PEs that actually communicate via symmetric memory. We provide our preliminary results that demonstrate creating spaces for teams allows for less consumption of memory resources than the current alternative. We also examine the impact of our extensions on Scalable Synthetic Compact Applications #3 (SSCA3), which is a sensor processing and knowledge formation kernel involving file I/O, and show that up to 30% of symmetric memory allocation can be eliminated without affecting the correctness of the benchmark.
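The memory saving that team-scoped allocation targets can be sketched numerically. The functions and block sizes below are hypothetical, and the sketch models only the accounting: under the 1.1 specification every symmetric block exists on every PE, while a team-scoped space confines the block to its team's members.

```python
def global_allocation(n_pes, blocks):
    """Spec 1.1 behaviour: every symmetric block is allocated on every PE."""
    return n_pes * sum(blocks.values())

def team_allocation(n_pes, blocks, teams):
    """Proposed behaviour: a block exists only on the PEs of its team."""
    total = 0
    for name, size in blocks.items():
        team = teams.get(name, range(n_pes))   # no team given: all PEs
        total += len(team) * size
    return total

blocks = {"halo": 1024, "stats": 512}   # bytes per PE, illustrative sizes
teams = {"stats": [0, 1]}               # only PEs 0 and 1 actually use "stats"
n_pes = 8
saved = global_allocation(n_pes, blocks) - team_allocation(n_pes, blocks, teams)
```

With these made-up numbers, the "stats" block shrinks from 8 copies to 2, saving 3072 bytes; the paper's SSCA3 result (up to 30% of symmetric allocation eliminated) is the real-world analogue of this accounting.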


OpenSHMEM 2014 Proceedings of the First Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools - Volume 8356 | 2014

OpenSHMEM Extensions and a Vision for Its Future Direction

Stephen W. Poole; Pavel Shamis; Aaron Welch; Swaroop Pophale; Manjunath Gorentla Venkata; Oscar R. Hernandez; Gregory A. Koenig; Tony Curtis; Chung-Hsing Hsu

The Extreme Scale Systems Center (ESSC) at Oak Ridge National Laboratory (ORNL), together with the University of Houston, led the effort to standardize the SHMEM API with input from the vendors and user community. In 2012, OpenSHMEM specification 1.0 was finalized and released to the OpenSHMEM community for comments. As we move to future HPC systems, there are several shortcomings in the current specification that we need to address to ensure scalability, higher degrees of concurrency, locality, thread safety, fault-tolerance, parallel I/O capabilities, etc. In this paper we discuss an immediate set of extensions that we propose to the current API and our vision for a future API, OpenSHMEM Next-Generation (NG), that targets future Exascale systems. We also explain our rationale for the proposed extensions and highlight the lessons learned from other PGAS languages and communication libraries.


Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models | 2014

Fault Tolerance for OpenSHMEM

Pengfei Hao; Pavel Shamis; Manjunath Gorentla Venkata; Swaroop Pophale; Aaron Welch; Stephen W. Poole; Barbara M. Chapman

On today's supercomputing systems, faults are becoming the norm rather than the exception. Given the complexity required for achieving expected scalability and performance on future systems, this situation is expected to become worse. The systems are expected to function in a nearly constant presence of faults. To be productive on these systems, programming models will require both hardware and software to be resilient to faults. With the growing importance of the PGAS programming model and OpenSHMEM as part of the HPC software stack, the lack of a fault-tolerance model may become a liability for its users. Towards this end, in this paper, we discuss the viability of using checkpoint/restart as a fault-tolerance method for OpenSHMEM, propose a selective checkpoint/restart fault-tolerance model, and discuss challenges associated with implementing the proposed model.
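A minimal sketch of the selective checkpoint/restart idea, assuming the user marks which memory regions are critical. The `SelectiveCheckpoint` class and its methods are illustrative inventions, not part of any proposed OpenSHMEM API: only the selected regions are saved, so a restart restores them while unprotected state is left as-is.

```python
import copy

class SelectiveCheckpoint:
    """Illustrative selective checkpoint: save only user-marked regions."""
    def __init__(self):
        self.saved = {}

    def checkpoint(self, state, critical):
        # Deep-copy only the user-selected ("critical") regions of state.
        self.saved = {k: copy.deepcopy(state[k]) for k in critical}

    def restart(self, state):
        # Restore the saved regions after a fault; other keys are untouched.
        state.update(copy.deepcopy(self.saved))

state = {"grid": [1.0, 2.0, 3.0], "scratch": [0.0] * 3}
ckpt = SelectiveCheckpoint()
ckpt.checkpoint(state, critical=["grid"])

state["grid"] = [-1.0] * 3       # simulate corruption of the solution by a fault
state["scratch"] = [9.9] * 3     # scratch data is deliberately not protected
ckpt.restart(state)              # "grid" comes back; "scratch" stays as it was
```

The selectivity is the point: checkpointing less state costs less at every checkpoint interval, at the price of having to recompute or reinitialize whatever was not saved.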


International Conference on Supercomputing | 2013

Improving performance of the OpenSHMEM reference library by a portable PE mapping technique

Swaroop Pophale; Tony Curtis; Barbara M. Chapman

Reducing data communication cost is a critical performance consideration, and the need is more acute when using libraries like the OpenSHMEM reference library, which has to sacrifice some performance optimizations for portability. Being a Partitioned Global Address Space library, the OpenSHMEM reference library provides more control over data placement; yet some communication-intensive applications would benefit from the library's prior knowledge of their communication pattern. In this poster we discuss a low-cost, portable methodology to provide PE re-numbering that facilitates maximum on-node communication. We validate our method using the well-documented 2D heat transfer application.
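The PE re-numbering idea can be sketched as a greedy heuristic: given a communication pattern and the number of PEs per node, place the pairs that exchange the most traffic onto the same node. The pairing strategy and all names below are assumptions made for illustration, not the method used in the poster.

```python
def on_node_traffic(mapping, pattern, ppn):
    """Count messages whose endpoints land on the same node under `mapping`.

    `mapping` takes a logical PE to its physical rank; `ppn` is PEs per node."""
    node = {pe: rank // ppn for pe, rank in mapping.items()}
    return sum(w for (a, b), w in pattern.items() if node[a] == node[b])

def renumber(pattern, n_pes, ppn):
    """Greedy re-numbering: assign heavily-communicating pairs adjacent
    ranks, so they tend to fall on the same node."""
    mapping, next_rank = {}, 0
    for (a, b), _ in sorted(pattern.items(), key=lambda kv: -kv[1]):
        for pe in (a, b):
            if pe not in mapping:
                mapping[pe] = next_rank
                next_rank += 1
    for pe in range(n_pes):            # place any PE absent from the pattern
        if pe not in mapping:
            mapping[pe] = next_rank
            next_rank += 1
    return mapping

# Toy pattern: PEs 0<->2 and 1<->3 exchange most of the traffic.
pattern = {(0, 2): 10, (1, 3): 10, (0, 1): 1}
identity = {pe: pe for pe in range(4)}           # default numbering
remap = renumber(pattern, n_pes=4, ppn=2)        # heuristic numbering
```

With two PEs per node, the default numbering keeps only 1 unit of traffic on-node, while the re-numbered placement keeps 20, which is the kind of gain the 2D heat transfer validation is after.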


Archive | 2016

OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments

Manjunath Gorentla Venkata; Neena Imam; Swaroop Pophale; Tiffany M. Mintz

Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on the use of multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes. This situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories. In this paper, we introduce the AsyncSHMEM PGAS library which supports a tighter integration of shared and distributed memory parallelism than past OpenSHMEM implementations. AsyncSHMEM integrates the existing OpenSHMEM reference implementation with a thread-pool-based, intra-node, work-stealing runtime. It aims to prepare OpenSHMEM for future generations of HPC systems by enabling the use of asynchronous computation to hide data transfer latencies, supporting tight interoperability of OpenSHMEM with task parallel programming, improving load balance (both of communication and computation), and enhancing locality. 
In this paper we present the design of AsyncSHMEM, and demonstrate the performance of our initial AsyncSHMEM implementation by performing a scalability analysis of two benchmarks on the Titan supercomputer. These early results are promising, and demonstrate that AsyncSHMEM is more programmable than the OpenSHMEM+OpenMP model, while delivering comparable performance for a regular benchmark (ISx) and superior performance for an irregular benchmark (UTS).
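The latency-hiding idea behind AsyncSHMEM, overlapping communication with independent computation via asynchronous tasks, can be sketched in plain Python; `transfer` here merely simulates a network operation with a sleep, and none of this is the AsyncSHMEM API.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def transfer(buf):
    """Stand-in for a data transfer: the sleep models network latency."""
    time.sleep(0.05)
    return [x * 2 for x in buf]

def local_compute():
    """Independent work that can proceed while the transfer is in flight."""
    return sum(range(1000))

with ThreadPoolExecutor(max_workers=2) as pool:
    fut = pool.submit(transfer, [1, 2, 3])   # communication runs in background
    result = local_compute()                  # computation overlaps the latency
    data = fut.result()                       # explicit completion point
```

A runtime that owns both the task pool and the communication layer, as AsyncSHMEM does, can perform this overlap (and work-stealing load balance) automatically instead of leaving it to the programmer.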


OpenSHMEM 2014 Proceedings of the First Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools - Volume 8356 | 2014

Extending the OpenSHMEM Analyzer to Perform Synchronization and Multi-valued Analysis

Swaroop Pophale; Oscar R. Hernandez; Stephen W. Poole; Barbara M. Chapman

The OpenSHMEM Analyzer (OSA) is a compiler-based tool that provides static analysis for OpenSHMEM programs. It was developed with the intention of providing feedback to users about semantic errors due to incorrect use of the OpenSHMEM API in their programs, thus making development of OpenSHMEM applications an easier task for beginners as well as experienced programmers. In this paper we discuss the improvements to the OSA tool to perform parallel analysis to detect the collective synchronization structure of a program. Synchronization is a critical aspect of all programming models, and in OpenSHMEM it is the responsibility of the programmer to introduce synchronization calls to ensure the completion of communication among processing elements (PEs), to prevent the use of old/incorrect data, avoid deadlocks, and ensure data-race-free execution, keeping in mind the semantics of the OpenSHMEM library specification. Our analysis yields three tangible outputs: a detailed control flow graph (CFG) marking all the OpenSHMEM calls used, a system dependence graph, and a barrier tree. The barrier tree represents the synchronization structure of the program in a simplified manner that enables visualization of the program's synchronization, keeping in mind the concurrent nature of SPMD applications that use OpenSHMEM library calls. This provides a graphical representation of the synchronization calls in the order in which they appear at execution time and of how the different PEs in OpenSHMEM may encounter them based upon the different execution paths available in the program. Our results include the summarization of the analysis conducted within the middle end of a compiler and the improvements we have made to the existing analysis to make it aware of the parallelism in the OpenSHMEM program.
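The barrier-matching side of such an analysis can be illustrated with a toy path enumeration: if an if/else lets some PEs skip a barrier that others execute, different execution paths see different barrier sequences and a deadlock is possible. The program encoding below is a hypothetical simplification, not OSA's CFG representation.

```python
def barrier_sequences(program):
    """Return the barrier-call sequence seen along every execution path.

    `program` is a list whose items are either the string "barrier" or a
    tuple of branch alternatives, each alternative itself a sub-program."""
    if not program:
        return [[]]
    head, rest = program[0], program[1:]
    tails = barrier_sequences(rest)
    if head == "barrier":
        return [["barrier"] + t for t in tails]
    result = []
    for alt in head:                       # each branch alternative forks paths
        for prefix in barrier_sequences(alt):
            result.extend(prefix + t for t in tails)
    return result

def barriers_match(program):
    """True when every path encounters the same barrier sequence."""
    seqs = barrier_sequences(program)
    return all(s == seqs[0] for s in seqs)

# An if/else where only one arm synchronizes: PEs taking different
# branches reach different numbers of barriers, so they can deadlock.
bad = [(["barrier"], []), "barrier"]
good = [(["barrier"], ["barrier"]), "barrier"]
```

A compiler-based tool works on the CFG rather than by explicit enumeration, but the property being checked, identical synchronization structure on all paths, is the same.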


Workshop on OpenSHMEM and Related Technologies | 2016

Evaluating OpenSHMEM Explicit Remote Memory Access Operations and Merged Requests

Swen Boehm; Swaroop Pophale; Manjunath Gorentla Venkata

The OpenSHMEM Library Specification has evolved considerably since version 1.0. Recently, non-blocking implicit Remote Memory Access (RMA) operations were introduced in OpenSHMEM 1.3. These provide a way to achieve better overlap between communication and computation. However, the implicit non-blocking operations do not provide a separate handle to track and complete individual RMA operations. They are guaranteed to be complete only after a shmem_quiet(), shmem_barrier(), or shmem_barrier_all() call; these are global completion and synchronization operations. Though this semantic is expected to achieve a higher message rate for applications, the drawback is that it does not allow fine-grained control over the completion of RMA operations.
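The contrast between implicit completion (every outstanding operation finishes at quiet) and explicit handles can be sketched with a toy engine. The `put_nbi`/`put_nb` names echo OpenSHMEM naming conventions, but this is a plain-Python illustration, not the library API.

```python
class Engine:
    """Toy RMA engine contrasting implicit and handle-based completion."""
    def __init__(self):
        self.pending = {}          # handle -> (destination dict, key, value)
        self.next_handle = 0

    def put_nbi(self, dest, key, value):
        # Implicit non-blocking put: no handle is returned to the caller.
        self._enqueue(dest, key, value)

    def put_nb(self, dest, key, value):
        # Explicit non-blocking put: returns a handle for fine-grained waiting.
        return self._enqueue(dest, key, value)

    def _enqueue(self, dest, key, value):
        h = self.next_handle
        self.next_handle += 1
        self.pending[h] = (dest, key, value)
        return h

    def wait(self, handle):
        # Complete one specific operation (explicit-handle semantics).
        dest, key, value = self.pending.pop(handle)
        dest[key] = value

    def quiet(self):
        # Complete everything outstanding (implicit, global semantics).
        for dest, key, value in self.pending.values():
            dest[key] = value
        self.pending.clear()

remote = {}
eng = Engine()
eng.put_nbi(remote, "a", 1)          # implicit: only quiet() completes it
h = eng.put_nb(remote, "b", 2)       # explicit: completable on its own
eng.wait(h)                          # only "b" is guaranteed visible here
partial = dict(remote)
eng.quiet()                          # now "a" is complete as well
```

The trade-off the paper evaluates falls out of the sketch: implicit operations avoid per-operation bookkeeping and so favor message rate, while handles buy back fine-grained completion control.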

Collaboration


Dive into Swaroop Pophale's collaborations.

Top Co-Authors

Oscar R. Hernandez (Oak Ridge National Laboratory)
Stephen W. Poole (Oak Ridge National Laboratory)
Swen Boehm (Oak Ridge National Laboratory)
Matthew B. Baker (Oak Ridge National Laboratory)
David E. Bernholdt (Oak Ridge National Laboratory)
Jeffery A. Kuehn (Oak Ridge National Laboratory)
Pavel Shamis (Oak Ridge National Laboratory)