David A. Boyuka
North Carolina State University
Publications
Featured research published by David A. Boyuka.
high performance distributed computing | 2012
Eric R. Schendel; Saurabh V. Pendse; John Jenkins; David A. Boyuka; Zhenhuan Gong; Sriram Lakshminarasimhan; Qing Liu; Hemanth Kolla; J.H. Chen; Scott Klasky; Robert B. Ross; Nagiza F. Samatova
Current peta-scale data analytics frameworks suffer from a significant performance bottleneck due to an imbalance between their enormous computational power and limited I/O bandwidth. Using data compression schemes to reduce the amount of I/O activity is a promising approach to addressing this problem. In this paper, we propose a hybrid framework for interleaving I/O with data compression to achieve improved I/O throughput side-by-side with reduced dataset size. We evaluate several interleaving strategies, present theoretical models, and evaluate the efficiency and scalability of our approach through comparative analysis. With our theoretical model, considering 19 real-world scientific datasets both from the public domain and peta-scale simulations, we estimate that the hybrid method can result in a 12 to 46% increase in throughput on hard-to-compress scientific datasets. At the reported peak bandwidth of 60 GB/s of uncompressed data for a current, leadership-class parallel I/O system, this translates into an effective gain of 7 to 28 GB/s in aggregate throughput.
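The 7 to 28 GB/s figure follows directly from applying the 12% and 46% throughput increases to the 60 GB/s baseline. A minimal check of that arithmetic; the helper name `effective_gain_gbps` is hypothetical and only illustrates the calculation, not the paper's model:

```python
def effective_gain_gbps(baseline_gbps: float, increase_fraction: float) -> float:
    """Additional aggregate throughput (GB/s) implied by a relative increase over a baseline."""
    return baseline_gbps * increase_fraction


PEAK_GBPS = 60.0  # reported peak bandwidth of uncompressed data, from the abstract
for pct in (12, 46):
    gain = effective_gain_gbps(PEAK_GBPS, pct / 100)
    print(f"{pct}% increase -> +{gain:.1f} GB/s")
# 12% increase -> +7.2 GB/s
# 46% increase -> +27.6 GB/s
```

Rounded, these match the 7 to 28 GB/s range quoted above.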
ieee international conference on high performance computing data and analytics | 2012
John Jenkins; Eric R. Schendel; Sriram Lakshminarasimhan; David A. Boyuka; Terry Rogers; Stephane Ethier; Robert B. Ross; Scott Klasky; Nagiza F. Samatova
I/O bottlenecks in HPC applications are becoming a more pressing problem as compute capabilities continue to outpace I/O capabilities. While double-precision simulation data often must be stored losslessly, the loss of some of the fractional component may introduce acceptably small errors to many types of scientific analyses. Given this observation, we develop a precision level of detail (APLOD) library, which partitions double-precision datasets along user-defined byte boundaries. APLOD parameterizes the analysis accuracy-I/O performance tradeoff, bounds maximum relative error, maintains I/O access patterns compared to full precision, and operates with low overhead. Using ADIOS as an I/O use-case, we show a reduction in disk access time proportional to the degree of precision. Finally, we show the effects of partial precision analysis on accuracy for operations such as k-means and Fourier analysis, finding a strong applicability for the use of varying degrees of precision to reduce the cost of analyzing extreme-scale data.
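The idea of partitioning doubles along byte boundaries with a bounded relative error can be sketched with the standard `struct` module. This is an illustrative sketch, not APLOD's API or on-disk format: the function names are hypothetical, and it assumes big-endian IEEE 754 doubles, normal (non-subnormal) nonzero values, and truncation (zero-fill) of the discarded bytes.

```python
import struct


def split_byte_planes(values):
    """Split doubles into 8 byte planes, most significant byte first.

    Plane 0 holds the sign bit and high exponent bits of every value;
    plane 7 holds the least significant mantissa byte.
    """
    raw = [struct.pack(">d", v) for v in values]  # big-endian IEEE 754 encoding
    return [bytes(r[i] for r in raw) for i in range(8)]


def reconstruct(planes, k):
    """Rebuild approximate doubles from the first k planes, zero-filling the rest."""
    n = len(planes[0])
    out = []
    for j in range(n):
        buf = bytes(planes[i][j] for i in range(k)) + bytes(8 - k)
        out.append(struct.unpack(">d", buf)[0])
    return out


def relative_error_bound(k):
    """Upper bound on relative error when keeping k >= 2 bytes of a normal double.

    k bytes keep 1 sign bit, 11 exponent bits and 8*k - 12 mantissa bits;
    truncating the remaining mantissa bits loses less than 2**-(8*k - 12).
    """
    return 2.0 ** -(8 * k - 12)


data = [3.141592653589793, -2.718281828459045, 1.0e-3, 6.02214076e23]
planes = split_byte_planes(data)
for k in (2, 3, 4):
    approx = reconstruct(planes, k)
    max_rel = max(abs(a - v) / abs(v) for a, v in zip(approx, data))
    print(k, max_rel < relative_error_bound(k))
```

Reading only the leading planes reduces the bytes fetched per value while the error stays within the stated bound; reading all 8 planes restores the data exactly.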
ieee/acm international symposium cluster, cloud and grid computing | 2013
Zhenhuan Gong; David A. Boyuka; Xiaocheng Zou; Qing Liu; Norbert Podhorszki; Scott Klasky; Xiaosong Ma; Nagiza F. Samatova
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also has low run-time resource requirements and limits the performance impact on running applications to a reasonable level.
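Why a single layout cannot serve heterogeneous access patterns can be seen by counting how many contiguous reads a query needs under different linearizations of a 2-D array. This is a hypothetical illustration of the problem PARLO addresses, not PARLO's optimizer; the function names and the "one run = one read request" cost model are assumptions.

```python
def linear_offset(i, j, rows, cols, order):
    """Map cell (i, j) of a rows x cols array to a file offset (in elements)."""
    if order == "row":  # row-major: consecutive j values are adjacent on disk
        return i * cols + j
    if order == "col":  # column-major: consecutive i values are adjacent on disk
        return j * rows + i
    raise ValueError(f"unknown order: {order}")


def contiguous_runs(cells, rows, cols, order):
    """Count maximal runs of consecutive offsets needed to read `cells`.

    Each run roughly corresponds to one contiguous read request.
    """
    offsets = sorted(linear_offset(i, j, rows, cols, order) for i, j in cells)
    runs = 0
    prev = None
    for off in offsets:
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


ROWS, COLS = 100, 100
row_query = [(5, j) for j in range(COLS)]  # a slice along one axis
col_query = [(i, 7) for i in range(ROWS)]  # a slice along the other axis
for order in ("row", "col"):
    print(order,
          contiguous_runs(row_query, ROWS, COLS, order),
          contiguous_runs(col_query, ROWS, COLS, order))
# row 1 100
# col 100 1
```

Each layout makes one query a single contiguous read and the other 100 scattered reads, which is why optimizing for several weighted access patterns at once requires a more elaborate, multi-level layout.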
european conference on parallel processing | 2014
Houjun Tang; Xiaocheng Zou; John Jenkins; David A. Boyuka; Stephen Ranshous; Dries Kimpe; Scott Klasky; Nagiza F. Samatova
Among the major challenges of transitioning to exascale in HPC is the ubiquitous I/O bottleneck. For analysis and visualization applications in particular, this bottleneck is exacerbated by the write-once, read-many property of most scientific datasets combined with typically complex access patterns. One promising way to alleviate this problem is to recognize the application's access patterns and utilize them to prefetch data, thereby overlapping computation and I/O. However, current methods for analyzing access patterns are offline-only and/or lack support for complex access patterns, such as high-dimensional strided or composition-based unstructured access patterns. Therefore, we propose an online analyzer capable of detecting both simple and complex access patterns with low computational and memory overhead and high accuracy. By combining our pattern detection with prefetching, we consistently observe run-time reductions, up to 26%, across 18 configurations of PIOBench and 4 configurations of a micro-benchmark with both structured and unstructured access patterns.
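The basic loop of online pattern detection feeding a prefetcher can be sketched for the simplest case, a constant 1-D stride. The `StrideDetector` class and its `confirm` threshold are hypothetical; this sketch does not handle the high-dimensional strided or composition-based unstructured patterns the paper targets.

```python
class StrideDetector:
    """Detect a constant stride in a stream of read offsets and predict the next reads.

    A pattern counts as established after `confirm` consecutive identical strides.
    """

    def __init__(self, confirm=2):
        self.confirm = confirm
        self.last = None   # most recent offset observed
        self.stride = None  # current candidate stride
        self.hits = 0       # how many consecutive times the candidate stride was seen

    def observe(self, offset):
        """Record one read offset and update the stride hypothesis."""
        if self.last is not None:
            s = offset - self.last
            if s == self.stride:
                self.hits += 1
            else:
                self.stride = s  # pattern changed: start a new hypothesis
                self.hits = 1
        self.last = offset

    def predict(self, depth=1):
        """Return the next `depth` offsets to prefetch, or [] if no stable stride yet."""
        if self.stride is None or self.hits < self.confirm:
            return []
        return [self.last + self.stride * n for n in range(1, depth + 1)]


detector = StrideDetector()
for off in (0, 4096, 8192, 12288):
    detector.observe(off)
print(detector.predict(2))  # [16384, 20480]
```

Issuing the predicted reads asynchronously while the application computes is what lets prefetching overlap computation and I/O.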
ieee/acm international symposium cluster, cloud and grid computing | 2015
Xiaocheng Zou; Kesheng Wu; David A. Boyuka; Daniel F. Martin; Surendra Byna; Houjun Tang; Kushal Bansal; Terry J. Ligocki; Hans Johansen; Nagiza F. Samatova
Adaptive Mesh Refinement (AMR) represents a significant advance for scientific simulation codes, greatly reducing memory and compute requirements by dynamically varying simulation resolution over space and time. As simulation codes transition to AMR, existing analysis algorithms must also make this transition. One such algorithm, connected component detection, is of vital importance in many simulation and analysis contexts, with some simulation codes even relying on parallel, in situ connected component detection for correctness. Yet, current detection algorithms designed for uniform meshes are not applicable to hierarchical, non-uniform AMR, and to the best of our knowledge, AMR connected component detection has not been explored in the literature. Therefore, in this paper, we formally define the general problem of connected component detection for AMR, and present a general solution. Beyond solving the general detection problem, achieving viable in situ detection performance is even more challenging. The core issue is the conflict between the communication-intensive nature of connected component detection (in general, and especially for AMR data) and the requirement that in situ processes incur minimal performance impact on the co-located simulation. We address this challenge by presenting the first connected component detection methodology for structured AMR that is applicable in a parallel, in situ context. Our key strategy is the incorporation of a multi-phase AMR-aware communication pattern that synchronizes connectivity information across the AMR hierarchy. In addition, we distil our methodology to a generic framework within the Chombo AMR infrastructure, making connected component detection services available for many existing applications. We demonstrate our method's efficacy by showing its ability to detect ice calving events in real time within the real-world BISICLES ice sheet modelling code. Results show up to a 6.8x speedup of our algorithm over the existing specialized BISICLES algorithm. We also show scalability results for our method up to 4,096 cores using a parallel Chombo-based benchmark.
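For reference, the uniform-mesh problem that the AMR work generalizes can be sketched as serial union-find labeling on a 2-D grid. This is an assumed baseline for illustration only: it is not the paper's parallel, multi-phase AMR algorithm, and `label_components` is a hypothetical name.

```python
def label_components(grid):
    """Label 4-connected components of nonzero cells in a uniform 2-D grid via union-find."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))  # one union-find node per cell

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Merge each active cell with its active neighbours below and to the right.
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            if i + 1 < rows and grid[i + 1][j]:
                union(i * cols + j, (i + 1) * cols + j)
            if j + 1 < cols and grid[i][j + 1]:
                union(i * cols + j, i * cols + j + 1)

    # Assign compact labels 1..n in row-major order of first appearance.
    labels = [[0] * cols for _ in range(rows)]
    ids = {}
    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                root = find(i * cols + j)
                labels[i][j] = ids.setdefault(root, len(ids) + 1)
    return labels, len(ids)


grid = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels, count = label_components(grid)
print(count)  # 3
```

In a parallel or AMR setting, components that cross process or refinement-level boundaries must additionally have their labels merged through communication, which is the part the abstract identifies as the core challenge.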
european conference on parallel processing | 2014
Xiaocheng Zou; Sriram Lakshminarasimhan; David A. Boyuka; Stephen Ranshous; Houjun Tang; Scott Klasky; Nagiza F. Samatova
Set intersection is a fundamental operation for evaluating conjunctive queries in the context of scientific data analysis. The state-of-the-art approach to performing set intersection, compressed bitmap indexing, achieves high computational efficiency because of cheap bitwise operations; however, overall efficiency is often nullified by the HPC I/O bottleneck, because compressed bitmap indexes typically exhibit a heavy storage footprint. Conversely, the recently presented PForDelta-compressed index has been demonstrated to be storage-lightweight, but has limited performance for set intersection. Thus, a more effective set intersection approach should be efficient in both computation and I/O.
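The computation/storage tradeoff described above can be illustrated by intersecting the same two sets as bitmaps and as delta-encoded sorted ID lists. This is a simplified sketch under stated assumptions: the bitmaps are uncompressed Python integers rather than a compressed bitmap index, the lists use plain delta (gap) coding without PForDelta's bit packing and exception handling, and all function names are hypothetical.

```python
def to_bitmap(ids):
    """Encode nonnegative integer IDs as bits of a Python int (bit x set <=> x in set)."""
    bm = 0
    for x in ids:
        bm |= 1 << x
    return bm


def bitmap_to_ids(bm):
    """Decode a bitmap back to a sorted list of IDs."""
    out = []
    while bm:
        low = bm & -bm                 # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        bm ^= low                      # clear it
    return out


def delta_encode(sorted_ids):
    """Store gaps between consecutive IDs (small numbers compress well)."""
    prev, gaps = 0, []
    for x in sorted_ids:
        gaps.append(x - prev)
        prev = x
    return gaps


def delta_decode(gaps):
    """Rebuild sorted IDs from gaps with a running sum."""
    out, acc = [], 0
    for g in gaps:
        acc += g
        out.append(acc)
    return out


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted ID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


a = [3, 17, 42, 99, 1000]
b = [17, 50, 99, 1000, 2048]
print(bitmap_to_ids(to_bitmap(a) & to_bitmap(b)))                              # [17, 99, 1000]
print(intersect_sorted(delta_decode(delta_encode(a)), delta_decode(delta_encode(b))))  # [17, 99, 1000]
```

The bitmap intersection is a single word-parallel AND, but each bitmap's size grows with the largest ID (over 2,000 bits here for ten IDs), whereas the gap lists store only five small numbers each at the cost of decoding and a merge loop.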
high performance distributed computing | 2013
Sriram Lakshminarasimhan; David A. Boyuka; Saurabh V. Pendse; Xiaocheng Zou; John Jenkins; Venkatram Vishwanath; Michael E. Papka; Nagiza F. Samatova
Cluster Computing | 2014
Sriram Lakshminarasimhan; Xiaocheng Zou; David A. Boyuka; Saurabh V. Pendse; John Jenkins; Venkatram Vishwanath; Michael E. Papka; Scott Klasky; Nagiza F. Samatova
Trans. Large-Scale Data- and Knowledge-Centered Systems | 2013
John Jenkins; Sriram Lakshminarasimhan; David A. Boyuka; Eric R. Schendel; Neil Shah; Stephane Ethier; Choong-Seock Chang; J.H. Chen; Hemanth Kolla; Scott Klasky; Robert B. Ross; Nagiza F. Samatova
statistical and scientific database management | 2015
David A. Boyuka; Houjun Tang; Kushal Bansal; Xiaocheng Zou; Scott Klasky; Nagiza F. Samatova