Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Hoang Bui is active.

Publication


Featured research published by Hoang Bui.


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

Using cross-layer adaptations for dynamic data management in large scale coupled scientific workflows

Tong Jin; Fan Zhang; Qian Sun; Hoang Bui; Manish Parashar; Hongfeng Yu; Scott Klasky; Norbert Podhorszki; Hasan Abbasi

As system scales and application complexity grow, managing and processing simulation data has become a significant challenge. While recent approaches based on data staging and in-situ/in-transit data processing are promising, dynamic data volumes and distributions, such as those occurring in AMR-based simulations, make the efficient use of these techniques challenging. In this paper we propose cross-layer adaptations that address these challenges and respond at runtime to dynamic data management requirements. Specifically, we explore (1) adaptations of the spatial resolution at which the data is processed, (2) dynamic placement and scheduling of data processing kernels, and (3) dynamic allocation of in-transit resources. We also exploit coordinated approaches that dynamically combine these adaptations at the different layers. We evaluate the performance of our adaptive cross-layer management approach on the Intrepid IBM BlueGene/P and Titan Cray XK7 systems using Chombo-based AMR applications, and demonstrate its effectiveness in improving overall time-to-solution and increasing resource efficiency.
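
To make the coordinated-adaptation idea concrete, here is a minimal Python sketch of a runtime policy that combines the three layers (spatial resolution, kernel placement, in-transit resources). The thresholds, names, and decision rules are illustrative assumptions, not the policy from the paper.

```python
# Hypothetical sketch of a coordinated cross-layer adaptation policy.
# Thresholds, names, and decision rules are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class StepStats:
    data_volume_mb: float      # data produced by the AMR simulation this step
    staging_free_mb: float     # free memory on the in-transit staging nodes
    sim_node_idle_frac: float  # fraction of idle cycles on simulation nodes


def choose_adaptations(stats: StepStats):
    """Combine the three adaptation layers described in the abstract:
    (1) spatial resolution, (2) kernel placement, (3) in-transit resources."""
    actions = {}

    # (1) Coarsen the resolution at which data is processed when the
    #     produced volume would overflow the staging area.
    actions["resolution"] = "full" if stats.data_volume_mb <= stats.staging_free_mb else "coarsened"

    # (2) Place the processing kernel in-situ when simulation nodes have
    #     spare cycles, otherwise ship data to the in-transit nodes.
    actions["kernel_placement"] = "in-situ" if stats.sim_node_idle_frac > 0.2 else "in-transit"

    # (3) Size the in-transit allocation proportionally to the data volume
    #     (assume each staging node buffers ~4 GB, purely illustrative).
    actions["in_transit_nodes"] = max(1, int(stats.data_volume_mb / 4096) + 1)

    return actions


if __name__ == "__main__":
    print(choose_adaptations(StepStats(data_volume_mb=6000,
                                       staging_free_mb=4000,
                                       sim_node_idle_frac=0.1)))
```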


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

In-situ Feature-Based Objects Tracking for Large-Scale Scientific Simulations

Fan Zhang; Solomon Lasluisa; Tong Jin; Ivan Rodero; Hoang Bui; Manish Parashar

Emerging scientific simulations on leadership class systems are generating huge amounts of data. However, the increasing gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost prohibitive and often infeasible. In this paper, we investigate an alternate approach that aims to bring the analytics closer to the data using data staging and the in-situ execution of data analysis operations. Specifically, we present the design, implementation and evaluation of a framework that can support in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is the scalable decentralized and online clustering (DOC) and cluster tracking algorithm, which executes in-situ (on different cores) and in parallel with the simulation processes, and retrieves data from the simulations directly via on-chip shared memory. The results from our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be effectively used for in-situ analytics for large scale simulations.
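
As a rough illustration of the online-clustering idea behind DOC, the following single-process Python sketch assigns incoming points to nearby clusters with an incremental centroid update. It is a simplified, centralized stand-in, not the actual decentralized algorithm described in the paper.

```python
# A minimal, single-process stand-in for the online clustering step that
# DOC performs in-situ. The distance threshold and incremental centroid
# update are illustrative; the real algorithm runs decentralized across cores.

import math


class OnlineClusterer:
    def __init__(self, radius: float):
        self.radius = radius
        self.centroids = []   # list of (centroid, point count)

    def add_point(self, point):
        """Assign an incoming data point to the nearest cluster, or open a
        new cluster if none is within `radius`."""
        best, best_dist = None, math.inf
        for i, (c, _) in enumerate(self.centroids):
            d = math.dist(point, c)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= self.radius:
            c, n = self.centroids[best]
            # Incremental mean update keeps memory use constant per cluster.
            new_c = tuple((ci * n + pi) / (n + 1) for ci, pi in zip(c, point))
            self.centroids[best] = (new_c, n + 1)
            return best
        self.centroids.append((tuple(point), 1))
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = OnlineClusterer(radius=1.0)
    for p in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]:
        print(p, "->", clusterer.add_point(p))
```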


International Parallel and Distributed Processing Symposium | 2015

Exploring Data Staging Across Deep Memory Hierarchies for Coupled Data Intensive Simulation Workflows

Tong Jin; Fan Zhang; Qian Sun; Hoang Bui; Melissa Romanus; Norbert Podhorszki; Scott Klasky; Hemanth Kolla; Jacqueline H. Chen; Robert Hager; Choong-Seock Chang; Manish Parashar

As applications target extreme scales, data staging and in-situ/in-transit data processing have been proposed to address the data challenges and improve scientific discovery. However, further research is necessary in order to understand how growing data sizes from data intensive simulations coupled with the limited DRAM capacity in High End Computing systems will impact the effectiveness of this approach. In this paper, we explore how we can use deep memory levels for data staging, and develop a multi-tiered data staging method that spans both DRAM and solid state disks (SSD). This approach allows us to support both code coupling and data management for data intensive simulation workflows. We also show how an adaptive application-aware data placement mechanism can dynamically manage and optimize data placement across the DRAM and SSD storage levels in this multi-tiered data staging method. We present an experimental evaluation of our approach using two resources, an Infiniband cluster (Sith) and a Cray XK7 system (Titan), and using combustion (S3D) and fusion (XGC1) simulations.
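
A hedged sketch of the multi-tiered staging idea follows, assuming a simple capacity-based spill rule from a DRAM tier to an SSD tier; the class name and policy are invented for illustration and are not the paper's implementation.

```python
# Illustrative sketch of a two-tier (DRAM + SSD) staging put/get path.
# Capacities, eviction policy, and names are assumptions made for illustration.

import os
import pickle
import tempfile


class TieredStagingArea:
    def __init__(self, dram_capacity_bytes: int, ssd_dir: str):
        self.dram_capacity = dram_capacity_bytes
        self.dram_used = 0
        self.dram = {}          # in-memory (DRAM) tier
        self.ssd_dir = ssd_dir  # on-SSD tier, one file per object

    def put(self, key: str, obj) -> str:
        blob = pickle.dumps(obj)
        # Simplified application-aware placement: keep the object in DRAM
        # while it fits, spill to the SSD tier otherwise.
        if self.dram_used + len(blob) <= self.dram_capacity:
            self.dram[key] = blob
            self.dram_used += len(blob)
            return "DRAM"
        with open(os.path.join(self.ssd_dir, key), "wb") as f:
            f.write(blob)
        return "SSD"

    def get(self, key: str):
        if key in self.dram:
            return pickle.loads(self.dram[key])
        with open(os.path.join(self.ssd_dir, key), "rb") as f:
            return pickle.loads(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        staging = TieredStagingArea(dram_capacity_bytes=64, ssd_dir=d)
        print(staging.put("small", [1, 2, 3]))          # fits in DRAM
        print(staging.put("large", list(range(1000))))  # spills to SSD
        print(staging.get("large")[:5])
```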


IEEE International Conference on High Performance Computing, Data and Analytics | 2015

Adaptive data placement for staging-based coupled scientific workflows

Qian Sun; Tong Jin; Melissa Romanus; Hoang Bui; Fan Zhang; Hongfeng Yu; Hemanth Kolla; Scott Klasky; Jacqueline H. Chen; Manish Parashar

Data staging and in-situ/in-transit data processing are emerging as attractive approaches for supporting extreme scale scientific workflows. These approaches improve end-to-end performance by enabling runtime data sharing between coupled simulations and data analytics components of the workflow. However, the complex and dynamic data exchange patterns exhibited by the workflows, coupled with the varied data access behaviors, make efficient data placement within the staging area challenging. In this paper, we present an adaptive data placement approach to address these challenges. Our approach adapts data placement based on application-specific dynamic data access patterns, and applies access pattern-driven and location-aware mechanisms to reduce data access costs and to support efficient data sharing between the multiple workflow components. We experimentally demonstrate the effectiveness of our approach on the Titan Cray XK7 system using a real combustion analysis workflow. The evaluation results demonstrate that our approach can effectively improve data access performance and overall efficiency of coupled scientific workflows.
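
One way to picture access-pattern-driven, location-aware placement is the greedy rule sketched below, which co-locates each shared variable with its heaviest reader. The rule and all names are assumptions made for illustration, not the paper's mechanism.

```python
# Hedged sketch of access-pattern-driven placement: each shared variable is
# placed on the staging partition serving its heaviest reader, so most reads
# become "local". The greedy rule and names are illustrative assumptions.

from collections import defaultdict


def place_variables(access_log):
    """access_log: list of (variable, consumer_component) read events.
    Returns a mapping variable -> consumer whose staging partition should
    hold that variable."""
    counts = defaultdict(lambda: defaultdict(int))
    for var, consumer in access_log:
        counts[var][consumer] += 1
    # Greedy, location-aware rule: co-locate each variable with the
    # component that reads it most often.
    return {var: max(readers, key=readers.get) for var, readers in counts.items()}


if __name__ == "__main__":
    log = [("temperature", "analysis"), ("temperature", "analysis"),
           ("temperature", "viz"), ("species", "viz")]
    print(place_variables(log))  # {'temperature': 'analysis', 'species': 'viz'}
```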


Concurrency and Computation: Practice and Experience | 2015

ActiveSpaces: Exploring dynamic code deployment for extreme scale data processing

Ciprian Docan; Fan Zhang; Tong Jin; Hoang Bui; Qian Sun; Julian Cummings; Norbert Podhorszki; Scott Klasky; Manish Parashar

Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data have to be extracted off the computing nodes and transported to consumer nodes so that they can be processed, analyzed, visualized, archived, and so on. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive input/output operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still have to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area instead of moving the data to the data-processing code. Specifically, we describe the ActiveSpaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area and (2) runtime mechanisms for transporting codes associated with these routines to the staging area, executing the routines on the nodes that are part of the staging area, and returning the results. We also present an experimental performance evaluation of ActiveSpaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize sweet spots for each option.
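
The "move the code to the data" idea can be sketched in-process as follows; the class and function names are invented for this toy example and do not reflect the actual ActiveSpaces API.

```python
# A toy, in-process illustration of shipping a data-processing routine to the
# staging area instead of pulling raw data back to the consumer. All names
# here are hypothetical, not the ActiveSpaces interface.

class StagingNode:
    def __init__(self, chunks):
        self.chunks = chunks  # data objects already staged on this node

    def run_routine(self, routine):
        # Execute the shipped routine next to the data and return only the
        # (much smaller) results to the consumer.
        return [routine(chunk) for chunk in self.chunks]


def deploy(routine, staging_nodes):
    """Ship `routine` to every staging node and gather the reduced results,
    instead of moving the raw chunks to the consumer."""
    results = []
    for node in staging_nodes:
        results.extend(node.run_routine(routine))
    return results


if __name__ == "__main__":
    nodes = [StagingNode([[1.0, 2.0, 3.0]]), StagingNode([[10.0, 20.0]])]
    # The consumer only receives per-chunk means, not the raw arrays.
    print(deploy(lambda chunk: sum(chunk) / len(chunk), nodes))
```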


International Conference on Cluster Computing | 2014

POSTER: Leveraging deep memory hierarchies for data staging in coupled data-intensive simulation workflows

Tong Jin; Fan Zhang; Qian Sun; Hoang Bui; Norbert Podhorszki; Scott Klasky; Hemanth Kolla; Jacqueline H. Chen; R. Hager; Choong-Seock Chang; Manish Parashar

Next generation in-situ/in-transit data processing has been proposed for addressing data challenges at extreme scales. However, further research is necessary in order to understand how growing data sizes from data intensive simulations, coupled with limited DRAM capacity in High End Computing clusters, will impact the effectiveness of this approach. In this work, we propose using deep memory levels for data staging, utilizing a multi-tiered data staging method with both DRAM and solid state disks (SSD). This approach allows us to support both code coupling and data management for data intensive simulations in a cluster environment. We also show how an application-aware data placement mechanism can dynamically manage and optimize data placement across the DRAM and SSD storage levels in this staging method. We present experimental results on Sith, an Infiniband cluster at Oak Ridge, and evaluate its performance using combustion (S3D) and fusion (XGC) simulations.


Cluster Computing | 2015

In-situ feature-based objects tracking for data-intensive scientific and enterprise analytics workflows

Solomon Lasluisa; Fan Zhang; Tong Jin; Ivan Rodero; Hoang Bui; Manish Parashar

Emerging scientific simulations on leadership class systems are generating huge amounts of data, and processing this data in an efficient and timely manner is critical for generating insights from the simulations. However, the increasing gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost prohibitive and often infeasible. In this paper, we investigate an alternate approach that aims to bring the analytics closer to the data using in-situ execution of data analysis operations. Specifically, we present the design, implementation and evaluation of a framework that can support in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is a scalable decentralized online clustering and cluster tracking algorithm, which executes in-situ (on different cores) and in parallel with the simulation processes, and retrieves data from the simulations directly via on-chip shared memory. The results from our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be effectively used for in-situ analytics in large scale simulations.


International Workshop on Data Intensive Distributed Computing | 2016

Persistent Data Staging Services for Data Intensive In-situ Scientific Workflows

Melissa Romanus; Fan Zhang; Tong Jin; Qian Sun; Hoang Bui; Manish Parashar; Jong Youl Choi; Saloman Janhunen; R. Hager; Scott Klasky; Choong-Seock Chang; Ivan Rodero

Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scales and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result, researchers are increasingly exploring data-staging and in-situ workflows to reduce data movement and data-related overheads. However, as these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability.
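
A minimal sketch of the "staging as a service" idea, assuming a store that outlives any single client session so workflow components can join and leave at different times; the interface shown is illustrative, not the DataSpaces as a Service API.

```python
# Illustrative sketch of persistent staging as a service: the store outlives
# individual workflow components. Names are assumptions for this sketch.

class PersistentStagingService:
    def __init__(self):
        self._store = {}  # data persists across client sessions

    def connect(self, client_name: str):
        return _Session(client_name, self._store)


class _Session:
    def __init__(self, name, store):
        self.name, self._store = name, store

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store[key]


if __name__ == "__main__":
    service = PersistentStagingService()

    # A simulation component writes a timestep, then disconnects.
    sim = service.connect("fusion_simulation")
    sim.put("field@t=10", [0.1, 0.2, 0.3])
    del sim

    # A later-joining analysis component still finds the data in the service.
    analysis = service.connect("analysis")
    print(analysis.get("field@t=10"))
```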


Concurrency and Computation: Practice and Experience | 2014

XpressSpace: a programming framework for coupling partitioned global address space simulation codes

Fan Zhang; Ciprian Docan; Hoang Bui; Manish Parashar; Scott Klasky

Complex coupled multiphysics simulations are playing increasingly important roles in scientific and engineering applications such as fusion, combustion, and climate modeling. At the same time, extreme scales, increased levels of concurrency, and the advent of multicores are making programming of high-end parallel computing systems on which these simulations run challenging. Although partitioned global address space (PGAS) languages attempt to address the problem by providing a shared memory abstraction for parallel processes within a single program, the PGAS model does not easily support data coupling across multiple heterogeneous programs, which is necessary for coupled multiphysics simulations. This paper explores how multiphysics-coupled simulations can be supported by the PGAS programming model. Specifically, in this paper, we present the design and implementation of the XpressSpace programming system, which extends existing PGAS data sharing and data access models with a semantically specialized shared data space abstraction to enable data coupling across multiple independent PGAS executables. XpressSpace supports a global-view style programming interface that is consistent with the PGAS memory model, and provides an efficient runtime system that can dynamically capture the data decomposition of global-view data-structures such as arrays, and enable fast exchange of these distributed data-structures between coupled applications. In this paper, we also evaluate the performance and scalability of a prototype implementation of XpressSpace by using different coupling patterns extracted from real world multiphysics simulation scenarios, on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.
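
The overlap computation at the heart of global-view coupling can be sketched as follows, assuming simple 1-D block decompositions in both programs; this illustrates the general idea of redistributing a distributed array between two decompositions, not the XpressSpace interface itself.

```python
# Minimal sketch of re-blocking a global-view array between two PGAS
# programs with different decompositions. The 1-D block layout and function
# names are simplifying assumptions for illustration.

def block_decomposition(n_elems, n_ranks):
    """Return [(start, end), ...] index ranges, one per rank (1-D blocks)."""
    size = (n_elems + n_ranks - 1) // n_ranks
    return [(r * size, min((r + 1) * size, n_elems)) for r in range(n_ranks)]


def exchange_plan(src_decomp, dst_decomp):
    """For each consumer rank, list the (producer_rank, start, end) ranges it
    must fetch. This is the overlap computation a coupling runtime performs
    once it has captured both decompositions."""
    plan = []
    for (ds, de) in dst_decomp:
        pieces = []
        for rank, (ss, se) in enumerate(src_decomp):
            lo, hi = max(ds, ss), min(de, se)
            if lo < hi:
                pieces.append((rank, lo, hi))
        plan.append(pieces)
    return plan


if __name__ == "__main__":
    producer = block_decomposition(12, 4)  # 4-rank producer program
    consumer = block_decomposition(12, 3)  # 3-rank consumer program
    for rank, pieces in enumerate(exchange_plan(producer, consumer)):
        print(f"consumer rank {rank} fetches {pieces}")
```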


Concurrency and Computation: Practice and Experience | 2017

In-memory staging and data-centric task placement for coupled scientific simulation workflows

Fan Zhang; Tong Jin; Qian Sun; Melissa Romanus; Hoang Bui; Scott Klasky; Manish Parashar

Coupled scientific simulation workflows are composed of heterogeneous component applications that simulate different aspects of the physical phenomena being modeled and that interact and exchange significant volumes of data at runtime. As the data volumes and generation rates keep growing, the traditional disk I/O-based data movement approach becomes cost prohibitive, and workflows require a more scalable and efficient approach to support the data movement. Moreover, the cost of moving large volumes of data over the system interconnection network becomes dominant and significantly impacts the workflow execution time. Minimizing the amount of network data movement and localizing data transfers are critical for reducing this cost. To achieve this, workflow task placement should exploit data locality to the extent possible and move computation closer to the data. In this paper, we investigate applying in-memory data staging and data-centric task placement to reduce the data movement cost in large-scale coupled simulation workflows. Specifically, we present a distributed data sharing and task execution framework that (1) co-locates in-memory data staging on application compute nodes to store data that needs to be shared or exchanged and (2) uses data-centric task placement to map computations onto processor cores so that a large portion of the data exchanges can be performed using the intra-node shared memory. We also present the implementation of the framework and its experimental evaluation on the Titan Cray XK7 petascale supercomputer.
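
A hedged sketch of data-centric task placement, assuming a greedy rule that maps each task to the node that already stages most of its input so the bulk of its reads stay in intra-node shared memory; the rule and names are illustrative assumptions, not the framework's algorithm.

```python
# Hedged sketch of data-centric task placement. The greedy "most local data"
# rule and all names are assumptions made for illustration.

from collections import Counter


def place_tasks(task_inputs, data_location):
    """task_inputs: {task: [data block ids it reads]}
    data_location: {block id: node holding that block in in-memory staging}
    Returns {task: node}, choosing the node that maximizes local reads."""
    placement = {}
    for task, blocks in task_inputs.items():
        votes = Counter(data_location[b] for b in blocks)
        placement[task] = votes.most_common(1)[0][0]
    return placement


if __name__ == "__main__":
    locations = {"b0": "node1", "b1": "node1", "b2": "node2"}
    tasks = {"stats": ["b0", "b1"], "viz": ["b1", "b2", "b2"]}
    print(place_tasks(tasks, locations))  # {'stats': 'node1', 'viz': 'node2'}
```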

Collaboration


Dive into Hoang Bui's collaboration.

Top Co-Authors

Scott Klasky

Oak Ridge National Laboratory


Norbert Podhorszki

Oak Ridge National Laboratory


Choong-Seock Chang

Princeton Plasma Physics Laboratory


Hemanth Kolla

Sandia National Laboratories
