Hong Ong
Oak Ridge National Laboratory
Publication
Featured research published by Hong Ong.
parallel, distributed and network-based processing | 2008
Geoffroy Vallée; Thomas Naughton; Christian Engelmann; Hong Ong; Stephen L. Scott
System-level virtualization has been a research topic since the 1970s but has regained popularity during the past few years because of the availability of efficient solutions such as Xen and the implementation of hardware support in commodity processors (e.g., Intel-VT, AMD-V). However, the majority of system-level virtualization projects are guided by the server consolidation market. As a result, current virtualization solutions appear not to be suitable for high performance computing (HPC), which is typically based on large-scale systems. On the other hand, there is significant interest in exploiting virtual machines (VMs) within HPC for a number of other reasons. By virtualizing the machine, one is able to run a variety of operating systems and environments as needed by the applications. Virtualization allows users to isolate workloads, improving security and reliability. It is also possible to support non-native environments and/or legacy operating environments through virtualization. In addition, it is possible to balance workloads, use migration techniques to relocate applications from failing machines, and isolate faulty systems for repair. This document presents the challenges for the implementation of a system-level virtualization solution for HPC. It also presents a brief survey of the different approaches and techniques to address these challenges.
european conference on parallel processing | 2009
Anand Tikotekar; Geoffroy Vallée; Thomas Naughton; Hong Ong; Christian Engelmann; Stephen L. Scott
Virtualization technology has been gaining acceptance in the scientific community due to its overall flexibility in running HPC applications. It has been reported that a specific class of applications is better suited to a particular type of virtualization scheme or implementation. For example, Xen has been shown to perform with little overhead for compute-bound applications. Such a study, although useful, does not allow us to generalize conclusions beyond the performance analysis of the specific application that was executed. One explanation for why such generalization is difficult may be the diversity of applications, which leads to different overheads in virtual environments. For example, two similar applications may spend a disproportionate amount of time in their respective library code when run in virtual environments. In this paper, we aim to study such potential causes by investigating the behavior and identifying patterns of various overheads for HPC benchmark applications. Based on the investigation of the overhead profiles for different benchmarks, we aim to address questions such as: are the overhead profiles for a particular type of benchmark (such as compute-bound) similar, or are there grounds to conclude otherwise?
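The kind of per-metric overhead profile the paper investigates can be sketched as follows. This is a minimal illustration, not the authors' tooling: the benchmark names, metric names, and timings are invented placeholders, and only the arithmetic, relative overhead = (virtual - native) / native, reflects the methodology described above.

```python
# Illustrative overhead-profile comparison: for each benchmark, compute
# the relative overhead of each metric when run in a virtual environment
# versus native execution. All numbers below are placeholders.

native = {
    "compute_bound_a": {"wall_time": 100.0, "lib_time": 12.0, "io_time": 3.0},
    "compute_bound_b": {"wall_time": 180.0, "lib_time": 40.0, "io_time": 2.0},
}
virtual = {
    "compute_bound_a": {"wall_time": 104.0, "lib_time": 14.5, "io_time": 4.1},
    "compute_bound_b": {"wall_time": 196.0, "lib_time": 55.0, "io_time": 2.4},
}

def overhead_profile(nat: dict, virt: dict) -> dict:
    """Relative overhead per metric: (virtual - native) / native."""
    return {m: (virt[m] - nat[m]) / nat[m] for m in nat}

for bench in native:
    profile = overhead_profile(native[bench], virtual[bench])
    print(bench, {m: f"{v:+.1%}" for m, v in profile.items()})
```

Comparing such profiles across benchmarks of the same class (e.g., compute-bound) is what lets one ask whether the overheads are similar enough to generalize.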
cluster computing and the grid | 2006
Jin Wu; Chokchai Leangsuksun; Vishal Rampure; Hong Ong
Grid technology enables access to and sharing of data and computational resources across administrative domains. Thus, it is important to provide a uniform access and management mechanism coupled with fine-grain usage policies for enforcing authorization. In this paper, we describe our work on enabling fine-grain access control for resource usage and management. We describe the prototype as well as the policy mark-up language that we designed to express fine-grain security policies. We then present our experimental results and discuss our plans for future work.
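The abstract does not specify the syntax of the authors' policy mark-up language, so the sketch below uses a hypothetical XML schema (<policy>/<rule> elements with subject, action, effect, and usage-limit attributes) purely to illustrate how fine-grain, default-deny policy evaluation of this kind might work.

```python
# Hypothetical fine-grain usage policy and a minimal evaluator.
# The schema is an invented stand-in for the paper's mark-up language.
import xml.etree.ElementTree as ET

POLICY_XML = """
<policy resource="cluster/nodes">
  <rule subject="alice" action="submit" effect="permit" max_cpu_hours="100"/>
  <rule subject="*"     action="submit" effect="deny"/>
</policy>
"""

def is_permitted(policy_xml: str, subject: str, action: str, cpu_hours: float) -> bool:
    """Return True if the first matching rule permits the request."""
    policy = ET.fromstring(policy_xml)
    for rule in policy.findall("rule"):
        if rule.get("subject") in (subject, "*") and rule.get("action") == action:
            if rule.get("effect") != "permit":
                return False
            limit = rule.get("max_cpu_hours")
            return limit is None or cpu_hours <= float(limit)
    return False  # default deny when no rule matches

print(is_permitted(POLICY_XML, "alice", "submit", 50))  # True: within quota
print(is_permitted(POLICY_XML, "bob", "submit", 10))    # False: wildcard deny
```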
ieee international conference on high performance computing data and analytics | 2008
Anand Tikotekar; Geoffroy Vallée; Thomas Naughton; Hong Ong; Christian Engelmann; Stephen L. Scott; Anthony M. Filippi
The topic of system-level virtualization has recently begun to receive interest for high performance computing (HPC). This is in part due to the isolation and encapsulation offered by the virtual machine. These traits enable applications to customize their environments and maintain consistent software configurations in their virtual domains. Additionally, there are mechanisms that can be used for fault tolerance, like live virtual machine migration. Given these attractive benefits of virtualization, a fundamental question arises: how does this affect my scientific application? We use this as the premise for our paper and observe a real-world scientific code running on a Xen virtual machine. We studied the effects of running a radiative transfer simulation, Hydrolight, on a virtual machine. We discuss our methodology and report observations regarding the usage of virtualization with this application.
international conference on cluster computing | 2009
Hong Ong; Natthapol Saragol; Kasidit Chanchio; Chokchai Leangsuksun
A virtual machine, which typically consists of a guest operating system (OS) and its serial applications, can be checkpointed, migrated to another cluster node, and later restarted from its previously saved state. However, to date, it is nontrivial to provide checkpoint-restart mechanisms with the same level of transparency for distributed applications running on a cluster of virtual machines. To address this particular issue, we have created the Virtual Cluster CheckPointing (VCCP) system, a novel system for transparent coordinated checkpoint-restart of virtual machines and their distributed applications on commodity clusters. In this paper, we detail the design and implementation of the VCCP system. Our VCCP prototype extends the open-source QEMU system with the kqemu module by implementing hypervisor-based coordinated checkpoint-restart protocols. To verify and validate our prototype, we measured its performance using the NAS parallel benchmarks. Our experimental results indicate that VCCP generates less than 1% of additional execution overhead for non-communication-intensive parallel applications. Furthermore, our correctness analysis shows that VCCP does not cause message loss or reordering, which is a necessary property to ensure the correctness of a checkpoint-restart mechanism. Finally, we believe that VCCP is a promising checkpoint-restart alternative for legacy applications that have implemented traditional process-level checkpoint-restart.
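A minimal sketch of the coordination idea behind hypervisor-based coordinated checkpoint-restart is given below. The class and method names are illustrative inventions; the actual VCCP protocol is implemented inside QEMU/kqemu, but the ordering shown, quiesce all VMs, drain in-flight messages, then snapshot, is what prevents message loss and reordering.

```python
# Illustrative coordinated-checkpoint logic (names are hypothetical,
# not VCCP's actual implementation).

class VirtualMachine:
    def __init__(self, name: str):
        self.name = name
        self.in_flight = []   # messages sent but not yet delivered
        self.running = True

    def quiesce(self):
        """Stop injecting new network traffic into the guest."""
        self.running = False

    def drain(self):
        """Log all in-flight messages before snapshotting, so the
        checkpoint neither loses nor reorders them."""
        logged = list(self.in_flight)
        self.in_flight.clear()
        return logged

    def snapshot(self):
        return {"vm": self.name, "state": "saved"}

def coordinated_checkpoint(vms):
    # Phase 1: quiesce every VM so no new messages enter the network.
    for vm in vms:
        vm.quiesce()
    # Phase 2: drain channels first, then snapshot each VM with its log.
    checkpoint = [{"channel_log": vm.drain(), "snapshot": vm.snapshot()}
                  for vm in vms]
    # Phase 3: resume execution after the globally consistent checkpoint.
    for vm in vms:
        vm.running = True
    return checkpoint

cluster = [VirtualMachine(f"node{i}") for i in range(4)]
cluster[0].in_flight.append("MPI msg -> node1")
print(coordinated_checkpoint(cluster))
```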
international parallel and distributed processing symposium | 2008
Sadaf R. Alam; Pratul K. Agarwal; Scott S. Hampton; Hong Ong; Jeffrey S. Vetter
Processing nodes of the Cray XT and IBM Blue Gene massively parallel processing (MPP) systems are composed of multiple execution units sharing memory and network subsystems. These multicore processors offer greater computational power but may be hindered by resource contention. In order to understand and avoid such situations, we investigate the impact of resource contention on three scalable molecular dynamics suites: AMBER (PMEMD module), LAMMPS, and NAMD. The results reveal the factors that can inhibit scaling and performance efficiency on emerging multicore processors.
ieee international conference on high performance computing, data, and analytics | 2008
Sadaf R. Alam; Pratul K. Agarwal; Scott S. Hampton; Hong Ong
Multi-core processors introduce many challenges at both the system and application levels that need to be addressed in order to attain the best performance. In this paper, we study the impact of multi-core technologies in the context of two scalable, production-level molecular dynamics simulation frameworks. Experimental analysis and observations in this paper provide for a better understanding of the interactions between the application and the underlying system features such as memory bandwidth, architectural optimization, and communication library implementation. In particular, we observe that parallel efficiencies could be as low as 50% on quad-core systems, while a set of dual-core processors connected with a high speed interconnect can easily outperform the same number of cores on a socket or in a package. This indicates that certain modifications to the software stack and application implementations are necessary in order to fully exploit the performance of multi-core based systems.
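For reference, the parallel-efficiency figure quoted above follows from the standard definition, sketched here with illustrative timings (not the paper's measurements):

```python
# Parallel efficiency = speedup / core count = (T1 / Tp) / p.

def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    return (t_serial / t_parallel) / cores

# e.g. a run taking 400 s on 1 core and 200 s on 4 cores of one socket
# yields only a 2x speedup, i.e. 50% efficiency.
print(f"{parallel_efficiency(400.0, 200.0, 4):.0%}")  # 50%
```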
cluster computing and the grid | 2006
Chokchai Leangsuksun; Tirumala Rao; Anand Tikotekar; Stephen L. Scott; Richard Libby; Jeffrey S. Vetter; Yung-Chin Fang; Hong Ong
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and event management. The increasing level of detail at the hardware and software layers clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitor scalability. We first present the design and implementation of our step-by-step approach to analyzing, filtering, and classifying the plethora of node statistics. Then, we present experimental results to show that our approach needs only minimal system resources and thus has low overhead. Finally, we introduce our web-based cluster management system that provides hardware controls at both the cluster and nodal levels.
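The "analyze, filter, classify" pipeline can be sketched roughly as below. The thresholds, statistic names, and severity labels are assumptions for illustration, not the framework's actual rules:

```python
# Illustrative filtering/classification of raw node statistics into
# notifiable events. Thresholds and labels are invented placeholders.

THRESHOLDS = {"cpu_temp_c": 85.0, "fan_rpm_min": 1000.0, "disk_errors": 1}

def classify(node: str, stats: dict) -> list:
    """Filter raw statistics down to notifiable events with a severity."""
    events = []
    if stats.get("cpu_temp_c", 0.0) > THRESHOLDS["cpu_temp_c"]:
        events.append((node, "cpu_overheat", "critical"))
    if stats.get("fan_rpm", THRESHOLDS["fan_rpm_min"]) < THRESHOLDS["fan_rpm_min"]:
        events.append((node, "fan_failure", "warning"))
    if stats.get("disk_errors", 0) >= THRESHOLDS["disk_errors"]:
        events.append((node, "disk_errors", "warning"))
    return events

# Only anomalous nodes generate notifications; healthy ones are filtered out,
# which is what keeps the monitor's overhead low at scale.
print(classify("node07", {"cpu_temp_c": 91.2, "fan_rpm": 2400, "disk_errors": 0}))
print(classify("node08", {"cpu_temp_c": 55.0, "fan_rpm": 2500, "disk_errors": 0}))
```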
european conference on parallel processing | 2001
Mark Baker; Paul A. Farrell; Hong Ong; Stephen L. Scott
As the technology for high-speed networks has evolved over the last decade, the interconnection of commodity computers (e.g., PCs and workstations) at gigabit rates has become a reality. However, the improved performance of high-speed networks has not been matched so far by a proportional improvement in the ability of the TCP/IP protocol stack. As a result, the Virtual Interface Architecture (VIA) was developed to remedy this situation by providing a lightweight communication protocol that bypasses operating system interaction, providing low-latency and high-bandwidth communications for cluster computing. In this paper, we evaluate and compare the performance characteristics of both hardware (Giganet) and software (M-VIA) implementations of VIA. In particular, we focus on the performance of the VIA send/receive synchronization mechanism on both uniprocessor and dual-processor systems. The tests were conducted on a Linux cluster of PCs connected by a Gigabit Ethernet network. The performance statistics were collected using a local version of NetPIPE adapted for VIA.
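The NetPIPE-style measurement methodology, a ping-pong test swept across message sizes to derive round-trip latency and bandwidth, can be sketched in a few lines. The original study used a VIA-adapted NetPIPE in C over Giganet/M-VIA; the TCP loopback version below only mirrors the measurement idea:

```python
# Minimal NetPIPE-style ping-pong sketch over TCP loopback.
import socket
import threading
import time

HOST, PORT, REPS = "127.0.0.1", 50007, 100

def echo_server():
    """Echo everything back to the client until it disconnects."""
    with socket.socket() as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while (data := conn.recv(65536)):
                conn.sendall(data)

threading.Thread(target=echo_server, daemon=True).start()
time.sleep(0.2)  # give the server a moment to start listening

with socket.socket() as cli:
    cli.connect((HOST, PORT))
    for size in (1, 1024, 16384):
        payload = b"x" * size
        start = time.perf_counter()
        for _ in range(REPS):
            cli.sendall(payload)
            received = 0
            while received < size:        # wait for the full echo
                received += len(cli.recv(65536))
        elapsed = time.perf_counter() - start
        rtt_us = elapsed / REPS * 1e6     # mean round-trip time
        mbps = (2 * size * REPS * 8) / elapsed / 1e6
        print(f"{size:>6} B  round-trip {rtt_us:8.1f} us  {mbps:8.2f} Mbit/s")
```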
ieee international conference on high performance computing data and analytics | 2009
Anand Tikotekar; Hong Ong; Sadaf R. Alam; Geoffroy Vallée; Thomas Naughton; Christian Engelmann; Stephen L. Scott
Obtaining a high flexibility-to-performance-loss ratio is a key challenge of today's HPC virtual environment landscape. And while extensive research has been targeted at extracting more performance from virtual machines, the question of whether novel virtual machine usage scenarios could lead to a better flexibility vs. performance trade-off has received less attention. In this paper, we take a step forward by studying and comparing the performance implications of running the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) application on two virtual machine configurations. The first configuration consists of two virtual machines per node with one application process per virtual machine. The second consists of one virtual machine per node with two processes per virtual machine. Xen was used as the hypervisor and standard Linux as the guest operating system. Our results show that the difference in overall performance impact on LAMMPS between the two virtual machine configurations described above is around 3%. We also study the difference in performance impact in terms of each configuration's individual metrics such as CPU, I/O, memory, and interrupts/context switches.
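The per-metric comparison between the two configurations amounts to a relative-difference calculation, sketched below with invented placeholder numbers (the wall-time row is set to echo the roughly 3% figure; none of these values come from the paper):

```python
# Illustrative per-metric comparison of the two VM configurations:
# 2 VMs x 1 process per node vs. 1 VM x 2 processes per node.

config_2vm = {"wall_time_s": 515.0, "io_wait_pct": 4.2, "ctx_switches": 9.1e6}
config_1vm = {"wall_time_s": 500.0, "io_wait_pct": 3.8, "ctx_switches": 7.5e6}

for metric in config_2vm:
    diff = (config_2vm[metric] - config_1vm[metric]) / config_1vm[metric]
    print(f"{metric:>13}: 2-VM config differs by {diff:+.1%}")
```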