
Publication


Featured research published by H. Howie Huang.


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

TRACON: interference-aware scheduling for data-intensive applications in virtualized environments

Ron Chi-Lung Chiang; H. Howie Huang

Large-scale data centers leverage virtualization technology to achieve excellent resource utilization, scalability, and high availability. Ideally, the performance of an application running inside a virtual machine (VM) should be independent of co-located applications and VMs that share the physical machine. However, adverse interference effects exist and are especially severe for data-intensive applications in such virtualized environments. In this work, we present TRACON, a novel Task and Resource Allocation CONtrol framework that mitigates the interference effects from concurrent data-intensive applications and greatly improves the overall system performance. TRACON utilizes modeling and control techniques from statistical machine learning and consists of three major components: the interference prediction model, which infers application performance from resource consumption observed from different VMs; the interference-aware scheduler, which is designed to utilize the model for effective resource management; and the task and resource monitor, which collects application characteristics at runtime for model adaptation. We implement and validate TRACON with a variety of cloud applications. The evaluation results show that TRACON can achieve up to 25 percent improvement in application throughput on virtualized servers.
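
The abstract's prediction-plus-scheduling loop can be sketched in a few lines: fit a model mapping co-located VMs' resource consumption to the target application's runtime, then place each new task on the machine with the lowest predicted interference. The linear model, the two-feature input, and all numbers below are illustrative assumptions, not TRACON's actual design:

```python
import numpy as np

def fit_interference_model(features, runtimes):
    """Least-squares fit: predicted runtime = w . [cpu, disk_io, 1].

    features: N x 2 array of co-located VMs' (cpu_util, disk_io_rate)
    runtimes: N observed runtimes of the target application
    """
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    w, *_ = np.linalg.lstsq(X, runtimes, rcond=None)
    return w

def predict_runtime(w, cpu, io):
    return float(np.dot(w, [cpu, io, 1.0]))

def schedule(w, machines):
    """Pick the machine whose current load yields the lowest predicted runtime."""
    return min(machines, key=lambda m: predict_runtime(w, m[0], m[1]))

# Synthetic observations: runtime grows with co-located CPU and I/O load.
feats = np.array([[0.1, 0.1], [0.5, 0.2], [0.2, 0.8], [0.9, 0.9]])
times = np.array([10.0, 14.0, 18.0, 26.0])
w = fit_interference_model(feats, times)
best = schedule(w, [(0.8, 0.7), (0.1, 0.2)])  # lightly loaded machine wins
```

In the paper this loop is closed by the runtime monitor, which keeps refreshing the training data as workloads change.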


International Conference on Power Aware Computing and Systems | 2010

Low-power amdahl-balanced blades for data intensive computing

Alexander S. Szalay; Gordon Bell; H. Howie Huang; Andreas Terzis; Alainna White

Enterprise and scientific data sets double every year, forcing similar growths in storage size and power consumption. As a consequence, current system architectures used to build data warehouses are about to hit a power consumption wall. In this paper we propose an alternative architecture comprising a large number of so-called Amdahl blades that combine energy-efficient CPUs with solid state disks to increase sequential read I/O throughput by an order of magnitude while keeping power consumption constant. We also show that while keeping the total cost of ownership constant, Amdahl blades offer five times the throughput of a state-of-the-art computing cluster for data-intensive applications. Finally, using the scaling laws originally postulated by Amdahl, we show that systems for data-intensive computing must maintain a balance between low power consumption and per-server throughput to optimize performance per Watt.
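
Amdahl's balance laws, which the abstract invokes, can be made concrete: a balanced system sustains roughly one bit of sequential I/O per second for every instruction per second, and holds roughly one byte of memory per instruction per second. A minimal check of those ratios for a hypothetical low-power blade (all hardware numbers below are illustrative, not taken from the paper):

```python
def amdahl_numbers(instr_per_sec, io_bits_per_sec, mem_bytes):
    """Return (Amdahl I/O number, Amdahl memory number).

    A balanced system has both ratios near 1.0:
      I/O number    = bits of sequential I/O per second, per instruction per second
      memory number = bytes of memory, per instruction per second
    """
    return io_bits_per_sec / instr_per_sec, mem_bytes / instr_per_sec

# Hypothetical blade: 1.6 GHz low-power CPU paired with one SSD.
io_num, mem_num = amdahl_numbers(
    instr_per_sec=1.6e9,
    io_bits_per_sec=8 * 250e6,   # ~250 MB/s sequential read from an SSD
    mem_bytes=2 * 2**30,         # 2 GiB of DRAM
)
# Both ratios land near 1.0, i.e., roughly Amdahl-balanced.
```

A conventional server with a single hard drive (~1 Gbit/s sequential reads against tens of GHz of aggregate instruction rate) falls far below an I/O number of 1, which is the imbalance the blade design targets.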


Journal of Computational Physics | 2016

Computational modeling of cardiac hemodynamics

Rajat Mittal; Jung Hee Seo; Vijay Vedula; Young Joon Choi; Hang Liu; H. Howie Huang; Saurabh Jain; Laurent Younes; Theodore P. Abraham; Richard T. George

The proliferation of four-dimensional imaging technologies, increasing computational speeds, improved simulation algorithms, and the widespread availability of powerful computing platforms are enabling simulations of cardiac hemodynamics with unprecedented speed and fidelity. Since cardiovascular disease is intimately linked to cardiovascular hemodynamics, accurate assessment of the patient's hemodynamic state is critical for the diagnosis and treatment of heart disease. Unfortunately, while a variety of invasive and non-invasive approaches for measuring cardiac hemodynamics are in widespread use, they still only provide an incomplete picture of the hemodynamic state of a patient. In this context, computational modeling of cardiac hemodynamics presents itself as a powerful non-invasive modality that can fill this information gap and significantly impact the diagnosis as well as the treatment of cardiac disease. This article reviews the current status of this field as well as the emerging trends and challenges in cardiovascular health, computing, modeling, and simulation that are expected to play a key role in its future development. Some recent advances in modeling and simulations of cardiac flow are described using examples from our own work as well as the research of other groups.


IEEE Conference on Mass Storage Systems and Technologies | 2011

Performance modeling and analysis of flash-based storage devices

H. Howie Huang; Shan Li; Alexander S. Szalay; Andreas Terzis

Flash-based solid-state drives (SSDs) will become key components in future storage systems. An accurate performance model will not only help understand the state of the art of SSDs, but also provide the research tools for exploring the design space of such storage systems. Although over the years many performance models have been developed for hard drives, the architectural differences between the two device families prevent these models from being effective for SSDs. The hard drive performance models cannot account for several unique characteristics of SSDs, e.g., low latency, slow update, and expensive block-level erase. In this paper, we utilize the black-box modeling approach to analyze and evaluate SSD performance, including latency, bandwidth, and throughput, as it requires minimal a priori information about the storage devices. We construct the black-box models, using both synthetic workloads and real-world traces, on three SSDs, as well as an SSD RAID. We find that, while the black-box approach may produce less desirable performance predictions for hard disks, a black-box SSD model with a comprehensive set of workload characteristics can produce accurate predictions for latency, bandwidth, and throughput with small errors.
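
Black-box modeling of this kind treats the device as an opaque function from workload characteristics to performance, learned purely from measurements. A toy illustration using a nearest-neighbor lookup over (request size, read ratio, queue depth) features; the paper's actual feature set and learner are richer, and every measurement below is made up:

```python
import math

class BlackBoxSSDModel:
    """Predict latency by returning the nearest measured workload's latency."""

    def __init__(self):
        self.samples = []  # list of (feature_tuple, observed_latency_us)

    def train(self, features, latency_us):
        self.samples.append((features, latency_us))

    def predict(self, features):
        # Nearest neighbor in feature space; real models would normalize
        # each feature dimension first so no single unit dominates.
        return min(self.samples, key=lambda s: math.dist(s[0], features))[1]

model = BlackBoxSSDModel()
# Features: (request size in KB, read ratio, queue depth) -> latency (us).
model.train((4, 1.0, 1), 80)      # small random reads: fast on flash
model.train((4, 0.0, 1), 200)     # small writes: slower (erase-before-write)
model.train((128, 1.0, 32), 900)  # large queued reads
lat = model.predict((8, 0.9, 1))  # read-heavy small requests
```

The appeal noted in the abstract is visible even here: nothing about channels, FTL, or garbage collection is modeled, only observed input/output pairs.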


IEEE International Conference on High Performance Computing, Data and Analytics | 2015

Enterprise: breadth-first graph traversal on GPUs

Hang Liu; H. Howie Huang

The Breadth-First Search (BFS) algorithm serves as the foundation for many graph-processing applications and analytics workloads. While Graphics Processing Units (GPUs) offer massive parallelism, achieving high-performance BFS on GPUs entails efficient scheduling of a large number of GPU threads and effective utilization of the GPU memory hierarchy. In this paper, we present Enterprise, a new GPU-based BFS system that combines three techniques to remove potential performance bottlenecks: (1) streamlined GPU thread scheduling, which constructs a frontier queue without contention from concurrent threads, contains no duplicate frontiers, and is optimized for both top-down and bottom-up BFS; (2) GPU workload balancing, which classifies the frontiers by out-degree to utilize the full spectrum of GPU parallel granularity and significantly increases thread-level parallelism; and (3) GPU-based BFS direction optimization, which quantifies the effect of hub vertices on direction switching and selectively caches a small set of critical hub vertices in the limited GPU shared memory to reduce expensive random data accesses. We have evaluated Enterprise on a large variety of graphs with different GPU devices. Enterprise achieves up to 76 billion traversed edges per second (TEPS) on a single NVIDIA Kepler K40, and up to 122 billion TEPS on two GPUs, a result that ranked No. 45 in the Graph 500 in November 2014. Enterprise is also very energy-efficient, ranking No. 1 in the GreenGraph 500 (small data category) by delivering 446 million TEPS per watt.
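
The top-down versus bottom-up distinction at the heart of technique (3) can be sketched sequentially: top-down expands outward from the frontier, while bottom-up lets unvisited vertices search for a frontier parent, which is cheaper once the frontier is large. The switching threshold and graph below are illustrative assumptions, not Enterprise's GPU implementation:

```python
def hybrid_bfs(adj, n, source, alpha=0.25):
    """Level-synchronous BFS that switches from top-down to bottom-up
    when the frontier exceeds alpha * n vertices (heuristic threshold)."""
    depth = [-1] * n
    depth[source] = 0
    frontier = {source}
    level = 0
    while frontier:
        nxt = set()
        if len(frontier) < alpha * n:
            # Top-down: expand each frontier vertex's neighbors.
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        nxt.add(v)
        else:
            # Bottom-up: each unvisited vertex looks for a frontier parent,
            # stopping at the first one found.
            for v in range(n):
                if depth[v] == -1 and any(u in frontier for u in adj[v]):
                    depth[v] = level + 1
                    nxt.add(v)
        frontier = nxt
        level += 1
    return depth

# Small undirected graph: edges 0-1, 0-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
depths = hybrid_bfs(adj, 5, 0)
```

The paper's contribution is making both directions, and the switch between them, efficient under the GPU's thread and memory constraints; hub-vertex caching makes the bottom-up parent checks cheap.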


IEEE International Conference on Cloud Computing Technology and Science | 2012

Understanding the effects of hypervisor I/O scheduling for virtual machine performance interference

Ziye Yang; Haifeng Fang; Yingjun Wu; Chungi Li; Bin Zhao; H. Howie Huang

In virtualized environments, the customers who purchase virtual machines (VMs) from a third-party cloud would expect that their VMs run in an isolated manner. However, the performance of a VM can be negatively affected by co-resident VMs. In this paper, we propose vExplorer, a distributed VM I/O performance measurement and analysis framework, where one can use a set of representative I/O operations to identify the I/O scheduling characteristics within a hypervisor, and potentially leverage this knowledge to carry out I/O-based performance attacks that slow down the execution of the target VMs. We evaluate our prototype on both Xen and VMware platforms with four server benchmarks and show that vExplorer is practical and effective. We also conduct similar tests on Amazon's EC2 platform and successfully slow down the performance of target VMs.


International Conference on Management of Data | 2016

iBFS: Concurrent Breadth-First Search on GPUs

Hang Liu; H. Howie Huang; Yang Hu

Breadth-First Search (BFS) is a key graph algorithm with many important applications. In this work, we focus on a special class of graph traversal algorithm - concurrent BFS - where multiple breadth-first traversals are performed simultaneously on the same graph. We have designed and developed a new approach called iBFS that is able to run i concurrent BFSes from i distinct source vertices, very efficiently on Graphics Processing Units (GPUs). iBFS consists of three novel designs. First, iBFS develops a single GPU kernel for joint traversal of concurrent BFS to take advantage of shared frontiers across different instances. Second, outdegree-based GroupBy rules enable iBFS to selectively run a group of BFS instances, which further maximizes the frontier sharing within such a group. Third, iBFS brings additional performance benefit by utilizing highly optimized bitwise operations on GPUs, which allow a single GPU thread to inspect a vertex for all concurrent BFS instances. The evaluation on a wide spectrum of graph benchmarks shows that iBFS on one GPU runs up to 30x faster than executing BFS instances sequentially, and on 112 GPUs achieves near linear speedup with a maximum performance of 57,267 billion traversed edges per second (TEPS).
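
The bitwise idea in the third design can be sketched sequentially: give each vertex a bitmask in which bit i records whether BFS instance i has reached it, so one traversal step serves every instance at once via bitwise OR. A minimal illustration (the per-vertex masks are the core idea from the abstract; the data structures and graph are made up):

```python
def concurrent_bfs(adj, n, sources):
    """Run len(sources) BFS instances jointly; bit i of each vertex's
    mask marks visitation by instance i, so shared frontiers are
    traversed once rather than once per instance."""
    visited = [0] * n                      # per-vertex bitmask of instances
    frontier = [0] * n                     # bits newly set at the current level
    depth = [[-1] * n for _ in sources]    # per-instance BFS depths
    for i, s in enumerate(sources):
        visited[s] |= 1 << i
        frontier[s] |= 1 << i
        depth[i][s] = 0
    level = 0
    while any(frontier):
        nxt = [0] * n
        for u in range(n):
            if frontier[u]:
                for v in adj[u]:
                    new_bits = frontier[u] & ~visited[v]  # instances reaching v now
                    if new_bits:
                        visited[v] |= new_bits
                        nxt[v] |= new_bits
                        for i in range(len(sources)):
                            if new_bits >> i & 1:
                                depth[i][v] = level + 1
        frontier = nxt
        level += 1
    return depth

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
depths = concurrent_bfs(adj, 5, [0, 4])  # two instances, two source vertices
```

On a GPU, one thread can test a whole vertex mask in a single bitwise instruction, which is where the speedup over running the i instances back to back comes from.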


IEEE International Conference on High Performance Computing, Data and Analytics | 2016

G-store: high-performance graph store for trillion-edge processing

Pradeep Kumar; H. Howie Huang

High-performance graph processing brings great benefits to a wide range of scientific applications, e.g., biology networks, recommendation systems, and social networks, where such graphs have grown to terabytes of data with billions of vertices and trillions of edges. Subsequently, storage performance plays a critical role in designing a high-performance computer system for graph analytics. In this paper, we present G-Store, a new graph store that incorporates three techniques to accelerate the I/O and computation of graph algorithms. First, G-Store develops a space-efficient tile format for graph data, which takes advantage of the symmetry present in graphs as well as a new smallest-number-of-bits representation. Second, G-Store utilizes tile-based physical grouping on disks so that multi-core CPUs can achieve high cache and memory performance and fully utilize the throughput from an array of solid-state disks. Third, G-Store employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing. With a modest amount of memory, G-Store utilizes a proactive caching strategy in the system so that all fetched graph data are fully utilized before being evicted from memory. We evaluate G-Store on a number of graphs against two state-of-the-art graph engines and show that G-Store achieves 2 to 8× savings in storage and outperforms both by 2 to 32×. G-Store is able to run different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone in semi-external graph processing systems.
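
The space saving behind a tile format can be sketched as follows: partition the adjacency matrix into tiles, and inside each tile store endpoints as short local offsets rather than full global vertex IDs, since the tile's position already supplies the high bits. The 16-bit tile width and the encoding below are illustrative assumptions, not G-Store's actual on-disk layout:

```python
import struct

TILE_BITS = 16                    # each tile spans 2**16 vertices per dimension
TILE_MASK = (1 << TILE_BITS) - 1

def tile_of(src, dst):
    """Tile coordinates of an edge: the high bits of its endpoints."""
    return src >> TILE_BITS, dst >> TILE_BITS

def encode_tiles(edges):
    """Group edges by tile; within a tile each endpoint needs only
    TILE_BITS bits (a local offset), so an edge costs 4 bytes here
    instead of 8+ bytes of global 32/64-bit vertex IDs."""
    tiles = {}
    for src, dst in edges:
        local = struct.pack("<HH", src & TILE_MASK, dst & TILE_MASK)
        tiles.setdefault(tile_of(src, dst), bytearray()).extend(local)
    return tiles

def decode_tile(tile_id, data):
    """Recover global edges by re-attaching the tile's high bits."""
    ti, tj = tile_id
    for off in range(0, len(data), 4):
        ls, ld = struct.unpack_from("<HH", data, off)
        yield (ti << TILE_BITS) | ls, (tj << TILE_BITS) | ld

edges = [(1, 2), (70000, 3), (70001, 70002)]
tiles = encode_tiles(edges)
```

Grouping tiles physically on disk, as the abstract describes, then turns this compact layout into sequential I/O that an SSD array can stream at full throughput.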


International Conference on Communications | 2012

Providing reliability as an elastic service in cloud computing

Nakharin Limrungsi; Juzi Zhao; Yu Xiang; Tian Lan; H. Howie Huang; Suresh Subramaniam

Modern-day data centers coordinate hundreds of thousands of heterogeneous tasks and aim at delivering highly reliable cloud computing services. Although offering equal reliability to all users benefits everyone at the same time, users may find such an approach either inadequate or too expensive for their individual requirements, which may vary dramatically. In this paper, we propose a novel method for providing reliability as an elastic and on-demand service. Our scheme makes use of peer-to-peer checkpointing and allows user reliability levels to be jointly optimized based on an assessment of their individual requirements and the total available resources in the data center. We show that the joint optimization can be efficiently solved by a distributed algorithm using dual decomposition. The solution improves resource utilization and presents an additional source of revenue to data center operators. Our validation results suggest a significant improvement of reliability over existing schemes.


Symposium on Cloud Computing | 2013

Mortar: filling the gaps in data center memory

Jinho Hwang; Ahsen J. Uppal; Timothy Wood; H. Howie Huang

Data center servers are typically overprovisioned, leaving spare memory and CPU capacity idle to handle unpredictable workload bursts by the virtual machines running on them [1, 2, 3]. While this allows for fast hotspot mitigation, it is also wasteful. Unfortunately, making use of spare capacity without impacting active applications is particularly difficult for memory, since memory typically must be allocated in coarse chunks over long timescales [4, 5, 6, 7]. In this work we propose repurposing the poorly utilized memory in a data center to host a volatile data store that is managed by the hypervisor. We present two uses for our Mortar framework: as a cache for prefetching disk blocks [8, 9, 10], and as an application-level distributed cache that follows the memcached protocol [11, 12]. Both prototypes use the framework to ask the hypervisor to store useful, but recoverable, data within its free memory pool. This allows the hypervisor to control eviction policies and prioritize access to the cache.

Collaboration


Dive into H. Howie Huang's collaborations.

Top Co-Authors

- Hang Liu, George Washington University
- Ron Chi-Lung Chiang, University of St. Thomas (Minnesota)
- Xin Xu, George Washington University
- Guru Venkataramani, George Washington University
- Jie Chen, George Washington University
- Timothy Wood, George Washington University
- Ahsen J. Uppal, George Washington University
- Pradeep Kumar, George Washington University
- Wei Wang, University of Delaware