Huaiming Song

Illinois Institute of Technology

Publication

Featured research published by Huaiming Song.

ieee international conference on high performance computing data and analytics | 2011

Server-side I/O coordination for parallel file systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems have become a common component of modern high-end computers to mask the ever-increasing gap between disk data access speed and CPU computing power. However, while working well for certain applications, current parallel file systems lack the ability to effectively handle concurrent I/O requests with data synchronization needs, whereas concurrent I/O is the norm in data-intensive applications. Recognizing that an I/O request will not complete until all involved file servers in the parallel file system have completed their parts, in this paper we propose a server-side I/O coordination scheme for parallel file systems. The basic idea is to coordinate file servers to serve one application at a time in order to reduce the completion time, and in the meantime maintain the server utilization and fairness. A window-wide coordination concept is introduced to serve our purpose. We present the proposed I/O coordination algorithm and its corresponding analysis of average completion time in this study. We also implement a prototype of the proposed scheme under the PVFS2 file system and MPI-IO environment. Experimental results demonstrate that the proposed scheme can reduce average completion time by 8% to 46%, and provide higher I/O bandwidth than that of default data access strategies adopted by PVFS2 for heavy I/O workloads. Experimental results also show that the server-side I/O coordination scheme has good scalability.
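The observation that a request completes only when its slowest server finishes can be illustrated with a toy calculation. The sketch below is not the paper's algorithm or its window-wide scheme; the function names, the two-server setup, and the unit service times are assumptions chosen only to show why a common service order across servers can lower average completion time.

```python
def completion_times(server_orders, service_time):
    """Return {app: completion time} given each server's service order.

    server_orders: one list per server, naming the order in which that
    server handles the applications' sub-requests (hypothetical input).
    service_time: {app: time one server needs for that app's sub-request}.
    A request completes only when its slowest server has finished.
    """
    finish = {}
    for order in server_orders:
        clock = 0.0
        for app in order:
            clock += service_time[app]
            finish[app] = max(finish.get(app, 0.0), clock)
    return finish


def average(values):
    values = list(values)
    return sum(values) / len(values)


service_time = {"A": 1.0, "B": 1.0}
# Each server picks its own order: both applications wait for the slow side.
uncoordinated = completion_times([["A", "B"], ["B", "A"]], service_time)
# All servers serve application A first, then B.
coordinated = completion_times([["A", "B"], ["A", "B"]], service_time)

print(average(uncoordinated.values()))  # 2.0
print(average(coordinated.values()))    # 1.5
```

In this toy case the total work is unchanged; only the order differs, and application A finishes earlier without delaying B.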

high performance distributed computing | 2011

A cost-intelligent application-specific data layout scheme for parallel file systems

Huaiming Song; Yanlong Yin; Yong Chen; Xian-He Sun

I/O data access is a recognized performance bottleneck of high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease the performance bottleneck. These advanced file systems perform well on some applications but may not perform well on others. They have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the application-specific optimization principle, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and choose an appropriate layout policy for the application. A complex application may consist of different data access patterns. Averaging the data access patterns may not be the best solution for those complex applications that do not have a dominant pattern. We then further propose a hybrid data replication strategy for those applications, so that a file can have replications with different layout policies for the best performance. Theoretical analysis and experimental testing have been conducted to verify the newly proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and the application-specific data layout approach achieved up to 74% performance improvement for data-intensive applications.

ieee/acm international symposium cluster, cloud and grid computing | 2011

A Segment-Level Adaptive Data Layout Scheme for Improved Load Balance in Parallel File Systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems are designed to mask the ever-increasing gap between CPU and disk speeds via parallel I/O processing. While they have become an indispensable component of modern high-end computing systems, their inadequate performance is a critical issue facing the HPC community today. Conventionally, a parallel file system stripes a file across multiple file servers with a fixed stripe size. The stripe size is a vital performance parameter, but the optimal value for it is often application dependent. How to determine the optimal stripe size is a difficult research problem. Based on the observation that many applications have different data-access clusters in one file, with each cluster having a distinct data access pattern, we propose in this paper a segmented data layout scheme for parallel file systems. The basic idea behind the segmented approach is to divide a file logically into segments such that an optimal stripe size can be identified for each segment. A five-step method is introduced to conduct the segmentation, to identify the appropriate stripe size for each segment, and to carry out the segmented data layout scheme automatically. Experimental results show that the proposed layout scheme is feasible and effective, and it improves performance by up to 163% for writing and 132% for reading on the widely used IOR and IOzone benchmarks.
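A minimal sketch of what per-segment striping could look like when locating data, assuming simple round-robin striping that restarts at server 0 in each segment. The `locate` function, the segment table format, and the sizes are hypothetical; the abstract does not describe how the paper or PVFS2 actually maps offsets.

```python
from bisect import bisect_right


def locate(offset, segments, num_servers):
    """Map a file offset to (segment index, server index).

    segments: sorted list of (start_offset, stripe_size) pairs; each
    segment runs until the next segment's start (assumed format).
    """
    starts = [start for start, _ in segments]
    seg = bisect_right(starts, offset) - 1
    if seg < 0:
        raise ValueError("offset precedes first segment")
    start, stripe_size = segments[seg]
    # Round-robin striping inside the segment, using its own stripe size.
    stripe_index = (offset - start) // stripe_size
    return seg, stripe_index % num_servers


# First 1 MiB uses 64 KiB stripes; the rest uses 4 KiB stripes.
segments = [(0, 64 * 1024), (1024 * 1024, 4 * 1024)]
print(locate(70 * 1024, segments, 4))               # (0, 1)
print(locate(1024 * 1024 + 9 * 1024, segments, 4))  # (1, 2)
```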

cluster computing and the grid | 2012

Boosting Application-Specific Parallel I/O Optimization Using IOSIG

Yanlong Yin; Surendra Byna; Huaiming Song; Xian-He Sun; Rajeev Thakur

Many scientific applications spend a significant portion of their execution time in accessing data from files. Various optimization techniques exist to improve data access performance, such as data prefetching and data layout optimization. However, the optimization process is usually difficult because of the complexity involved in understanding I/O behavior. Tools that help simplify the optimization process are therefore of significant importance. In this paper, we introduce a tool, called IOSIG, for providing a better understanding of parallel I/O accesses and information to be used for optimization techniques. The tool enables tracing parallel I/O calls of an application and analyzing the collected information to provide a clear understanding of the I/O behavior of the application. We show that performance overheads of the tool in trace collection and analysis are negligible. The analysis step creates I/O signatures that various optimizations can use for improving I/O performance. I/O signatures are compact, easy-to-understand, and parameterized representations containing data access pattern information such as size, strides between consecutive accesses, repetition, timing, etc. The signatures include local I/O behavior for each process and global behavior for an overall application. We illustrate the usage of the IOSIG tool in data prefetching and data layout optimizations.
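To make the idea of a compact, parameterized signature concrete, here is a small sketch that compresses a trace of accesses into (start offset, size, stride, repetitions) tuples. This is not IOSIG's actual signature format or algorithm, and it ignores timing; the `signatures` function and the trace layout are assumptions for illustration only.

```python
def signatures(trace):
    """Compress a list of (offset, size) accesses into
    (start_offset, size, stride, repetitions) tuples."""
    result = []
    i = 0
    while i < len(trace):
        offset, size = trace[i]
        stride, reps = 0, 1
        # Try to extend a run of same-size accesses with a constant stride.
        if i + 1 < len(trace) and trace[i + 1][1] == size:
            stride = trace[i + 1][0] - offset
            j = i + 1
            while (j < len(trace) and trace[j][1] == size
                   and trace[j][0] - trace[j - 1][0] == stride):
                reps += 1
                j += 1
        result.append((offset, size, stride, reps))
        i += reps
    return result


# Four 4-byte reads spaced 16 bytes apart, then one unrelated 8-byte read.
trace = [(0, 4), (16, 4), (32, 4), (48, 4), (100, 8)]
print(signatures(trace))  # [(0, 4, 16, 4), (100, 8, 0, 1)]
```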

international conference on cluster computing | 2010

Improving Parallel I/O Performance with Data Layout Awareness

Yong Chen; Xian-He Sun; Rajeev Thakur; Huaiming Song; Hui Jin

Parallel applications can benefit greatly from massive computational capability, but their performance suffers from the large latency of I/O accesses. Poor I/O performance has been identified as a critical cause of the low sustained performance of parallel computing systems. In this study, we propose a data layout-aware optimization strategy to promote a better integration of the parallel I/O middleware and parallel file systems, two major components of the current parallel I/O systems, and to improve the data access performance. We explore the layout-aware optimization in both independent I/O and collective I/O, two primary forms of I/O in parallel applications. We illustrate that the layout-aware I/O optimization could improve the performance of current parallel I/O strategy effectively. The experimental results verify that the proposed strategy could improve parallel I/O performance by nearly 40% on average. The proposed layout-aware parallel I/O shows promising potential for improving the I/O performance of parallel systems.

international parallel and distributed processing symposium | 2012

A Server-Level Adaptive Data Layout Strategy for Parallel File Systems

Huaiming Song; Hui Jin; Jun He; Xian-He Sun; Rajeev Thakur

Parallel file systems are widely used for providing a high degree of I/O parallelism to mask the gap between I/O and memory speed. However, peak I/O performance is rarely attained due to complex data access patterns of applications. Based on the observation that the I/O performance of small requests is often limited by the request service rate, and the performance of large requests is limited by I/O bandwidth, we take into consideration both factors and propose a server-level adaptive data layout strategy. The proposed strategy adopts different stripe sizes for different file servers according to the data access characteristics on each individual server. We let the file servers that can fully utilize bandwidth hold more data, and the file servers that are limited by request service rate hold less data. As a result, heavy-load servers can offload some data accesses to light-load servers for potential improvement of I/O performance. We present a method to measure access cost for each data block and then utilize an equal-depth histogram approach to distribute data blocks across multiple servers adaptively, so as to balance data accesses on all file servers. Analytical and experimental results demonstrate that the proposed server-level adaptive layout strategy can improve I/O performance by as much as 80.3% and is more appropriate for applications with complex data access patterns.
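An equal-depth histogram places bucket boundaries so that each bucket carries roughly the same total weight rather than the same number of items. The sketch below applies that idea to blocks and servers under stated assumptions: the per-block costs are made-up numbers, the function name is hypothetical, and the paper's own cost measurement is not given in the abstract.

```python
def equal_depth_partition(costs, num_servers):
    """Assign consecutive blocks to servers so each server's summed
    access cost is roughly total / num_servers."""
    total = sum(costs)
    target = total / num_servers
    assignment = []
    server, running = 0, 0.0
    for cost in costs:
        # Move on once the cost placed so far fills this server's share.
        if running >= target * (server + 1) and server < num_servers - 1:
            server += 1
        assignment.append(server)
        running += cost
    return assignment


# Hypothetical per-block access costs: the two end blocks are expensive.
costs = [4, 1, 1, 1, 1, 4]
print(equal_depth_partition(costs, 3))  # [0, 1, 1, 1, 1, 2]
```

Here each server ends up with a summed cost of 4, so the servers holding expensive blocks hold fewer blocks, in line with the abstract's point that differently loaded servers should hold different amounts of data.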

petascale data storage workshop | 2011

Pattern-aware file reorganization in MPI-IO

Jun He; Huaiming Song; Xian-He Sun; Yanlong Yin; Rajeev Thakur

Scientific computing is becoming more data-intensive; however, I/O throughput is not growing at the same rate. MPI-IO and parallel file systems are expected to help bridge the gap by increasing data access parallelism. Compared to traditional I/O systems, some factors are more important in parallel I/O systems in order to achieve better performance, such as the number of requests and contiguousness of accesses. The variation of these factors can lead to significant differences in performance. Programmers usually arrange data in a logical fashion for ease of programming and data manipulation; however, this may not be ideal for parallel I/O systems. Without taking into account the organization of the file and the behavior of the I/O system, the performance may be badly degraded. In this paper, a novel method of reorganizing files at the I/O middleware level is proposed, which takes into account the access patterns. By placing data in a way favoring the parallel I/O system, gains of up to two orders of magnitude in reading and up to one order of magnitude in writing were observed with spinning disks and solid-state disks.
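One way to picture reorganizing a file around its access pattern is a remapping table that stores blocks in the order they are accessed, so a strided logical pattern becomes contiguous on disk. This is only an illustrative sketch; the abstract does not describe the paper's actual mechanism, and `build_remap`, the block size, and the access order are assumptions.

```python
def build_remap(access_order, block_size):
    """Map each logical block offset to a new physical offset so that
    blocks are stored in the order they are first accessed."""
    remap = {}
    next_physical = 0
    for logical in access_order:
        if logical not in remap:
            remap[logical] = next_physical
            next_physical += block_size
    return remap


# A process reads every fourth 1 KiB block of the logical file.
order = [0, 4096, 8192, 12288]
remap = build_remap(order, 1024)
print([remap[o] for o in order])  # [0, 1024, 2048, 3072]
```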

high performance distributed computing | 2010

A layout-aware optimization strategy for collective I/O

Yong Chen; Huaiming Song; Rajeev Thakur; Xian-He Sun

In this study, we propose an optimization strategy to promote a better integration of the parallel I/O middleware and parallel file systems. We illustrate that a layout-aware optimization strategy can improve the performance of current collective I/O in parallel I/O systems. We present the motivation, prototype design and initial verification of the proposed layout-aware optimization strategy. The analytical and initial experimental testing results demonstrate that the proposed strategy has the potential to improve parallel I/O system performance.

international symposium on parallel and distributed processing and applications | 2011

A Hybrid Shared-Nothing/Shared-Data Storage Scheme for Large-Scale Data Processing

Huaiming Song; Xian-He Sun; Yong Chen

Shared-nothing and shared-disk have been the two most common storage architectures of parallel databases in the past two decades. Both types of systems have their own merits for different applications. However, little effort has gone into investigating the integration of these two architectures and exploiting their merits together. In this paper, we propose a novel hybrid storage architecture for large-scale data processing, to leverage the benefits of both shared-nothing and shared-disk architectures. In the proposed hybrid system, we adopt a shared-nothing architecture as the hardware layer and leverage a parallel file system as the storage layer to combine the scattered disks on all database nodes. We present an overall design of the new scheme, including data and storage organization, data access modes, and query processing methods. The proposed hybrid scheme can achieve both the high I/O performance of a shared-nothing system and the high-speed data sharing across all server nodes of a shared-disk system. Preliminary experimental results demonstrate that the hybrid scheme is promising and more appropriate for large-scale and data-intensive applications than each of the two individual types of systems.

ieee/acm international symposium cluster, cloud and grid computing | 2011

A Hybrid Shared-Nothing/Shared-Data Storage Architecture for Large Scale Databases

Huaiming Song; Xian-He Sun; Yong Chen

Shared-nothing and shared-disk are two widely used storage architectures in current parallel database systems, and each of them has its own merits for different query patterns. However, little effort has gone into investigating the integration of these two architectures and exploiting their merits together. In this study, we propose a novel hybrid shared-nothing/shared-data storage scheme for large-scale databases, to leverage the benefits of both shared-nothing and shared-disk architectures. We adopt a shared-nothing architecture as the hardware layer and leverage a parallel file system as the storage layer. The proposed hybrid storage scheme can provide a high degree of parallelism in both I/O and computing, like that in a shared-nothing system. In the meantime, it can achieve convenient and high-speed data sharing across multiple database nodes, like that in a shared-disk system. The hybrid scheme is more appropriate for large-scale and data-intensive applications than each of the two individual types of systems.
