Publication


Featured research published by Yanlong Yin.


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

Server-side I/O coordination for parallel file systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems have become a common component of modern high-end computers to mask the ever-increasing gap between disk data access speed and CPU computing power. However, while working well for certain applications, current parallel file systems lack the ability to effectively handle concurrent I/O requests with data synchronization needs, even though concurrent I/O is the norm in data-intensive applications. Recognizing that an I/O request does not complete until all involved file servers in the parallel file system have completed their parts, in this paper we propose a server-side I/O coordination scheme for parallel file systems. The basic idea is to coordinate the file servers to serve one application at a time in order to reduce completion time while maintaining server utilization and fairness. A window-wide coordination concept is introduced for this purpose. We present the proposed I/O coordination algorithm and a corresponding analysis of average completion time, and we implement a prototype of the scheme under the PVFS2 file system and MPI-IO environment. Experimental results demonstrate that the proposed scheme can reduce average completion time by 8% to 46% and provide higher I/O bandwidth than the default data access strategies adopted by PVFS2 for heavy I/O workloads. The results also show that the server-side I/O coordination scheme has good scalability.
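
To make the window-wide idea concrete, here is a minimal sketch assuming a per-server request queue, a fixed window size, and integer application IDs; none of this is taken from the paper's PVFS2 implementation. Within each window the server serves all requests of one application before switching to the next, instead of interleaving applications in arrival order.

```c
/* Illustrative sketch of window-wide server-side I/O coordination:
 * requests are grouped into windows by arrival order, and inside a
 * window the server serves one application at a time. */
#include <stdio.h>
#include <stdlib.h>

#define WINDOW_SIZE 4   /* hypothetical: requests per coordination window */

typedef struct {
    int app_id;        /* which application issued the request */
    int arrival;       /* arrival order at this server */
    long offset;       /* file offset of the request */
} io_request;

static int by_window_then_app(const void *a, const void *b)
{
    const io_request *x = a, *y = b;
    int wx = x->arrival / WINDOW_SIZE, wy = y->arrival / WINDOW_SIZE;
    if (wx != wy) return wx - wy;                              /* keep window boundaries */
    if (x->app_id != y->app_id) return x->app_id - y->app_id;  /* one app at a time */
    return (x->offset > y->offset) - (x->offset < y->offset);  /* sequential inside app */
}

int main(void)
{
    io_request q[] = {
        {1, 0, 0}, {2, 1, 800}, {1, 2, 100}, {2, 3, 900},
        {1, 4, 200}, {2, 5, 1000}, {1, 6, 300}, {2, 7, 1100},
    };
    size_t n = sizeof q / sizeof q[0];

    qsort(q, n, sizeof q[0], by_window_then_app);

    for (size_t i = 0; i < n; i++)
        printf("serve app %d offset %ld (window %d)\n",
               q[i].app_id, q[i].offset, q[i].arrival / WINDOW_SIZE);
    return 0;
}
```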


High Performance Distributed Computing | 2011

A cost-intelligent application-specific data layout scheme for parallel file systems

Huaiming Song; Yanlong Yin; Yong Chen; Xian-He Sun

I/O data access is a recognized performance bottleneck of high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease this bottleneck, and while these advanced file systems perform well on some applications, they may not perform well on others; they have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the application-specific optimization principle, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and to choose an appropriate layout policy for that application. A complex application may consist of different data access patterns, and averaging those patterns may not be the best solution for applications without a dominant pattern. We therefore further propose a hybrid data replication strategy for such applications, so that a file can have replicas with different layout policies for the best performance. Theoretical analysis and experimental testing have been conducted to verify the newly proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to a 74% performance improvement for data-intensive applications.
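
The selection logic can be illustrated with a toy cost model; the startup cost, per-server bandwidth, and the three candidate policies below are assumptions chosen only to show how a per-request cost estimate can drive the layout choice, not the paper's actual model.

```c
/* Toy layout selection: estimate the cost of a request under three
 * candidate policies (one server, a subgroup, all servers) and pick
 * the cheapest. All parameters are assumed values. */
#include <stdio.h>

#define NUM_SERVERS 8
#define STARTUP_COST 0.0008   /* seconds per server contacted (assumed) */
#define SERVER_BW (300.0e6)   /* bytes/s per server (assumed) */

/* cost of one request of `bytes` spread over `servers` file servers */
static double request_cost(double bytes, int servers)
{
    return servers * STARTUP_COST + bytes / (servers * SERVER_BW);
}

int main(void)
{
    double req_bytes[] = {64e3, 1e6, 64e6};   /* small, medium, large requests */
    int policies[] = {1, 4, NUM_SERVERS};     /* 1-DV, 2-D subgroup, 1-DH */
    const char *names[] = {"1-DV (1 server)", "2-D (4 servers)", "1-DH (8 servers)"};

    for (int r = 0; r < 3; r++) {
        int best = 0;
        for (int p = 1; p < 3; p++)
            if (request_cost(req_bytes[r], policies[p]) <
                request_cost(req_bytes[r], policies[best]))
                best = p;
        printf("request %.0f KB -> %s\n", req_bytes[r] / 1e3, names[best]);
    }
    return 0;
}
```

With these assumed parameters, small requests favor a single server (startup dominates) while large requests favor wide striping (transfer dominates), which is the kind of application-dependent trade-off the cost model is meant to capture.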


International Parallel and Distributed Processing Symposium | 2013

Pattern-Direct and Layout-Aware Replication Scheme for Parallel I/O Systems

Yanlong Yin; Jibing Li; Jun He; Xian-He Sun; Rajeev Thakur

The performance gap between computing power and the I/O system is ever increasing, and at the same time more and more High Performance Computing (HPC) applications are becoming data intensive. This study describes an I/O data replication scheme, named Pattern-Direct and Layout-Aware (PDLA) replication, to alleviate this gap. The basic idea of PDLA is to replicate identified data access patterns and to save these reorganized replicas with data layouts optimized through access cost analysis. A runtime system is designed and developed to integrate the PDLA replication scheme with existing parallel I/O systems, and a prototype of PDLA is implemented under the MPICH2 and PVFS2 environments. Experimental results show that PDLA is effective in improving the data access performance of parallel I/O systems.
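
A minimal sketch of the replication idea, under assumed field names and a contiguous remapping rule (not the PDLA prototype's code): once a strided access pattern has been identified, its blocks can be stored back-to-back in a replica so that replaying the pattern turns into one sequential read.

```c
/* Sketch: map each request of an identified strided pattern from its
 * original offset to a contiguous position in a replica file. */
#include <stdio.h>

typedef struct {
    long start;     /* first offset of the pattern in the original file */
    long stride;    /* distance between consecutive requests */
    long req_size;  /* bytes per request */
    int  count;     /* number of repetitions */
} access_pattern;

/* offset of the i-th request in the original file */
static long orig_offset(const access_pattern *p, int i) { return p->start + i * p->stride; }

/* offset of the same request inside the contiguous replica */
static long replica_offset(const access_pattern *p, int i) { return i * p->req_size; }

int main(void)
{
    access_pattern p = { .start = 4096, .stride = 1 << 20, .req_size = 65536, .count = 4 };
    for (int i = 0; i < p.count; i++)
        printf("request %d: original offset %ld -> replica offset %ld\n",
               i, orig_offset(&p, i), replica_offset(&p, i));
    return 0;
}
```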


IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2011

A Segment-Level Adaptive Data Layout Scheme for Improved Load Balance in Parallel File Systems

Huaiming Song; Yanlong Yin; Xian-He Sun; Rajeev Thakur; Samuel Lang

Parallel file systems are designed to mask the ever-increasing gap between CPU and disk speeds via parallel I/O processing. While they have become an indispensable component of modern high-end computing systems, their inadequate performance is a critical issue facing the HPC community today. Conventionally, a parallel file system stripes a file across multiple file servers with a fixed stripe size. The stripe size is a vital performance parameter, but its optimal value is often application dependent, and determining it is a difficult research problem. Based on the observation that many applications have different data-access clusters within one file, each with a distinct data access pattern, we propose in this paper a segmented data layout scheme for parallel file systems. The basic idea behind the segmented approach is to divide a file logically into segments such that an optimal stripe size can be identified for each segment. A five-step method is introduced to conduct the segmentation, to identify the appropriate stripe size for each segment, and to carry out the segmented data layout scheme automatically. Experimental results show that the proposed layout scheme is feasible and effective, improving performance by up to 163% for writing and 132% for reading on the widely used IOR and IOzone benchmarks.
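
The mapping itself is easy to sketch; the segment table, stripe sizes, and server count below are made-up values, not output of the five-step method, and serve only to show how a logical offset is resolved through its segment's own stripe size.

```c
/* Sketch of a segmented layout: the file is split into logical segments,
 * each with its own stripe size; an offset is mapped to a server using
 * the stripe size of the segment it falls in. */
#include <stdio.h>

#define NUM_SERVERS 4

typedef struct {
    long begin;        /* first byte of the segment */
    long end;          /* one past the last byte */
    long stripe_size;  /* stripe size chosen for this segment */
} segment;

/* hypothetical segmentation: a small-request region, then a large-request region */
static const segment segs[] = {
    {       0,  8 << 20,  64 * 1024 },   /* 64 KB stripes */
    { 8 << 20, 64 << 20,   4 << 20  },   /* 4 MB stripes  */
};

static int server_of(long offset)
{
    for (unsigned i = 0; i < sizeof segs / sizeof segs[0]; i++)
        if (offset >= segs[i].begin && offset < segs[i].end) {
            long rel = offset - segs[i].begin;
            return (int)((rel / segs[i].stripe_size) % NUM_SERVERS);
        }
    return -1;  /* offset beyond the segmented range */
}

int main(void)
{
    long probes[] = { 100 * 1024, 9 << 20, 30 << 20 };
    for (int i = 0; i < 3; i++)
        printf("offset %ld -> server %d\n", probes[i], server_of(probes[i]));
    return 0;
}
```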


Cluster Computing and the Grid | 2012

Boosting Application-Specific Parallel I/O Optimization Using IOSIG

Yanlong Yin; Surendra Byna; Huaiming Song; Xian-He Sun; Rajeev Thakur

Many scientific applications spend a significant portion of their execution time accessing data in files. Various optimization techniques exist to improve data access performance, such as data prefetching and data layout optimization. However, the optimization process is usually difficult because of the complexity involved in understanding I/O behavior, so tools that simplify this process are of significant value. In this paper, we introduce a tool, called IOSIG, that provides a better understanding of parallel I/O accesses and supplies information for use in optimization techniques. The tool traces the parallel I/O calls of an application and analyzes the collected information to provide a clear picture of the application's I/O behavior. We show that the tool's performance overheads in trace collection and analysis are negligible. The analysis step creates I/O signatures that various optimizations can use to improve I/O performance. I/O signatures are compact, easy-to-understand, parameterized representations of data access pattern information such as request size, strides between consecutive accesses, repetition, and timing. The signatures capture the local I/O behavior of each process and the global behavior of the overall application. We illustrate the use of IOSIG in data prefetching and data layout optimization.
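
As an illustration of what such a signature might capture, the sketch below collapses a fixed-stride trace from one process into a compact {start, size, stride, repetitions} record; the struct layout is an assumption for illustration and is not IOSIG's actual format.

```c
/* Collapse a list of traced (offset, size) accesses into a single
 * fixed-stride signature when the stride and request size are constant. */
#include <stdio.h>

typedef struct { long offset; long size; } trace_rec;
typedef struct { long start; long size; long stride; int reps; int valid; } io_signature;

static io_signature summarize(const trace_rec *t, int n)
{
    io_signature s = { t[0].offset, t[0].size, 0, n, 1 };
    if (n < 2) return s;
    s.stride = t[1].offset - t[0].offset;
    for (int i = 1; i < n; i++)
        if (t[i].size != s.size || t[i].offset - t[i - 1].offset != s.stride)
            s.valid = 0;   /* not a single fixed-stride pattern */
    return s;
}

int main(void)
{
    trace_rec trace[] = { {0, 4096}, {65536, 4096}, {131072, 4096}, {196608, 4096} };
    io_signature s = summarize(trace, 4);
    if (s.valid)
        printf("signature: start=%ld size=%ld stride=%ld reps=%d\n",
               s.start, s.size, s.stride, s.reps);
    return 0;
}
```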


Petascale Data Storage Workshop | 2009

Data layout optimization for petascale file systems

Xian-He Sun; Yong Chen; Yanlong Yin

In this study, the authors propose a simple performance model to promote a better integration between the parallel I/O middleware layer and parallel file systems. They show that application-specific data layout optimization can reduce overall data access delay considerably for many applications. Implementation results under the MPI-IO middleware and PVFS2 file system confirm the correctness and effectiveness of their approach, and demonstrate the potential of data layout optimization in petascale data storage.
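
A toy version of such a model (not the paper's formula) is sketched below: each request pays a startup cost, transfers its share of the data, and contends with other processes mapped to the same server, so the estimated delay exposes a trade-off between striping width and overhead.

```c
/* Toy model: estimated delay of one collective access as a function of
 * the number of servers each process stripes over. All constants are
 * assumed values used only to show the shape of the trade-off. */
#include <stdio.h>

#define ALPHA 0.0005        /* per-request startup cost, seconds (assumed)  */
#define BW (300.0e6)        /* per-server bandwidth, bytes/s (assumed)      */
#define NUM_SERVERS 16
#define NUM_PROCS 4

static double delay(double bytes, int s)
{
    double per_server = bytes / s;
    double load = (double)NUM_PROCS * s / NUM_SERVERS;  /* sub-requests per server */
    if (load < 1.0) load = 1.0;          /* a used server serves at least one request */
    return load * (ALPHA + per_server / BW);
}

int main(void)
{
    double bytes = 8e6;                  /* 8 MB per process */
    for (int s = 1; s <= NUM_SERVERS; s *= 2)
        printf("stripe over %2d servers -> estimated delay %.4f s\n",
               s, delay(bytes, s));
    return 0;
}
```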


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

EasyHPS: A Multilevel Hybrid Parallel System for Dynamic Programming

Jun Du; Ce Yu; Jizhou Sun; Chao Sun; Shanjiang Tang; Yanlong Yin

The dynamic programming approach solves complex problems efficiently by breaking them down into simpler sub-problems and is widely used in scientific computing. With the increasing data volume of scientific applications and the development of multi-core/multi-processor hardware, efficient techniques are needed for parallelizing dynamic programming algorithms, particularly in multilevel computing environments. The intrinsically strong data dependencies of dynamic programming also make it difficult and error-prone for programmers to write correct and efficient parallel programs. To make such parallel programming easier and more efficient, we have developed a multilevel hybrid parallel runtime system for dynamic programming named EasyHPS, based on a Directed Acyclic Graph (DAG) data-driven model. EasyHPS encapsulates the details of parallelization, such as task scheduling and message passing, and provides a simple API that reduces the complexity of parallel programming. In the DAG data-driven model, the application is first partitioned into sub-tasks, each of which processes one data block. All sub-tasks are then modeled as a DAG in which each vertex represents a sub-task and each edge indicates a communication dependency between two sub-tasks. For task scheduling, a dynamic approach based on the DAG data-driven model is proposed to achieve load balancing. Data partitioning and task scheduling are both performed at the processor level and the thread level in the multilevel computing environment. Experimental results demonstrate that the proposed dynamic scheduling approach in EasyHPS is more efficient than static approaches such as block-cyclic based wavefront scheduling.
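
The data-driven execution order is easy to see in a tiny sketch; the 3x3 block grid and the wavefront dependency rule below are illustrative and do not use the EasyHPS API. A block is dispatched as soon as all the blocks it depends on have finished.

```c
/* Sketch of DAG-driven scheduling for a blocked DP table: block (i,j)
 * depends on (i-1,j) and (i,j-1); each step runs every block whose
 * dependencies are satisfied (in parallel in a real runtime). */
#include <stdio.h>

#define NB 3                      /* 3 x 3 grid of blocks */

int main(void)
{
    int indegree[NB][NB];
    int done = 0, step = 0;

    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            indegree[i][j] = (i > 0) + (j > 0);

    while (done < NB * NB) {
        printf("step %d:", step++);
        int ready[NB * NB][2], nready = 0;

        /* collect every block whose dependencies are satisfied */
        for (int i = 0; i < NB; i++)
            for (int j = 0; j < NB; j++)
                if (indegree[i][j] == 0) {
                    ready[nready][0] = i;
                    ready[nready][1] = j;
                    nready++;
                }

        /* "execute" the ready blocks and release their successors */
        for (int k = 0; k < nready; k++) {
            int i = ready[k][0], j = ready[k][1];
            printf(" (%d,%d)", i, j);
            indegree[i][j] = -1;                 /* mark as done */
            if (i + 1 < NB) indegree[i + 1][j]--;
            if (j + 1 < NB) indegree[i][j + 1]--;
            done++;
        }
        printf("\n");
    }
    return 0;
}
```

The printed steps form the familiar anti-diagonal wavefront, but here the order emerges from dependency counting rather than from a fixed static schedule.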


Petascale Data Storage Workshop | 2011

Pattern-aware file reorganization in MPI-IO

Jun He; Huaiming Song; Xian-He Sun; Yanlong Yin; Rajeev Thakur

Scientific computing is becoming more data-intensive; however, I/O throughput is not growing at the same rate. MPI-IO and parallel file systems are expected to help bridge the gap by increasing data access parallelism. Compared to traditional I/O systems, certain factors, such as the number of requests and the contiguousness of accesses, matter more in a parallel I/O system, and variation in these factors can lead to significant performance differences. Programmers usually arrange data in a logical fashion for ease of programming and data manipulation, but this may not be ideal for parallel I/O systems; without taking the file organization and the behavior of the I/O system into account, performance can be badly degraded. In this paper, a novel method of reorganizing files at the I/O middleware level is proposed that takes access patterns into account. By placing data in a way that favors the parallel I/O system, gains of up to two orders of magnitude in reading and up to one order of magnitude in writing were observed on both spinning disks and solid-state disks.
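
A small sketch of the remapping idea, with assumed block sizes and process counts: in the original file the processes' blocks are interleaved, forcing many small strided reads, while the reorganized file stores each process's blocks contiguously so they can be fetched with one large request.

```c
/* Compare the offsets of each process's blocks in the original,
 * interleaved layout and in a pattern-aware reorganized layout. */
#include <stdio.h>

#define NPROCS 4
#define BLOCK  65536L     /* bytes per block */
#define BLOCKS_PER_PROC 3

/* offset of process `rank`'s k-th block in the original, interleaved file */
static long original_offset(int rank, int k) { return (k * NPROCS + rank) * BLOCK; }

/* offset of the same block after pattern-aware reorganization */
static long reorganized_offset(int rank, int k)
{
    return (rank * BLOCKS_PER_PROC + k) * BLOCK;
}

int main(void)
{
    for (int rank = 0; rank < NPROCS; rank++)
        for (int k = 0; k < BLOCKS_PER_PROC; k++)
            printf("rank %d block %d: %8ld -> %8ld\n",
                   rank, k, original_offset(rank, k), reorganized_offset(rank, k));
    return 0;
}
```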


International Conference on Parallel Processing | 2014

Decoupled I/O for Data-Intensive High Performance Computing

Chao Chen; Yong Chen; Kun Feng; Yanlong Yin; Hassan Eslami; Rajeev Thakur; Xian-He Sun; William Gropp

The I/O bottleneck has been acknowledged as one of the main performance issues of high performance computing (HPC) systems for data-intensive scientific applications and has attracted intensive study in recent years. With the widening gap between computing bandwidth and I/O bandwidth in projected next-generation HPC systems, this issue will become even worse. In this paper, we present a novel decoupled I/O approach to address this fundamental bottleneck. Decoupled I/O is a software stack, including MPI extensions, compiler improvements, and runtime library support, built on a decoupled HPC system architecture. It allows users to treat the computation of data-intensive operations and traditional I/O operations as an ensemble and to offload them to dedicated data nodes, which are near the data source, reducing the overhead of data movement and improving I/O bandwidth usage. Decoupled I/O is user-friendly and requires few changes to application code. Experiments were conducted to evaluate its performance, and the results show that it outperforms existing solutions such as active storage I/O and provides an attractive I/O solution for data-intensive high performance computing.
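
The offloading idea can be sketched with plain MPI point-to-point calls (the paper's actual MPI extensions, compiler support, and runtime are not shown): a compute rank ships a small operation descriptor to a data-node rank, which performs the data-intensive part next to the file and returns only the result. The file name, tags, and descriptor layout are assumptions; run with two ranks, rank 0 as the compute node and rank 1 as the data node.

```c
/* Sketch of offloading a data-intensive reduction to a data node:
 * only the descriptor and the scalar result cross the network. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { long offset; long count; } op_desc;  /* doubles to reduce */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* compute node: describe the work */
        op_desc d = { 0, 1024 };
        double result;
        MPI_Send(&d, (int)sizeof d, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&result, 1, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("offloaded sum = %f\n", result);
    } else if (rank == 1) {                /* data node: do the heavy part locally */
        op_desc d;
        MPI_Recv(&d, (int)sizeof d, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        double sum = 0.0;
        FILE *f = fopen("dataset.bin", "rb");        /* hypothetical local file */
        if (f) {
            double *buf = malloc(d.count * sizeof(double));
            fseek(f, d.offset, SEEK_SET);
            long n = (long)fread(buf, sizeof(double), d.count, f);
            for (long i = 0; i < n; i++) sum += buf[i];
            free(buf);
            fclose(f);
        }
        MPI_Send(&sum, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```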


International Conference on Cluster Computing | 2014

SCALER: Scalable parallel file write in HDFS

Xi Yang; Yanlong Yin; Hui Jin; Xian-He Sun

Two camps of file systems exist: parallel file systems designed for conventional high performance computing (HPC) and distributed file systems designed for newly emerged data-intensive applications. Addressing the big data challenge requires an approach that utilizes both high performance computing and data-intensive computing power, so HPC applications may need to interact with distributed file systems such as HDFS. The N-1 (N-to-1) parallel file write is a critical technical challenge, because it is very common in HPC applications but HDFS does not allow it. This study introduces a system solution, named SCALER, which allows MPI-based applications to access HDFS directly without extra data movement. SCALER supports N-1 file writes at both the inter-block level and the intra-block level. Experimental results confirm that SCALER achieves its design goals efficiently.
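
A back-of-the-envelope sketch of why both levels matter is given below; the HDFS block size and write regions are assumptions, and the code is not SCALER's implementation. Mapping each process's contiguous region onto fixed-size blocks shows which blocks could be written independently and where regions cross block boundaries and would need intra-block coordination.

```c
/* Map each process's N-1 write region onto fixed-size HDFS blocks and
 * flag regions that are not block-aligned. */
#include <stdio.h>

#define HDFS_BLOCK (64L << 20)    /* 64 MB block size, an assumption */
#define NPROCS 4

int main(void)
{
    long region = 48L << 20;      /* each process writes 48 MB, back to back */

    for (int p = 0; p < NPROCS; p++) {
        long begin = p * region;
        long end   = begin + region;             /* exclusive */
        long first_block = begin / HDFS_BLOCK;
        long last_block  = (end - 1) / HDFS_BLOCK;

        printf("rank %d writes [%ld MB, %ld MB): HDFS blocks %ld..%ld\n",
               p, begin >> 20, end >> 20, first_block, last_block);
        if (begin % HDFS_BLOCK != 0 || end % HDFS_BLOCK != 0)
            printf("  -> region is not block-aligned: intra-block coordination needed\n");
    }
    return 0;
}
```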

Collaboration


Dive into Yanlong Yin's collaborations.

Top Co-Authors

Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Huaiming Song, Illinois Institute of Technology
Yong Chen, Texas Tech University
Kun Feng, Illinois Institute of Technology
Chao Chen, Texas Tech University
Hui Jin, Illinois Institute of Technology
Jun He, Illinois Institute of Technology
Samuel Lang, Argonne National Laboratory
Xi Yang, Illinois Institute of Technology