Parallel Comput. | 2021

Improved probabilistic I/O scheduling for limited-size Burst-Buffers deployed HPC

 
 

Abstract


Abstract I/O bottleneck is a critical problem in current High Performance Computing (HPC) systems which hinges the performance scalability of a system. Some techniques, such as I/O scheduling and Burst-Buffering, had been proposed to accelerate data exchange between the compute and storage components on HPC platforms. Probabilistic I/O scheduling, a Markov-chain-based hybrid method combined the above-mentioned two techniques, controls the data transmission considering the whole load states of the Burst-Buffers system to mitigate the I/O congestion caused by unpredictable concurrent I/O bursts. However, this method requires a large amount of computation to make online scheduling, resulting in significant wastage of computing resources and decreased efficiency in scheduling. In this paper, we first introduce the architecture of Burst-Buffers deployed HPC platform, the probabilistic execution model of applications, and the basic probabilistic I/O scheduling method with a proof of its efficiency based on the Markov-chain framework. Then, we propose the modularization technique, as the first improvement, to reduce the repeated computation by isolating the heuristic application selection module from the original method and reusing the application ranking result to adjust the I/O scheduling. Next, we propose the thresholding technique, as the second improvement, to reduce the number of data transferring on burst-buffers by considering the write amplification characteristic of the underlying storage devices. Finally, we conduct extensive simulation experiments to show that our proposed I/O scheduling methods outperform the existing I/O scheduling methods without introducing burst-buffers states and without considering the characteristics of storage devices.

Volume 101
Pages 102708
DOI 10.1016/j.parco.2020.102708
Language English
Journal Parallel Comput.

Full Text