IEEE Transactions on Parallel and Distributed Systems | 2021

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers

 
 

Abstract


The amount of data transferred over dedicated and non-dedicated network links has been growing much faster than network capacity. At the same time, current data transfer solutions fail to guarantee even the achievable transfer throughput they promise. In this article, we propose a novel two-phase dynamic throughput optimization model based on mathematical modeling, combining offline knowledge discovery and analysis with adaptive online decision making. In the offline phase, we mine historical transfer logs to discover knowledge about transfer characteristics. The online phase combines this knowledge with real-time investigation of the network condition to optimize the protocol parameters. Because real-time investigation is expensive and provides only partial knowledge of the current network status, our model uses historical knowledge about the network and data characteristics to reduce the real-time investigation overhead while ensuring near-optimal throughput for each transfer. We tested our approach over different networks with different datasets; it outperformed its closest competitor by 1.7x and the default case by 5x. It also achieved up to 93 percent accuracy relative to the optimal achievable throughput on those networks.
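
As a rough illustration of the two-phase idea described in the abstract, the following Python sketch pairs an offline lookup built from historical logs with a small online refinement loop. It is not the model from the article: the log fields (`network`, `size_class`, `concurrency`, `throughput`), the choice of a single tunable parameter, the hill-climbing refinement, and the `probe` callback are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def offline_phase(logs):
    """Offline phase (illustrative): for each (network, size_class) pair,
    pick the concurrency value with the highest mean historical throughput.
    The record field names are hypothetical."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in logs:
        key = (rec["network"], rec["size_class"])
        grouped[key][rec["concurrency"]].append(rec["throughput"])
    return {
        key: max(by_cc, key=lambda cc: mean(by_cc[cc]))
        for key, by_cc in grouped.items()
    }


def online_phase(start, probe, max_probes=3, low=1, high=32):
    """Online phase (illustrative): refine the offline suggestion with at
    most `max_probes` real-time probes, using simple hill climbing.
    `probe(value)` is a hypothetical callback returning measured throughput."""
    best, best_tput = start, probe(start)
    probes_used = 1
    while probes_used < max_probes:
        improved = False
        for cand in (best + 1, best - 1):
            if probes_used >= max_probes or not (low <= cand <= high):
                continue
            tput = probe(cand)
            probes_used += 1
            if tput > best_tput:
                best, best_tput = cand, tput
                improved = True
                break
        if not improved:
            break
    return best, best_tput
```

The probe budget (`max_probes`) reflects the trade-off stated in the abstract: starting from a historically good value means fewer expensive real-time measurements are needed before the transfer settles on a setting.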

Volume 32
Pages 269-280
DOI 10.1109/TPDS.2020.3012929
Language English
Journal IEEE Transactions on Parallel and Distributed Systems
