Eng. Appl. Artif. Intell. | 2019

Reinforcement Learning based scheduling in a workflow management system


Abstract

Any computational process, from a simple data analytics task to the training of a machine learning model, can be described by a workflow. Many workflow management systems (WMSs) exist that schedule workflows across distributed computational resources. In this work, we introduce a WMS that leverages machine learning to predict workflow task runtime and the probability that a task assignment to an execution site will fail. The expected runtime of workflow tasks can be used to approximate the weight of each branch of the workflow graph with respect to the total workflow workload, and the ability to anticipate task failures can discourage task assignments that are unlikely to succeed. We demonstrate that the proposed machine learning models can lead to significantly more informed scheduling decisions that minimize task failures and utilize execution sites more efficiently, thus reducing workflow runtime. Additionally, we train a modified sequence-to-sequence neural network architecture via reinforcement learning to make scheduling decisions as part of a WMS. Our approach yields a WMS that can drastically improve its scheduling performance by learning independently over time, without external intervention or reliance on any specific heuristic or optimization technique. Finally, we test our approach in real-world scenarios using computationally demanding and data-intensive workflows and evaluate its performance against scheduling methodologies traditionally used in WMSs. The evaluation confirms that the proposed approach consistently and significantly outperforms the other scheduling algorithms, achieving the best execution runtime with the lowest number of failed tasks and the lowest communication costs.
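As a rough illustration of how a predicted runtime and a predicted failure probability might jointly inform a site assignment, the sketch below ranks candidate execution sites by expected completion time. It is not the method of the paper: the names `SiteEstimate`, `expected_cost`, and `pick_site` are hypothetical, the numbers are made up, and the cost model assumes that a failed attempt consumes the full predicted runtime and is retried on the same site with independent outcomes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteEstimate:
    """Hypothetical model outputs for one (task, site) pair."""
    site: str
    runtime_s: float  # predicted task runtime on this site, in seconds
    p_fail: float     # predicted probability that the assignment fails


def expected_cost(est: SiteEstimate) -> float:
    """Expected time to success under retry-until-success on the same site.

    Assumption: attempts are independent, so the number of attempts is
    geometric with mean 1 / (1 - p_fail), and each attempt costs runtime_s.
    """
    if not 0.0 <= est.p_fail < 1.0:
        return float("inf")  # a site that always fails is never preferred
    return est.runtime_s / (1.0 - est.p_fail)


def pick_site(estimates: Iterable[SiteEstimate]) -> SiteEstimate:
    """Choose the site with the lowest expected completion time."""
    return min(estimates, key=expected_cost)


# Illustrative values only: the faster site is penalized for being unreliable.
estimates = [
    SiteEstimate("site_a", runtime_s=100.0, p_fail=0.5),
    SiteEstimate("site_b", runtime_s=150.0, p_fail=0.1),
]
best = pick_site(estimates)
print(best.site)
```

Under these assumptions the nominally slower but more reliable site is preferred, which is the kind of trade-off that anticipating task failures makes visible to a scheduler.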

Volume 81
Pages 94-106
DOI 10.1016/J.ENGAPPAI.2019.02.013
Language English
Journal Eng. Appl. Artif. Intell.
