Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications | 2019

An effective method for operations placement in TensorFlow

Abstract


Recent work in deep learning has shown that large neural networks can dramatically improve performance, accompanied by a corresponding growth in hardware computational requirements. To meet those requirements, a common approach is to train such models on heterogeneous systems with a mixture of hardware devices such as CPUs and GPUs. Typically, the decision of how to place parts of a neural network on devices is made by researchers using heuristic algorithms. In this paper, we introduce an effective method to optimize operation placement for TensorFlow computational graphs on heterogeneous systems, using deep neural networks to predict a device for each operation in a target computational graph. Based on reinforcement learning, our method learns to group operations and assign each group to a device. To take advantage of the information carried by the operations, we use a fully connected network to group them. In addition, we use the actual running time of the predicted placement as the reward to train the predictive network with policy gradients. On the most widely used models in computer vision and machine translation, our method finds optimized placements that outperform those of human experts. When applied to a Neural Machine Translation model on the WMT14 German-English dataset, our method reduces the execution time of a single training step by up to 28.41%.
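The training loop the abstract describes — sample a placement from a learned policy, measure the resulting runtime, and use that runtime as a reward signal for policy gradients — can be sketched in miniature. The code below is a hedged illustration, not the paper's implementation: `simulated_runtime` is a hypothetical stand-in for actually executing the placed graph, the per-group softmax policy is a simplification of the paper's predictive network, and all hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_placement(logits, rng):
    """Sample one device index per operation group from softmax policies."""
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    actions = np.array([rng.choice(len(p), p=p) for p in probs])
    return actions, probs

def simulated_runtime(placement):
    """Hypothetical stand-in for measuring the real per-step runtime.

    Here the 'fast' assignment alternates between two devices; each
    misplaced group adds 0.5s to a 1.0s base step time.
    """
    best = np.arange(len(placement)) % 2
    return 1.0 + 0.5 * np.sum(placement != best)

num_groups, num_devices = 4, 2
logits = np.zeros((num_groups, num_devices))  # policy parameters
baseline = None                               # moving-average reward baseline
lr = 0.3

for step in range(500):
    placement, probs = sample_placement(logits, rng)
    runtime = simulated_runtime(placement)
    baseline = runtime if baseline is None else 0.9 * baseline + 0.1 * runtime
    advantage = baseline - runtime  # lower runtime than baseline => positive
    # REINFORCE: grad of log pi(a) w.r.t. logits = one_hot(a) - probs
    grad = -probs
    grad[np.arange(num_groups), placement] += 1.0
    logits += lr * advantage * grad

greedy_placement = np.argmax(logits, axis=1)
print(greedy_placement, simulated_runtime(greedy_placement))
```

After a few hundred updates the greedy placement should recover the low-runtime assignment; in the real system each reward evaluation is an actual training-step timing on the target hardware, which is what makes the measured-runtime reward expensive but faithful.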

DOI 10.1145/3318265.3318270
