Publication
Featured research published by Kiyokuni Kawachiya.
International Conference on Performance Engineering | 2018
Tung D. Le; Taro Sekiyama; Yasushi Negishi; Haruki Imai; Kiyokuni Kawachiya
The most important part of deep learning, training the neural network, often requires the processing of a large amount of data and can take days to complete. Data parallelism is widely used for training deep neural networks on multiple GPUs in a single machine thanks to its simplicity. However, its scalability is bound by the number of data transfers, mainly for exchanging and accumulating gradients among the GPUs. In this paper, we present a novel approach to data parallel training called CPU-GPU data parallel (CGDP) training that utilizes free CPU time on the host to speed up the training on the GPUs. We also present a cost model for analyzing and comparing the performance of both the typical data parallel training and the CPU-GPU data parallel training. Using the cost model, we formally show why our approach is better than the typical one and clarify the remaining issues. Finally, we explain how we optimized CPU-GPU data parallel training by introducing chunks of layers and present a runtime algorithm that automatically finds a good configuration for the training. The algorithm is effective for very deep neural networks, which are the current trend in deep learning. Experimental results showed that we achieved speedups of 1.21, 1.04, 1.21 and 1.07 for four state-of-the-art neural networks: AlexNet, GoogLeNet-v1, VGGNet-16, and ResNet-152, respectively, with weak scaling efficiency greater than 90%.
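The core idea described in this abstract, overlapping CPU-side gradient accumulation with GPU backward computation by processing layers in chunks, can be pictured with a small sketch. The code below is not the paper's implementation: NumPy arrays and Python threads stand in for GPUs, and all names, chunk counts, and sizes are illustrative assumptions.

```python
# Sketch of CPU-assisted gradient accumulation: workers ("GPUs") produce
# per-chunk gradients during the backward pass, and the host CPU accumulates
# finished chunks while later chunks are still being computed.
import queue
import threading
import numpy as np

NUM_WORKERS = 4          # number of simulated GPUs
NUM_CHUNKS = 8           # layers grouped into chunks (illustrative)
CHUNK_SHAPE = (1000,)    # gradient size of one chunk (illustrative)

ready = queue.Queue()    # (chunk_id, gradient) pairs handed to the host

def worker(worker_id: int) -> None:
    """Simulate one GPU emitting gradients chunk by chunk during backward."""
    rng = np.random.default_rng(worker_id)
    # The backward pass walks chunks from the last layer back to the first.
    for chunk_id in reversed(range(NUM_CHUNKS)):
        grad = rng.standard_normal(CHUNK_SHAPE)  # stand-in for a real gradient
        ready.put((chunk_id, grad))              # "transfer" to host memory

def accumulate_on_cpu() -> list:
    """Accumulate gradients per chunk on the host while workers keep running."""
    sums = [np.zeros(CHUNK_SHAPE) for _ in range(NUM_CHUNKS)]
    for _ in range(NUM_WORKERS * NUM_CHUNKS):
        chunk_id, grad = ready.get()             # overlaps with worker compute
        sums[chunk_id] += grad
    return sums

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

accumulated = accumulate_on_cpu()                # runs concurrently with workers
for t in threads:
    t.join()

print("accumulated gradient norm of chunk 0:", np.linalg.norm(accumulated[0]))
```

The point of the sketch is only the overlap: the host starts consuming gradients as soon as the first chunk is ready instead of waiting for the whole backward pass, which is how free CPU time reduces the gradient-exchange cost in the approach described above.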
Archive | 2008
Takeshi Ogasawara; Akira Koseki; Hideaki Komatsu; Kiyokuni Kawachiya; Tamiya Onodera
Archive | 1998
Kiyokuni Kawachiya; Hiroshi Ishikawa
Archive | 2005
Kiyokuni Kawachiya; Kazunori Ogata; Tamiya Onodera; Trent A. Gray-Donald
Archive | 1993
Nobuyuki Ooba; Kiyokuni Kawachiya
Archive | 2001
Kiyokuni Kawachiya; Tamiya Onodera
Archive | 2009
Hiroshi Horii; Kiyokuni Kawachiya; Akira Koseki; Toshihiro Takahashi
Archive | 2011
Derek B. Inglis; Kiyokuni Kawachiya; Tamiya Onodera; Michiaki Tatsubori
Archive | 2008
Hiroshi Horii; Kiyokuni Kawachiya; Yosuke Ozawa
Archive | 1999
Tamiya Onodera; Kiyokuni Kawachiya