2019 IEEE International Conference on Big Data (Big Data) | 2019

Performance Optimization of SpMV on Spark


Abstract


Sparse matrix-vector multiplication (SpMV) is one of the most important computational kernels for solving large-scale numerical problems in scientific computing, data analysis, machine learning, and many other fields. However, optimizing its performance across platforms remains an open research problem owing to the diversity of matrix structures and architectural properties. In this paper, we present two performance optimization methods for SpMV on Spark. First, we propose a new data format, called Block COO plus (BCOO+), which significantly reduces the number of shuffles in Spark. Second, we design a new deep convolutional neural network (CNN) that analyzes the matrix structure and automatically chooses the right data format for SpMV on Spark. The experimental results show that our method can achieve a 3.2-fold performance improvement over the traditional CSC format.
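
As a point of reference for the formats named above, the following is a minimal sketch of SpMV over plain coordinate (COO) triples and over a generic block-grouped COO layout. It is written in plain Python rather than Spark, and it is not the authors' BCOO+ format, whose details the abstract does not give. The function names (`spmv_coo`, `to_blocks`, `spmv_blocked`), the block size, and the sample matrix are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def spmv_coo(entries, x, n_rows):
    """Compute y = A @ x for A given as (row, col, value) triples."""
    y = [0.0] * n_rows
    for r, c, v in entries:
        y[r] += v * x[c]
    return y


def to_blocks(entries, block_size):
    """Group COO triples by (block_row, block_col) with block-local indices.

    Illustrative generic blocking only; not the paper's BCOO+ layout.
    """
    blocks = defaultdict(list)
    for r, c, v in entries:
        key = (r // block_size, c // block_size)
        blocks[key].append((r % block_size, c % block_size, v))
    return dict(blocks)


def spmv_blocked(blocks, x, n_rows, block_size):
    """Compute y = A @ x one block at a time."""
    y = [0.0] * n_rows
    for (br, bc), local in blocks.items():
        r0, c0 = br * block_size, bc * block_size  # global offsets of block
        for lr, lc, v in local:
            y[r0 + lr] += v * x[c0 + lc]
    return y


# Hypothetical 4x4 sparse matrix with six non-zeros.
A = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0),
     (2, 2, 4.0), (3, 0, 5.0), (3, 3, 6.0)]
x = [1.0, 2.0, 3.0, 4.0]

y_coo = spmv_coo(A, x, 4)
y_blk = spmv_blocked(to_blocks(A, 2), x, 4, 2)
```

Both routines produce the same result vector. In a distributed setting, a block would be the unit of work sent to an executor, so the six individual entries here collapse into four keyed records; how BCOO+ exploits this to cut shuffles is described in the paper itself, not in this sketch.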

Pages 689-694
DOI 10.1109/BigData47090.2019.9006323
Language English
Journal 2019 IEEE International Conference on Big Data (Big Data)
