2019 IEEE International Symposium on Information Theory (ISIT) | 2019

Distributed Matrix Multiplication with MDS Array BP-XOR Codes for Scaling Clusters

 

Abstract


This study presents a novel coded computation technique for distributed matrix-matrix product computation at a massive scale that outperforms well-known previous strategies in terms of total execution time. Our method achieves this performance by distributing the encoding operation over the cluster (slave) nodes at the expense of increased master-slave communication. The product computation is performed using MDS array Belief Propagation (BP)-decodable codes based on pure XOR operations. In addition, our scheme is configurable and suited to modern compute node architectures equipped with multiple processing units organized hierarchically. Assuming that the number of backup nodes is sublinear in the size of the product, we demonstrate that the proposed scheme achieves order-optimal computation from an end-to-end latency perspective while keeping communication requirements acceptable for today’s high-speed network infrastructure.
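To make the idea of worker-side XOR encoding concrete, the following is a minimal sketch and not the construction from the paper. It assumes integer matrices, a row-wise split of A into two hypothetical blocks `A1` and `A2`, and a single XOR parity block standing in for the MDS array BP-XOR code. The helper names `matmul` and `xor_blocks` are illustrative only. The backup worker receives both input blocks, which is the extra master-worker communication, and does the encoding itself by XOR-ing its two sub-products.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def xor_blocks(p: Matrix, q: Matrix) -> Matrix:
    """Element-wise bitwise XOR of two equally sized integer blocks."""
    return [[x ^ y for x, y in zip(r1, r2)] for r1, r2 in zip(p, q)]


# Hypothetical toy inputs: A is split row-wise into two blocks A1 and A2.
A1 = [[1, 2], [3, 4]]
A2 = [[5, 6], [7, 8]]
B = [[1, 0], [2, 1]]

# Two systematic workers each compute one sub-product of A * B.
P1 = matmul(A1, B)
P2 = matmul(A2, B)

# Assumption: one backup worker is sent both blocks, computes both
# sub-products, and XORs them locally into a single parity block.
parity = xor_blocks(matmul(A1, B), matmul(A2, B))

# If the worker holding P1 straggles, the master recovers P1 from the
# parity block and P2 with one XOR pass, with no matrix inversion.
recovered_P1 = xor_blocks(parity, P2)
assert recovered_P1 == P1
```

With one parity block, any two of the three worker outputs suffice to rebuild the full product, which is the MDS property in its simplest form. The array codes named in the abstract generalize this to more backup nodes, and their details are not reproduced here.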

Pages 1792-1796
DOI 10.1109/ISIT.2019.8849409
Language English
Journal 2019 IEEE International Symposium on Information Theory (ISIT)
