IEEE Transactions on Information Theory | 2021

Hierarchical Coded Matrix Multiplication

 
 
 

Abstract


In distributed computing systems slow working nodes, known as stragglers, can greatly extend finishing times. Coded computing is a technique that enables straggler-resistant computation. Most coded computing techniques presented to date provide robustness by ensuring that the time to finish depends only on a set of the fastest nodes. However, while stragglers do compute less work than non-stragglers, in real-world commercial cloud computing systems (e.g., Amazon’s Elastic Compute Cloud (EC2)) the distinction is often a soft one. In this paper, we develop hierarchical coded computing that exploits the work completed by all nodes, both fast and slow, automatically integrating the potential contribution of each. We first present a conceptual framework to represent the division of work amongst nodes in coded matrix multiplication as a cuboid partitioning problem. This framework allows us to unify existing methods and motivates new techniques. We then develop three methods of hierarchical coded computing that we term bit-interleaved coded computation (BICC), multilevel coded computation (MLCC), and hybrid hierarchical coded computation (HHCC). In this paradigm, each worker is tasked with completing a sequence (a hierarchy) of ordered subtasks. The sequence of subtasks, and the complexity of each, is designed so that partial work completed by stragglers can be used, rather than ignored. We note that our methods can be used in conjunction with any coded computing method. We illustrate this by showing how we can use our methods to accelerate all previously developed coded computing techniques by enabling them to exploit stragglers. Under a widely studied statistical model of completion time, our approach realizes a 66% improvement in the expected finishing time. On Amazon EC2, the gain was 27% when stragglers are simulated.

Volume 67
Pages 726-754
DOI 10.1109/TIT.2020.3036763
Language English
Journal IEEE Transactions on Information Theory

Full Text