A Fundamental Storage-Communication Tradeoff for Distributed Computing with Straggling Nodes
Qifa Yan, Michèle Wigger, Sheng Yang, and Xiaohu Tang

Abstract
Placement delivery arrays for distributed computing (Comp-PDAs) have recently been proposed as a framework to construct universal computing schemes for MapReduce-like systems. In this work, we extend this concept to systems with straggling nodes, i.e., to systems where a subset of the nodes cannot accomplish the assigned map computations in due time. Unlike most previous works that focused on computing linear functions, our results are universal and apply to arbitrary map and reduce functions. Our contributions are as follows. Firstly, we show how to construct a universal coded computing scheme for MapReduce-like systems with straggling nodes from any given Comp-PDA. We also characterize the storage and communication loads of the resulting scheme in terms of the Comp-PDA parameters. Then, we prove an information-theoretic converse bound on the storage-communication (SC) tradeoff achieved by universal computing schemes with straggling nodes. We show that the information-theoretic bound matches the performance achieved by the coded computing schemes with straggling nodes corresponding to the Maddah-Ali and Niesen (MAN) PDAs, i.e., to the Comp-PDAs describing Maddah-Ali and Niesen's coded caching scheme. Interestingly, the same Comp-PDAs (the MAN-PDAs) are optimal for any number of straggling nodes, which implies that the map phase of optimal coded computing schemes does not need to be adapted to the number of stragglers in the system. We finally prove that while the points that lie exactly on the fundamental SC tradeoff cannot be achieved with Comp-PDAs that require a smaller number of files than the MAN-PDAs, this is possible for some of the points that lie close to the SC tradeoff. For these latter points, the decrease in the required number of files can be exponential in the number of nodes of the system.
Index Terms
Distributed computing, storage, communication, straggler, MapReduce, placement delivery array.
Q. Yan and M. Wigger are with LTCI, Télécom Paris, IP Paris, 91120 Palaiseau, France. E-mails: [email protected], [email protected]. S. Yang is with L2S (UMR CNRS 8506), CentraleSupélec-CNRS-Université Paris-Sud, 91192 Gif-sur-Yvette, France. E-mail: [email protected]. X. Tang is with the Information Security and National Computing Grid Laboratory, Southwest Jiaotong University, 611756, Chengdu, Sichuan, China. E-mail: [email protected].

Part of this work has been presented at ISIT 2019 [1]. The work of Q. Yan and M. Wigger has been supported by the ERC under grant agreement 715111. The work of X. Tang was supported in part by the National Natural Science Foundation of China under Grant 61871331.

I. INTRODUCTION
Distributed computing has emerged as one of the most important paradigms to speed up large-scale data analysis tasks. One of the most popular programming models is MapReduce [2], which has been used to parallelize computations across distributed computing nodes, e.g., for machine learning tools [3], [4]. Consider the task of computing D output functions from N files through K nodes. With MapReduce, each output function φ_d, for 1 ≤ d ≤ D, can be decomposed into
• N map functions f_{d,1}, …, f_{d,N}, each depending on exactly one different file; and
• a reduce function h_d that combines the outputs of the N map functions.
Each node k is responsible for computing a subset of D/K output functions through three phases. In the first map phase, a central server stores a subset of files M_k at node k, for each k ∈ [K]. Each node k then computes all the D intermediate values (IVAs) f_{1,n}(w_n), …, f_{D,n}(w_n) corresponding to each of its stored files w_n ∈ M_k. In the subsequent shuffle phase, it creates a signal from its computed IVAs and sends the signal to all the other nodes. Based on the received exchanged signals and the locally computed IVAs, in the final reduce phase it reconstructs all the IVAs pertaining to its own output functions and calculates the desired outputs.
Recently, Li et al. proposed a scheme named coded distributed computing (CDC) to reduce the communication load for data shuffling between the map and reduce phases [5]. The idea is to create multicast opportunities by duplicating the files and computing the corresponding map functions at different nodes. It is shown that the CDC scheme achieves the fundamental storage-communication tradeoff, i.e., it has the lowest communication load for a given storage constraint. This result has been extended in various directions.
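To make the map/reduce decomposition above concrete, the following Python sketch runs a toy word-count task through the three-stage pipeline. The files, words, and function names here are entirely illustrative stand-ins (not from the paper): each map function f_{d,n} counts occurrences of word d in file n alone, and the reduce function h_d sums the per-file counts.

```python
# Toy sketch of the decomposition phi_d = h_d(f_{d,1}(w_1), ..., f_{d,N}(w_N)).
files = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]   # N = 3 hypothetical "files"
words = ["a", "b", "c"]                                   # D = 3 output functions

def map_fn(d, w_n):
    """f_{d,n}: produce the intermediate value (IVA) v_{d,n} from file w_n alone."""
    return sum(1 for token in w_n if token == words[d])

def reduce_fn(d, ivas):
    """h_d: combine the N IVAs of output function d into the final output u_d."""
    return sum(ivas)

# phi_d computed via the decomposition: first map (file-local), then reduce.
outputs = [reduce_fn(d, [map_fn(d, w) for w in files]) for d in range(len(words))]
print(outputs)  # word counts across all files: [3, 2, 3]
```

The key property exploited throughout the paper is that each map output depends on one file only, so map computations can be replicated across nodes independently of the reduce step.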
For example, [6]–[8] also account for the computation load during the map phase; [9] studies the computation resource-allocation problem; [10]–[13] consider wireless (noisy) networks between computation nodes; and [14] considers a model where during the shuffle phase each node broadcasts only to a random subset of the nodes.
In this paper, we consider a setup where during the map phase each node takes a random amount of time to compute its assigned map functions [15]. In this case, instead of waiting for all the nodes to finish the assigned computations, which can cause an intolerable delay, data shuffling starts as soon as any set of Q nodes, Q ∈ [K], terminate their map procedures. The set of Q nodes that first terminate the map procedure are called active nodes, while the remaining K − Q nodes are called straggling nodes or stragglers. Note that the stragglers are not identified prior to the beginning of the map phase, and the map phase has to be designed without such knowledge.
Distributed computing systems with straggling nodes have mainly been studied in the context of a server-worker framework. In this framework, a central server distributes the raw data to the workers as in the map phase described above, but following this map phase the workers directly communicate their intermediate results to the server, which then produces the final outputs. (Thus, under the server-worker framework, all final outputs are calculated at the server, and not at the distributed computing nodes as is the case in MapReduce systems.) Under the server-worker framework, distributed computing systems with straggling nodes have for example been studied in [15]–[25], which focused on high-dimensional matrix-by-matrix or matrix-by-vector multiplications, and in [26]–[30], which proposed codes for gradient computing. Fewer works studied MapReduce systems with straggling nodes (hereafter referred to as straggling systems), which are more relevant for the present article. Specifically, Li et al.
[31] proposed to incorporate MDS codes into the CDC scheme to cope with straggler nodes. Their construction, however, works only when the map functions accomplish matrix-by-vector multiplications. Improved constructions were proposed by Zhang et al. by choosing the parameters of the MDS code and the CDC scheme separately in a more flexible way [32], but their techniques are also applicable only to map functions that are matrix-by-matrix multiplications. In many practical applications, such as computations in neural networks and machine learning, the map functions are non-linear and can be very complicated with little structure. This motivates us to investigate the MapReduce framework with straggling nodes for general map and reduce functions. In particular, we will present universal coded computing schemes that can be applied to arbitrary straggling systems, irrespective of the specific map and reduce functions. Moreover, we will show the optimality of our schemes among the class of universal schemes that do not rely on special properties of the map and reduce functions.
More specifically, in this work, we first propose a systematic construction of universal coded computing schemes for straggling systems from any placement delivery array for distributed computing (Comp-PDA) [8]. A placement delivery array (PDA) is an array whose entries are either a special symbol "∗" or integers called ordinary symbols. It was introduced in [33] to represent in a single array both the placement and the delivery of coded caching schemes with uncoded prefetching. In particular, the coded caching schemes proposed by Maddah-Ali and Niesen in [34] can be represented as PDAs [33]. The corresponding PDAs will be referred to as MAN-PDAs, and, as we will see, they play a fundamental role also in coded computing with stragglers.
PDAs have further been generalized to other coded caching scenarios such as device-to-device models [35], combination networks [36], networks with private demands [37], and medical data sharing problems [38]. Moreover, several different PDA constructions have been proposed in [39]–[43]. In this paper, our focus is on a subclass of PDAs, called Comp-PDAs, which were introduced in [8] to describe coded computing schemes for MapReduce systems without straggling nodes. We show that Comp-PDAs can also be used to construct coded computing schemes for straggling systems, and we express the storage and communication loads of the obtained schemes in terms of the Comp-PDA parameters.
We then proceed to characterize the fundamental storage-communication (SC) tradeoff for straggling systems by showing that the SC tradeoff curve achieved by coded computing schemes obtained from the MAN-PDAs matches a new information-theoretic converse for universal computing schemes with stragglers. That means, our converse bounds the SC tradeoffs achieved by coded computing schemes that apply to arbitrary map and reduce functions. For special map and reduce functions, e.g., linear functions, it is possible to find tailored coded computing schemes that achieve better SC tradeoffs than the one implied by our information-theoretic converse; see, e.g., [32]. It is worth pointing out that the MAN-PDA based coded computing schemes adopt a fixed storage strategy irrespective of the active set size Q. This implies that the fundamental SC tradeoff curve remains unchanged even if the active set size Q has not yet been determined during the map phase. The proposed schemes are thus optimal also in a scenario where the system imposes a strict time constraint and proceeds to the shuffle and reduce phases with the random number of nodes that have by then terminated their IVA calculations.
In a final part of the manuscript, we study the complexity of optimal (or near-optimal) coded computing schemes. In fact, a major practical limitation of the SC-optimal coded computing schemes based on MAN-PDAs is that they can only be implemented if the number of files N in the system grows exponentially with the number of nodes K. However, as we show in this paper, in most cases, MAN-PDAs achieve their corresponding fundamental SC pairs with the smallest possible number of files, i.e., with the smallest file complexity, among all Comp-PDA based coded computing schemes. The mentioned practical limitation is thus not a weakness of the MAN-PDAs, but seems inherent to PDA-based coded computing schemes for stragglers. Interestingly, the problem can be circumvented by slightly backing off from the SC-optimal tradeoff curve. We show that the coded computing schemes corresponding to some of the Comp-PDAs in [33] achieve SC pairs close to the fundamental SC tradeoff curve but with a significantly smaller number of files than the optimal MAN-PDAs. More precisely, we fix an integer q, let the number of nodes K be a multiple of q, and let the storage load r be such that r/K ∈ {1/q, (q − 1)/q} holds. We compare the Comp-PDA in [33] and the MAN-PDA for such pairs (K, r) while we let both of them tend to infinity proportionally.
This comparison shows that the ratio of the minimum required number of files of the Comp-PDA in [33] and the MAN-PDA vanishes as O(e^{K(1−q) ln(q/(q−1))}), while the ratio of their communication loads approaches 1.
We summarize the contributions of this paper:
1) We establish a general framework for constructing universal coded computing schemes for straggling systems from Comp-PDAs, and evaluate their SC pairs in terms of the Comp-PDA parameters.
2) We derive the fundamental SC tradeoff for any universal straggling system by means of an information-theoretic converse that matches the SC pairs achieved by coded computing schemes with stragglers based on the MAN-PDAs.
3) We prove that, while in most cases points on the fundamental SC tradeoff curve can be achieved only with the same file complexity as MAN-PDA based schemes, points close to the fundamental SC tradeoff curve can be achieved with significantly smaller file complexities.
The remainder of this paper is organized as follows. Section II formally describes our model, and Section III reviews the definitions of PDAs and Comp-PDAs; Section IV presents the main results of this paper; Sections V to VII present the major proofs of our results; and Section VIII concludes this paper.

Notations:
For positive integers n, k such that n ≥ k, we use the notations [n] ≜ {1, 2, …, n} and [k : n] ≜ {k, k + 1, …, n}. The binomial coefficient is denoted by C_n^k ≜ n!/(k!(n − k)!) for n ≥ k ≥ 0; we set C_n^k = 0 when k < 0 or k > n. For k ≤ n, we use Ω_n^k to denote the collection of all subsets of [n] of cardinality k, i.e., Ω_n^k ≜ {T ⊆ [n] : |T| = k}. The binary field is denoted by F_2, and the n-dimensional vector space over F_2 is denoted by F_2^n. We use |A| to denote the cardinality of a set A, while for a signal X, |X| is the number of bits in X. The order of set operations is from left to right. Finally, 1(·) denotes the indicator function that evaluates to 1 if the statement in the parentheses is true and to 0 otherwise.

II. SYSTEM MODEL

A (K, Q) straggling system is parameterized by the positive integers K, Q, N, D, U, V, W, as described in the following. The system aims to compute D output functions φ_1, …, φ_D through K distributed computing nodes from N files. Each output function φ_d : F_2^{NW} → F_2^U (d ∈ [D]) takes as inputs the length-W files in the library W = {w_1, …, w_N} and outputs a bit stream of length U, i.e., u_d = φ_d(w_1, …, w_N) ∈ F_2^U. Assume that the computation of the output function φ_d can be decomposed as
φ_d(w_1, …, w_N) = h_d(f_{d,1}(w_1), …, f_{d,N}(w_N)),
where
• the map function f_{d,n} : F_2^W → F_2^V maps the file w_n into a binary stream of length V, called an intermediate value (IVA), i.e., v_{d,n} ≜ f_{d,n}(w_n) ∈ F_2^V, ∀n ∈ [N];
• the reduce function h_d : F_2^{NV} → F_2^U maps the IVAs {v_{d,n}}_{n=1}^N into the output stream u_d = φ_d(w_1, …, w_N) = h_d(v_{d,1}, …, v_{d,N}).
Notice that a decomposition into map and reduce functions is always possible.
In fact, trivially, one can set the map and reduce functions to be the identity and output functions respectively, i.e., f_{d,n}(w_n) = w_n and h_d = φ_d, ∀n ∈ [N], d ∈ [D], in which case V = W. However, to mitigate the communication cost during the shuffle phase, one would prefer a decomposition such that the length of the IVAs is as small as possible while still allowing the nodes to compute the final outputs. The computation is carried out through three phases, namely, the map, shuffle, and reduce phases.

1) Map Phase:
Each node k ∈ [K] stores a subset of files M_k ⊆ W and tries to compute all the IVAs from the files in M_k, denoted by C_k:
C_k ≜ {v_{d,n} : d ∈ [D], w_n ∈ M_k}. (1)
Each node takes a random amount of time to compute its corresponding IVAs. To limit the latency of the system, the coded computing scheme proceeds with the shuffle and reduce phases as soon as a fixed number of Q ∈ [K] nodes have terminated the map computations. These nodes are called active nodes, and the set of all active nodes is called the active set, whereas the other K − Q nodes are called straggling nodes. For simplicity, we consider the symmetric case in which each subset Q ⊆ [K] of size |Q| = Q is active with the same probability. Let the random variable Q denote the random active set. Then,
Pr{Q = Q} = 1/C_K^Q, ∀Q ∈ Ω_K^Q.
In our model, we also assume that the map phase has been designed in a way that all the files can be recovered from any active set of size Q. Hence, for any file w_n ∈ W, the number of nodes t_n storing this file must satisfy
t_n ≥ K − Q + 1, ∀n ∈ [N]. (2)
The output functions φ_1, …, φ_D are then uniformly assigned to the nodes in Q. Let D_k^Q be the set of indices of output functions assigned to a given node k ∈ Q. Thus, Γ_Q ≜ {D_k^Q}_{k∈Q} forms a partition of [D], and each set D_k^Q is of cardinality D/Q. Denote the set of all partitions of [D] into Q equal-sized subsets by ∆.

2) Shuffle Phase:
The nodes in Q proceed to exchange their computed IVAs. Each node k ∈ Q multicasts a signal
X_k^Q = ϕ_k^Q(C_k, Γ_Q)
to all the other nodes in Q. Here, for each k ∈ Q, ϕ_k^Q : F_2^{|C_k|V} × ∆ → F_2^{|X_k^Q|} denotes the encoding function of node k. We assume a perfect multicast channel, i.e., each active node k ∈ Q perfectly receives all the transmitted signals X_Q ≜ {X_k^Q : k ∈ Q}.

3) Reduce Phase:
Using the received signals X_Q from the shuffle phase and the local IVAs C_k computed in the map phase, node k has to be able to compute all the IVAs
{(v_{d,1}, …, v_{d,N})}_{d ∈ D_k^Q} = ψ_k^Q(X_Q, C_k, Γ_Q),
where ψ_k^Q : F_2^{Σ_{k∈Q} |X_k^Q|} × F_2^{|C_k|V} × ∆ → F_2^{NDV/Q}. Finally, with the restored IVAs, it computes each assigned function via the reduce function, namely,
u_d = h_d(v_{d,1}, …, v_{d,N}), ∀d ∈ D_k^Q.
To measure the storage and communication costs, we introduce the following definitions.
Definition 1 (Storage Load). The storage load r is defined as the total number of files stored across the K nodes normalized by the total number of files N, i.e.,
r ≜ (Σ_{k=1}^K |M_k|) / N.

Definition 2 (Communication Load). The communication load L is defined as the average total number of bits sent in the shuffle phase, normalized by the total number of bits of all intermediate values, i.e.,
L = E[ Σ_{k∈Q} |X_k^Q| / (NDV) ], (3)
where the expectation is taken over the random active set Q.

(Footnote 1: In this paper, we thus exclude the "outage" event in which some active sets cannot compute the given function due to missing files. Footnote 2: Here we assume for simplicity that Q divides D; otherwise we can always add empty functions for the assumption to hold.)

Definition 3 (Storage-Communication (SC) Tradeoff). A pair of real numbers (r, L) is achievable if for any ε > 0, there exist positive integers N, D, U, V, W, a storage design {M_k}_{k=1}^K of storage load less than r + ε, a set of uniform assignments of output functions {Γ_Q}_{Q∈Ω_K^Q}, and a collection of encoding functions {{ϕ_k^Q}_{k∈Q}}_{Q∈Ω_K^Q} with communication load less than L + ε, such that all the output functions φ_1, …, φ_D can be computed successfully. For a fixed Q ∈ [K], we define the fundamental storage-communication (SC) tradeoff as
L*_{K,Q}(r) ≜ inf{L : (r, L) is achievable}.

Note that the non-trivial interval for the values of r is [K − Q + 1, K]. Indeed, if r ≥ K, then each node can store all the files and compute any function locally. On the other hand, from assumption (2), we have for any feasible scheme
r = Σ_{k=1}^K |M_k| / N = Σ_{n=1}^N t_n / N ≥ K − Q + 1.
Therefore, throughout the paper, we only focus on the interval r ∈ [K − Q + 1, K] for any given Q ∈ [K]. Further, for a given storage design {M_k}_{k=1}^K, by the symmetry assumption of the reduce functions and the fact that each node has all the IVAs of all D output functions for the files it has stored, the optimal communication load is independent of the reduce function assignment. This is similar to the case without straggling nodes (see [5, Remark 3]).

Definition 4 (File Complexity). The smallest number of files N required to implement a given scheme is called the file complexity of this scheme.

In the above problem definition, the various nodes store entire files during the map phase, and during the shuffle phase they reconstruct all the IVAs corresponding to their output functions. This system definition does not allow to reduce the storage or communication loads by exploiting special structures of the map or reduce functions, as proposed for example in [31], [32]. As a consequence, all the coded computing schemes presented in this paper apply universally to arbitrary map and reduce functions, and the SC tradeoff in Definition 3 applies only to such universal schemes. In fact, as we will explain, for linear reduce functions the SC tradeoff derived in [32] improves over the one in Definition 3, since it was derived for a system where nodes do not have to store individual files and reconstruct all the required IVAs, but linear combinations of them suffice.
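Definitions 1 and the feasibility constraint (2) are easy to check mechanically. The following Python sketch (the helper names and the particular storage design are ours, for illustration only) computes the storage load of a design and verifies that every file is replicated at t_n ≥ K − Q + 1 nodes:

```python
from itertools import combinations

def storage_load(storage, N):
    """Definition 1: r = (sum_k |M_k|) / N for a storage design storage = [M_1, ..., M_K]."""
    return sum(len(M) for M in storage) / N

def feasible(storage, N, K, Q):
    """Constraint (2): every file must be stored at t_n >= K - Q + 1 nodes,
    so that any active set of Q nodes jointly holds all files."""
    t = [sum(1 for M in storage if n in M) for n in range(N)]
    return all(t_n >= K - Q + 1 for t_n in t)

# Hypothetical design for K = 4, Q = 3, N = 6: index the files by the 2-subsets of
# the node set, and let node k store exactly the files whose index set contains k.
K, Q, N = 4, 3, 6
labels = list(combinations(range(K), 2))          # one 2-subset per file
storage = [{n for n, T in enumerate(labels) if k in T} for k in range(K)]

print(storage_load(storage, N))   # 2.0: each file is stored at t_n = 2 = K - Q + 1 nodes
print(feasible(storage, N, K, Q)) # True
```

This particular replication pattern (each file at exactly K − Q + 1 nodes) meets (2) with equality and hence attains the smallest feasible storage load r = K − Q + 1 for these parameters.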
III. PLACEMENT DELIVERY ARRAYS FOR STRAGGLING SYSTEMS
A. Definitions
Placement delivery arrays (PDAs), introduced in [33], are the main tool of this paper. To adapt them to our setup, we use the following definition from [8].
Definition 5 (Placement Delivery Array (PDA)). For positive integers K, F, T and a nonnegative integer S, an F × K array A = [a_{j,k}], j ∈ [F], k ∈ [K], composed of T special symbols "∗" and some ordinary symbols 1, …, S, each occurring at least once, is called a (K, F, T, S) PDA if, for any two distinct entries a_{j1,k1} and a_{j2,k2}, we have a_{j1,k1} = a_{j2,k2} = s with s an ordinary symbol only if
a) j1 ≠ j2 and k1 ≠ k2, i.e., they lie in distinct rows and distinct columns; and
b) a_{j1,k2} = a_{j2,k1} = ∗, i.e., the corresponding 2 × 2 subarray formed by rows j1, j2 and columns k1, k2 must be of the form
s ∗
∗ s
or
∗ s
s ∗ .
A PDA with all "∗" entries is called trivial. Notice that in this case S = 0 and KF = T. A PDA is called a g-regular PDA if each ordinary symbol occurs exactly g times.

Example 1. The following array is a 3-regular (4, 6, 12, 4) PDA:
A =
∗ ∗ 1 2
∗ 1 ∗ 3
∗ 2 3 ∗
1 ∗ ∗ 4
2 ∗ 4 ∗
3 4 ∗ ∗ .

For our purpose, we introduce the following definitions similarly to the ones in [8].
Definition 6 (PDA for Distributed Computing (Comp-PDA)). A Comp-PDA is a PDA with at least one "∗"-symbol in each row.

Definition 7 (Minimum Storage Number). Given a Comp-PDA A, its minimum storage number τ is defined as the minimum number of "∗"-symbols in any of the rows of A.

Definition 8 (Symbol Frequencies). For a given nontrivial (K, F, T, S) Comp-PDA, let S_t denote the number of ordinary symbols that occur exactly t times, for t ∈ [K]. The symbol frequencies θ_1, θ_2, …, θ_K of the Comp-PDA are then defined as
θ_t ≜ S_t · t / (KF − T), t ∈ [K].

Fig. 1: An example of a CCS scheme for a system with K = 4, N = 6 and Q = 3, where the third node is a straggling node.

They indicate the fractions of ordinary entries of the Comp-PDA that occur exactly 1, 2, …, K times, respectively. For completeness, we also define θ_t ≜ 0 for t > K.

B. Constructing a Coded Computing Scheme from a Comp-PDA: A Toy Example

In this subsection, we illustrate the connection between Comp-PDAs and coded computing schemes with stragglers at hand of a toy example. Section V ahead describes a general procedure to obtain a coded computing scheme with stragglers from any Comp-PDA, and it presents a performance analysis for the obtained scheme. Consider the (4, 6, 12, 4) Comp-PDA A in Example 1, and assume a (K, Q) = (4, 3) straggling system with N = 6 files and D = 3 output functions. The scheme is illustrated in Fig. 1 for the case that node 3 is straggling. In Fig. 1, the line "files" in each of the four boxes indicates the files stored at the nodes. The remaining lines in the boxes illustrate the computed IVAs, where red circles, green triangles, and blue squares depict IVAs pertaining to output functions φ_1, φ_2, and φ_3, respectively. More specifically, a red circle with the number i in the middle stands for IVA v_{1,i}, and so on.
The lines below the boxes of the active nodes 1, 2, and 4 indicate the IVAs that the nodes have to learn to be able to compute their output functions. In this example, it is assumed that node 1 computes φ_1, node 2 computes φ_2, and node 4 computes φ_3. The signals on the left/right side of the boxes indicate the signals sent by the nodes. Here, splitting of IVAs indicates that the IVA is decomposed into a substring consisting of the first half of the bits and a substring consisting of the second half of the bits, and the plus symbol stands for a bit-wise XOR operation on the substrings.
We now explain the distributed coding scheme associated with the PDA A more formally. We start by associating column k of A with node k in the system (k ∈ [4]), and row j of A with file w_j in the system (j ∈ [6]). In the map phase, node k stores file w_j if the row-j, column-k entry of A is a "∗"-symbol. For example, node 1, which is associated with the first column of the Comp-PDA, stores files w_1, w_2 and w_3. Each node then computes all the IVAs corresponding to the files it has stored.
In our example, we assume that node 3 is the only straggler. Nodes 1, 2, and 4 thus form the active set and as such continue with the shuffle and reduce procedures. Accordingly, we extract from the PDA A the subarray A_{{1,2,4}} consisting of columns 1, 2 and 4 (the columns corresponding to the active set):
A_{{1,2,4}} =
∗ ∗ 2
∗ 1 3
∗ 2 ∗
1 ∗ 4
2 ∗ ∗
3 4 ∗ .
Notice that A_{{1,2,4}} is also a Comp-PDA (in particular, it has at least one "∗" symbol in each row) and the node corresponding to a given column has stored all the files indicated by the "∗"-symbols in this column. The same statement applies also to the subarrays associated with any other possible active set of size 3. After the map phase, we are thus in the same situation as described in [8], [44] when a coded computing scheme without stragglers is to be constructed from a Comp-PDA, and as a consequence, the same shuffle and reduce procedures can be applied.
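The PDA conditions of Definition 5 and the Comp-PDA condition of Definition 6 can be verified programmatically. The sketch below (helper names ours) checks them for the array of Example 1, written with one concrete labeling of the ordinary symbols, and for its subarray obtained by deleting the straggler's column:

```python
def is_pda(A):
    """Check conditions a) and b) of Definition 5 for an F x K array A whose
    entries are '*' or ordinary symbols."""
    F, K = len(A), len(A[0])
    cells = [(j, k) for j in range(F) for k in range(K) if A[j][k] != '*']
    for i, (j1, k1) in enumerate(cells):
        for (j2, k2) in cells[i + 1:]:
            if A[j1][k1] == A[j2][k2]:
                if j1 == j2 or k1 == k2:                 # a) distinct rows and columns
                    return False
                if A[j1][k2] != '*' or A[j2][k1] != '*':  # b) the 2x2 subarray pattern
                    return False
    return True

def is_comp_pda(A):
    """Definition 6: a Comp-PDA additionally has at least one '*' in every row."""
    return is_pda(A) and all('*' in row for row in A)

# The (4, 6, 12, 4) Comp-PDA of Example 1, and its subarray for active set {1, 2, 4}.
A = [['*', '*', 1, 2],
     ['*', 1, '*', 3],
     ['*', 2, 3, '*'],
     [1, '*', '*', 4],
     [2, '*', 4, '*'],
     [3, 4, '*', '*']]
A_124 = [[row[0], row[1], row[3]] for row in A]
print(is_comp_pda(A), is_comp_pda(A_124))  # True True
```

Deleting columns can never violate conditions a) or b), and every row keeps at least one "∗" as long as the minimum storage number satisfies τ ≥ K − Q + 1, which is exactly why the active subarray remains a Comp-PDA.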
We described these procedures here in detail for completeness. The shuffle phase is as follows. For each s ∈ {1, 2, 3, 4} occurring g times (g = 2 or 3), pick out the g × g subarray containing s. For example, symbol s = 2 is associated with the following 3 × 3 subarray (rows w_1, w_3, w_5 and columns 1, 2, 4):
w_1: ∗ ∗ 2
w_3: ∗ 2 ∗
w_5: 2 ∗ ∗ (4)
Each occurrence of the symbol "2" in this subarray stands for an IVA desired by the node in the corresponding column and computed at the other nodes in this subarray. The row of the symbol indicates to which file this IVA pertains, and the "∗" symbols in this row indicate that the IVA can indeed be computed by all nodes in the active set except for the one corresponding to the column of the "2" symbol. In the above example, the three "2" symbols from top to bottom represent the IVAs v_{3,1}, v_{2,3}, and v_{1,5}, respectively. These IVAs are shuffled in a coded manner. To this end, they are first split into g − 1 equally large sub-IVAs, and each of these sub-IVAs is labeled by one of the nodes where the IVA has been computed (i.e., by the columns with "∗" symbols). In our example, we split v_{3,1} = (v_{3,1}^{(1)}, v_{3,1}^{(2)}), v_{2,3} = (v_{2,3}^{(1)}, v_{2,3}^{(4)}), and v_{1,5} = (v_{1,5}^{(2)}, v_{1,5}^{(4)}). The signal sent by a given node i is then simply the componentwise XOR of the sub-IVAs with superscript i. So, nodes 1, 2, 4 send v_{3,1}^{(1)} ⊕ v_{2,3}^{(1)}, v_{3,1}^{(2)} ⊕ v_{1,5}^{(2)}, and v_{2,3}^{(4)} ⊕ v_{1,5}^{(4)}, respectively. The same procedure is applied for all other ordinary symbols 1, 3, and 4 in subarray A_{{1,2,4}}. The following table lists all the signals sent by the 4 nodes, where the first line lists the associated ordinary symbols:

Symbol:                1        2                               3        4
Node 1:              v_{2,2}   v_{3,1}^{(1)} ⊕ v_{2,3}^{(1)}   v_{3,2}   —
Node 2:              v_{1,4}   v_{3,1}^{(2)} ⊕ v_{1,5}^{(2)}   —        v_{3,4}
Node 3 (straggling): —         —                               —        —
Node 4:              —         v_{2,3}^{(4)} ⊕ v_{1,5}^{(4)}   v_{1,6}   v_{2,6}     (5)

We now explain how the nodes extract their missing IVAs from the shuffled signals. Since node 1 has computed v_{1,1}, v_{1,2} and v_{1,3} in the map phase, it still needs to decode v_{1,4}, v_{1,5}, v_{1,6}. It directly obtains the IVAs v_{1,4} and v_{1,6} from the uncoded signals sent by nodes 2 and 4, respectively.
Moreover, it reconstructs the two sub-IVAs v_{1,5}^{(2)} and v_{1,5}^{(4)} by XORing the signals v_{3,1}^{(2)} ⊕ v_{1,5}^{(2)} and v_{2,3}^{(4)} ⊕ v_{1,5}^{(4)} shuffled by nodes 2 and 4 with its locally stored sub-IVAs v_{3,1}^{(2)} and v_{2,3}^{(4)}. Nodes 2 and 4 reconstruct their missing IVAs in a similar way.
The total number of files stored at the nodes is 3 × 4 = 12, thus the storage load is r = (3 × 4)/6 = 2. The total length of the transmitted signals is 7.5V, which remains unchanged also when any of the other nodes straggles. The communication load is thus L = 7.5V/(6 × 3 × V) = 5/12.

IV. MAIN RESULTS
In this section, we present our main results. Details and proofs are deferred to Sections V–VII.
A. Coded Computing Schemes for Straggling Systems from Comp-PDAs
In Section V, we propose a coded computing scheme for a (K, Q) straggling system based on any Comp-PDA with K columns and minimum storage number τ ≥ K − Q + 1. Theorem 1 is proved by analyzing this coded computing scheme; the proof is deferred to Section V-B.

Theorem 1.
From any given (K, F, T, S) Comp-PDA A with symbol frequencies {θ_t}_{t=1}^K and minimum storage number τ ∈ [K − Q + 1 : K], one can construct a coded computing scheme for a (K, Q) straggling system achieving the SC pair
r_A = T/F,
L_A = (1 − T/(FK)) · (1/C_{K−1}^{Q−1}) · Σ_{t=1}^{K} θ_t [ C_{K−t}^{Q−1} + Σ_{l=max{1, t−K+Q−1}}^{min{t,Q}−1} (1/l) · C_{t−1}^{l} · C_{K−t}^{Q−1−l} ], (6)
with file complexity F.

Theorem 1 characterizes the performance of the coded computing scheme obtained from a Comp-PDA as described in Section V in terms of the Comp-PDA parameters. In the following, we will simply say that a Comp-PDA achieves this performance. Notice that the file complexity of any Comp-PDA based scheme coincides with the number of rows F of the Comp-PDA. We shall therefore call the parameter F of a Comp-PDA its file complexity. As we show in the following, Theorem 1 can be simplified for regular Comp-PDAs.
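Formula (6) can be evaluated directly from an array representation of a Comp-PDA. The sketch below (function names ours, not from the paper) computes the symbol frequencies of Definition 8 and then L_A, and reproduces the toy example's load L = 5/12 for K = 4, Q = 3:

```python
from fractions import Fraction
from math import comb

def C(n, k):
    """Binomial coefficient with the paper's convention: C_n^k = 0 unless 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def theorem1_load(A, K, Q):
    """Evaluate L_A in (6) for a Comp-PDA A given as an F x K list of rows."""
    F = len(A)
    T = sum(row.count('*') for row in A)
    # multiplicity of each ordinary symbol
    mult = {}
    for row in A:
        for a in row:
            if a != '*':
                mult[a] = mult.get(a, 0) + 1
    # symbol frequencies theta_t = S_t * t / (KF - T), accumulated per symbol
    theta = {}
    for m in mult.values():
        theta[m] = theta.get(m, Fraction(0)) + Fraction(m, K * F - T)
    total = Fraction(0)
    for t, th in theta.items():
        inner = Fraction(C(K - t, Q - 1))
        for l in range(max(1, t - K + Q - 1), min(t, Q)):   # l up to min{t, Q} - 1
            inner += Fraction(C(t - 1, l) * C(K - t, Q - 1 - l), l)
        total += th * inner
    return (1 - Fraction(T, F * K)) * total / C(K - 1, Q - 1)

# The Comp-PDA of Example 1 (one concrete symbol labeling).
A = [['*', '*', 1, 2], ['*', 1, '*', 3], ['*', 2, 3, '*'],
     [1, '*', '*', 4], [2, '*', 4, '*'], [3, 4, '*', '*']]
print(theorem1_load(A, K=4, Q=3))  # 5/12, matching the toy example
```

With Q = K = 4 the same routine returns 1/4, i.e., (1 − r/K)/r with r = 2, consistent with the no-straggler CDC tradeoff of [5].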
Corollary 1.
From any given g-regular (K, F, T, S) Comp-PDA A, with g ∈ [K] and minimum storage number τ ∈ [K − Q + 1 : K], one can construct a coded computing scheme for a (K, Q) straggling system achieving the SC pair
r_A = T/F,
L_A = (1 − T/(KF)) · [ C_{K−g}^{Q−1}/C_{K−1}^{Q−1} + Σ_{l=max{1, g−K+Q−1}}^{min{g,Q}−1} (1/l) · C_{g−1}^{l} · C_{K−g}^{Q−1−l} / C_{K−1}^{Q−1} ],
with file complexity F.

Proof: From Theorem 1, we only need to evaluate L_A when A is a g-regular (K, F, T, S) Comp-PDA. In this case, all the S symbols occur g times, i.e., θ_g = 1 and θ_t = 0, ∀t ∈ [K]\{g}. The conclusion then follows directly from Theorem 1.

Corollary 1 is of particular interest since there are several explicit regular PDA constructions for coded caching in the literature, such as [33], [42], [43], which are also Comp-PDAs. In particular, the following PDAs, obtained from the coded caching scheme proposed by Maddah-Ali and Niesen [34], are important.
Definition 9 (Maddah-Ali Niesen PDA (MAN-PDA)). Fix any integer i ∈ [K], and let {T_j}_{j=1}^{C_K^i} denote all subsets of [K] of size i. Also, choose an arbitrary bijective function κ from the collection of all subsets of [K] with cardinality i + 1 to the set [C_K^{i+1}]. Then, define the array P_i = [p_{j,k}] as
p_{j,k} ≜ ∗ if k ∈ T_j, and p_{j,k} ≜ κ({k} ∪ T_j) if k ∉ T_j.

We observe that for any i ∈ [K − 1], the array P_i is an (i + 1)-regular (K, C_K^i, K·C_{K−1}^{i−1}, C_K^{i+1}) Comp-PDA (see [33] for details). For i = K, the Comp-PDA P_K consists only of "∗"-entries and is thus a trivial PDA. By Corollary 1, we directly obtain the following result.

Corollary 2.
For a (K, Q) straggling system and each integer r in the discrete set [K − Q + 1 : K], the MAN-PDA P_r achieves the SC pair (r, L_{P_r}), where
L_{P_r} ≜ (1 − r/K) · Σ_{l=r+Q−K}^{min{r, Q−1}} (1/l) · C_r^l · C_{K−r−1}^{Q−1−l} / C_{K−1}^{Q−1}.

The coded computing scheme associated with P_r is equivalent to our proposed coded computing scheme for straggling systems (CCS) in [1]. Here, we present it as a special case of the more general Comp-PDA framework. As we shall see, the Comp-PDA framework allows us to design new coded computing schemes with lower file complexity.

B. The Fundamental Storage-Communication Tradeoff
We are ready to present our result on the fundamental SC tradeoff, which is proved in Section VI.
Theorem 2.
For a (K, Q) straggling system with a given integer storage load r in the discrete set [K−Q+1 : K], the fundamental SC tradeoff is

$$L^*_{K,Q}(r) = \Big(1-\frac{r}{K}\Big)\cdot \sum_{l=r+Q-K}^{\min\{r,\,Q-1\}} \frac{1}{l}\cdot\frac{C_r^{l}\cdot C_{K-r-1}^{Q-l-1}}{C_{K-1}^{Q-1}}, \qquad r \in [K-Q+1 : K], \qquad (7)$$

which is achievable with a scheme of file complexity $C_K^r$. For a general r in the interval [K−Q+1, K], the fundamental SC tradeoff $L^*_{K,Q}(r)$ is given by the lower convex envelope of the points in (7).

Fig. 2 shows the fundamental SC tradeoff curves for K = 10 and different values of Q. When Q = 1, the curve reduces to the single point (K, 0), while when Q = K, the curve corresponds to the fundamental tradeoff without straggling nodes (cf. [5, Fig. 1]). In this latter case without stragglers, the fundamental SC tradeoff curve is achieved by the CDC scheme in [5]. For a general value of Q and integer storage load r ∈ [K−Q+1 : K], the fundamental SC tradeoff pair $(r, L^*_{K,Q}(r))$ is achieved by the MAN-PDA $P_r$, see Corollary 2. This implies that for a fixed integer storage load r ∈ [1 : K], the SC pairs $\{(r, L^*_{K,Q}(r))\}_{Q=K-r+1}^{K}$ are all achieved by the same PDA $P_r$, irrespective of the size of the active set Q. As we show in Section V-A, the map procedure of the coded computing scheme corresponding to a given Comp-PDA at a given node k only depends on the “∗”-symbols in the k-th column of the PDA. Therefore, all the points on the fundamental SC tradeoff curve with the same integer storage load r can be attained with the same map procedures described by the MAN-PDA $P_r$. (See also Remark 3 in Section V-A.) As a consequence, the fundamental SC-tradeoff points that have integer storage load r ∈ [1 : K] remain achievable (and optimal) also in a related setup where the size of the active set Q is unknown during the map procedure.
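As a quick numerical companion to (7), a small Python helper (ours, not the paper's) evaluates $L^*_{K,Q}(r)$; for Q = K it reduces to the no-straggler CDC tradeoff $(1-r/K)/r$ of [5]:

```python
from math import comb

def L_star(K, Q, r):
    """Fundamental SC tradeoff (7) at an integer storage load r in [K-Q+1, K]."""
    assert K - Q + 1 <= r <= K
    s = sum(comb(r, l) * comb(K - r - 1, Q - l - 1) / l
            for l in range(r + Q - K, min(r, Q - 1) + 1))
    return (1 - r / K) * s / comb(K - 1, Q - 1)
```

For example, `L_star(10, 10, 2)` gives 0.4, i.e., $(1-2/10)/2$, and `L_star(K, Q, K)` is always 0 since full storage requires no shuffling.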
By simple time- and memory-sharing arguments, this conclusion extends to all points on the fundamental SC tradeoff curve with arbitrary real-valued storage loads r ∈ [1, K]. Related is also the scenario where, instead of fixing the size of the active set Q, the system imposes a hard time-limit on the map phase and proceeds to the shuffle and reduce phases with the (random) number of nodes that have terminated within due time. For a given storage load r, the MAN-PDA based coded computing scheme guarantees that whenever Q ≥ ⌈K+1−r⌉ nodes have terminated during the map phase, all IVAs are computed at least once; the system can thus proceed to data shuffling and achieves the minimum required communication load $L^*_{K,Q}(r)$. When only Q < ⌈K+1−r⌉ nodes have terminated, some IVAs are not computed, and the system cannot proceed.

It is further worth pointing out that all our PDA based coded computing schemes are universal and achieve the same performance for any choice of map and reduce functions. No structure is assumed on these functions. Similarly, our information-theoretic converse applies only to such universal coded computing schemes. If the map or reduce functions have certain properties, for example linearity, it is possible to achieve better SC tradeoffs by storing combinations of files instead of each file separately [31], [32]. Fig. 3 compares Theorem 2 to the results in [31], [32]. It can be observed that the MAN-PDA based scheme outperforms the scheme in [31] but is inferior to the improved version in [32]. As already mentioned, however, the scheme in [32] works only for linear map functions, and not for arbitrary functions as our schemes do. Another advantage of our schemes is that they work over the binary field, and are thus easier to implement than the MDS-based schemes in [31], [32], which require a sufficiently large field size.

Fig. 2: Storage-Communication Tradeoff $L^*_{K,Q}(r)$ for Q ∈ [K] when K = 10.

Fig. 3: Comparison with known results when applied to linear map functions, K = 10, Q = 8.

C. Optimality and Reduction of File Complexity
From Theorem 1 and Corollary 2, the coded computing scheme based on the MAN-PDA $P_r$, for r ∈ [K−Q+1 : K], has file complexity $F = C_K^r$ and achieves the fundamental SC tradeoff. The following theorem shows that, in most cases, this is the smallest file complexity achieving this tradeoff point. The proof is deferred to Section VII. Theorem 3.
For a (K, Q) straggling system, if a Comp-PDA based scheme achieves the fundamental SC tradeoff $(r, L^*_{K,Q}(r))$ for some r ∈ [K−Q+1 : K] with Q ∉ {1, 2, K} or r ≠ K−Q+1, then its file complexity satisfies $F \geq C_K^r$.

Remark 1. It is easy to verify that in the cases Q ∈ {1, 2, K} with r = K−Q+1, the fundamental SC tradeoff can be achieved with F = 1, namely by the single-row Comp-PDAs [∗, ∗, . . . , ∗], [∗, ∗, . . . , ∗, 1], and [∗, 1, 2, . . . , K−1], respectively.

We next present Comp-PDAs with lower file complexity F that achieve SC tradeoffs close to the optimal ones. We consider two existing PDA constructions originally proposed for coded caching in [33, Theorems 4 and 5]. Let q ∈ [2 : K−1] be such that q | K, and let m = K/q. There exist

P1) an m-regular $(mq,\; q^{m-1},\; m\,q^{m-1},\; (q-1)\,q^{m-1})$ Comp-PDA with minimum storage number m;

P2) an m(q−1)-regular $(mq,\; (q-1)\,q^{m-1},\; m(q-1)^2\,q^{m-1},\; q^{m-1})$ Comp-PDA with minimum storage number m(q−1).

Corollary 3.
For any integer r ∈ [K−Q+1 : K−1] such that either r | K or (K−r) | K, the communication load

$$L_{K,Q}(r) = \Big(1-\frac{r}{K}\Big)\cdot\Bigg(\frac{C_{K-r}^{Q-1}}{C_{K-1}^{Q-1}} + \sum_{l=\max\{1,\,r+Q-K-1\}}^{\min\{r,Q\}-1}\frac{1}{l}\cdot\frac{C_{r-1}^{l}\cdot C_{K-r}^{Q-l-1}}{C_{K-1}^{Q-1}}\Bigg) \qquad (8)$$

can be achieved with file complexity

$$F = \frac{r}{K}\cdot\Big(\frac{K}{\min\{r,\,K-r\}}\Big)^{\min\{r,\,K-r\}}.$$

Proof: If r | K, specialize the Comp-PDA in P1) to the parameter q = K/r. This results in an r-regular $(K,\, q^{r-1},\, r\,q^{r-1},\, (q-1)\,q^{r-1})$ Comp-PDA with minimum storage number r, and the claim is immediate from Corollary 1. If (K−r) | K, specialize the Comp-PDA in P2) to the parameter q = K/(K−r). This results in an r-regular $(K,\, (q-1)\,q^{K-r-1},\, (K-r)(q-1)^2\,q^{K-r-1},\, q^{K-r-1})$ Comp-PDA, and the claim again follows from Corollary 1.
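The file-complexity saving of Corollary 3 is easy to tabulate. The sketch below (our illustration, not from the paper) evaluates the closed-form F, which can be compared against the $F^* = C_K^r$ required by the MAN-PDA; for K = 12 and r = 4 (so q = 3), it returns 27 batches instead of $C_{12}^4 = 495$:

```python
from math import comb

def file_complexity_reduced(K, r):
    """File complexity of Corollary 3, valid when r | K or (K - r) | K:
    F = (r/K) * (K / min(r, K-r))**min(r, K-r), which is then an integer."""
    m = min(r, K - r)
    return round(r / K * (K / m) ** m)
```

For example, `file_complexity_reduced(12, 4)` gives 27 (construction P1 with q = 3), while `comb(12, 4)` gives 495.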
In the following proposition, we quantify how close the above SC tradeoff point is to the optimal one, and by how much we can reduce the file complexity.

Proposition 1. Consider a (K, Q) straggling system and an integer r ∈ [K−Q+1 : K] such that $\frac{r}{K} = c \in \{\frac{1}{q}, \frac{q-1}{q}\}$ for some integer q ∈ [2 : K−1]. There exist α ∈ [0, 2] and β ∈ [0, $\sqrt{2\pi}\,e^2$] such that the SC tradeoff $L_{K,Q}(r)$ and the file complexity F achieved by constructions P1) or P2) above satisfy

$$\frac{L_{K,Q}(r)}{L^*_{K,Q}(r)} = 1 + \frac{\alpha}{r}, \qquad \frac{F}{F^*} = \beta\, A_q\, \sqrt{K}\, B_q^{-K},$$

where $A_q \triangleq \sqrt{q-1}\cdot\frac{c}{q}$, $B_q \triangleq \big(\frac{q}{q-1}\big)^{\frac{q-1}{q}}$, and $F^* = C_K^r$ is the file complexity needed to achieve the fundamental SC tradeoff.

The proof is given in Appendix B. By the above proposition, for a fixed integer q, whenever $\frac{r}{K} \in \{\frac{1}{q}, \frac{q-1}{q}\}$ and K and r scale proportionally to infinity, the communication load stays close to optimal, while the file complexity is reduced by a factor that grows exponentially in K (since $B_q > 1$).

Remark 2. In this work, we only consider two particular PDAs. There has been extensive research on coded caching schemes with low subpacketization level using various approaches, and most of them have an equivalent PDA representation. For example, PDAs can be constructed from hypergraphs [42], bipartite graphs [43], linear block codes [45], and Ruzsa-Szemerédi graphs [46]. Theorem 1 makes it possible to apply all these results straightforwardly to straggling systems.

V. CODED COMPUTING SCHEMES FOR STRAGGLING SYSTEMS FROM COMP-PDAS (PROOF OF THEOREM 1)

A. Constructing a Coded Computing Scheme for a Straggling System from a Comp-PDA
In [8], we described how to obtain a coded computing scheme without stragglers from any given Comp-PDA. A similar procedure is possible in the presence of stragglers if the minimum storage number satisfies τ ≥ K−Q+1. In fact, assume a given Comp-PDA A. The storage design in the map phase corresponding to A is the same as without straggling nodes. As part of the map phase, each node computes all the IVAs that it can compute from its stored files. For the reduce phase of the straggling system, we restrict attention to the subarray $A_{\mathcal Q}$ of A formed by the columns of A with indices in the active set $\mathcal Q$. Notice that $A_{\mathcal Q}$ is again a Comp-PDA: since the minimum storage number of A is at least K−Q+1, after eliminating K−Q columns from A each row still contains at least one “∗” symbol. The shuffle and reduce phases are performed as in the non-straggling setup, see [8], but with the Comp-PDA A replaced by the new Comp-PDA $A_{\mathcal Q}$. For completeness, we explain the map, shuffle, and reduce phases in detail.

Fix a (K, F, T, S) Comp-PDA $A = [a_{i,j}]$ with minimum storage number τ ≥ K−Q+1. Partition the N files into F batches $\mathcal W_1, \mathcal W_2, \ldots, \mathcal W_F$, each containing η ≜ N/F files, so that $\mathcal W_1, \ldots, \mathcal W_F$ form a partition of $\mathcal W$. It is implicitly assumed here that η is an integer.
1) Map Phase:
Each node k stores the files in

$$\mathcal M_k = \bigcup_{i \in [F]\,:\, a_{i,k} = *} \mathcal W_i, \qquad (9)$$

and computes the IVAs in (1). The map phase terminates whenever any Q nodes accomplish their computations. Throughout this section, let $\mathcal Q$ denote the realization of the random active set Q. Then, $A_{\mathcal Q}$ denotes the subarray of A composed of the columns in $\mathcal Q$. Also, let $g_s^{\mathcal Q}$ denote the number of occurrences of the symbol s in $A_{\mathcal Q}$, i.e.,

$$g_s^{\mathcal Q} = |\{(i,k) : a_{i,k} = s,\; k \in \mathcal Q\}|,$$

and let $\mathcal I^{\mathcal Q}$ be the set of symbols occurring exactly once in $A_{\mathcal Q}$:

$$\mathcal I^{\mathcal Q} \triangleq \{ s \in [S] : g_s^{\mathcal Q} = 1 \}.$$

The symbols in $\mathcal I^{\mathcal Q}$ are partitioned into Q subsets $\{\mathcal I_k^{\mathcal Q} : k \in \mathcal Q\}$ as follows. For each $s \in \mathcal I^{\mathcal Q}$, let (i, j) be the unique pair in $[F] \times \mathcal Q$ such that $a_{i,j} = s$. Since the number of “∗” symbols in the i-th row of A is at least K−Q+1 by the assumption τ ≥ K−Q+1, there exists at least one $k \in \mathcal Q$ such that $a_{i,k} = *$. Arbitrarily choose such a k and assign s to $\mathcal I_k^{\mathcal Q}$.

Let $\mathcal A_k^{\mathcal Q}$ denote the set of ordinary symbols in column k occurring at least twice in $A_{\mathcal Q}$:

$$\mathcal A_k^{\mathcal Q} \triangleq \{ s \in [S] : a_{i,k} = s \text{ for some } i \in [F] \} \setminus \mathcal I^{\mathcal Q}, \qquad k \in \mathcal Q. \qquad (10)$$

Pick any uniform assignment of reduce functions $\mathcal D^{\mathcal Q} = \{\mathcal D_k^{\mathcal Q}\}_{k\in\mathcal Q}$. Let $\mathcal U_{i,j}^{\mathcal Q}$ denote the set of IVAs for node j computed from the files in $\mathcal W_i$, i.e.,

$$\mathcal U_{i,j}^{\mathcal Q} \triangleq \{ v_{d,n} : d \in \mathcal D_j^{\mathcal Q},\; w_n \in \mathcal W_i \}, \qquad (i,j) \in [F]\times\mathcal Q.$$
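The storage rule (9) can be sketched in a few lines of Python (our illustration; files and batches are indexed from 0):

```python
def storage_design(P, N):
    """Storage design (9): node k stores every batch W_i with P[i][k] == '*'.
    Files 0..N-1 are split into F = len(P) consecutive batches of size N/F."""
    F, K = len(P), len(P[0])
    assert N % F == 0, "the file complexity F must divide N"
    eta = N // F
    return [[n for i in range(F) if P[i][k] == '*'
             for n in range(i * eta, (i + 1) * eta)]
            for k in range(K)]
```

For the MAN-PDA $P_2$ with K = 3, i.e., `[['*','*',0], ['*',0,'*'], [0,'*','*']]`, and N = 6 files, each node stores four files and the storage load is $\sum_k |\mathcal M_k|/N = 2 = T/F$.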
2) Shuffle Phase:
Node k multicasts the signal $X_k^{\mathcal Q} = \{ X_{k,s}^{\mathcal Q} : s \in \mathcal I_k^{\mathcal Q} \cup \mathcal A_k^{\mathcal Q} \}$, where the signals $X_{k,s}^{\mathcal Q}$ are created as described in the following, depending on whether $s \in \mathcal I_k^{\mathcal Q}$ or $s \in \mathcal A_k^{\mathcal Q}$. For all $s \in \mathcal I_k^{\mathcal Q}$, set

$$X_{k,s}^{\mathcal Q} \triangleq \mathcal U_{i,j}^{\mathcal Q}, \qquad s \in \mathcal I_k^{\mathcal Q}, \qquad (11)$$

where (i, j) is the unique index pair in $[F]\times\mathcal Q$ such that $a_{i,j} = s$.

To describe the signal $X_{k,s}^{\mathcal Q}$ for $s \in \mathcal A_k^{\mathcal Q}$, we first describe a partition of the IVA set $\mathcal U_{i,j}^{\mathcal Q}$ for each pair $(i,j) \in [F]\times\mathcal Q$ such that $a_{i,j} \in \mathcal A_j^{\mathcal Q}$. Let $s' = a_{i,j}$; then $g_{s'}^{\mathcal Q} \geq 2$. Let $(l_1, j_1), (l_2, j_2), \ldots, (l_{g_{s'}^{\mathcal Q}-1}, j_{g_{s'}^{\mathcal Q}-1}) \in [F]\times\mathcal Q$ indicate all the other $g_{s'}^{\mathcal Q}-1$ occurrences of the ordinary symbol s′ in $A_{\mathcal Q}$, i.e., $a_{l_1,j_1} = a_{l_2,j_2} = \ldots = a_{l_{g_{s'}^{\mathcal Q}-1},\,j_{g_{s'}^{\mathcal Q}-1}} = s'$. Partition the set of IVAs $\mathcal U_{i,j}^{\mathcal Q}$ into $g_{s'}^{\mathcal Q}-1$ subsets of equal size, denoted

$$\mathcal U_{i,j}^{\mathcal Q} = \Big\{ \mathcal U_{i,j}^{\mathcal Q,j_1},\; \mathcal U_{i,j}^{\mathcal Q,j_2},\; \ldots,\; \mathcal U_{i,j}^{\mathcal Q,j_{g_{s'}^{\mathcal Q}-1}} \Big\}. \qquad (12)$$

For all $s \in \mathcal A_k^{\mathcal Q}$, set

$$X_{k,s}^{\mathcal Q} \triangleq \bigoplus_{(i,j)\in[F]\times(\mathcal Q\setminus\{k\})\,:\, a_{i,j} = s} \mathcal U_{i,j}^{\mathcal Q,k}, \qquad s \in \mathcal A_k^{\mathcal Q}. \qquad (13)$$
3) Reduce Phase:
Node k computes all IVAs in $\bigcup_{i\in[F]} \mathcal U_{i,k}^{\mathcal Q}$. In the map phase, node k has already computed all IVAs in $\{\mathcal U_{i,k}^{\mathcal Q} : a_{i,k} = *\}$. It thus remains to compute all IVAs in $\bigcup_{i\in[F]:\, a_{i,k}\neq *} \mathcal U_{i,k}^{\mathcal Q}$.

Fix an arbitrary i ∈ [F] such that $a_{i,k} \neq *$, and set $s = a_{i,k}$. If $s \in \mathcal A_k^{\mathcal Q}$, each subblock $\mathcal U_{i,k}^{\mathcal Q,j}$ in (12) can be restored by node k from the signal $X_{j,s}^{\mathcal Q}$ sent by node j (see (13)):

$$\mathcal U_{i,k}^{\mathcal Q,j} = \mathcal U_{l_1,j_1}^{\mathcal Q,j} \oplus \mathcal U_{l_2,j_2}^{\mathcal Q,j} \oplus \ldots \oplus \mathcal U_{l_{g_s^{\mathcal Q}-2},\,j_{g_s^{\mathcal Q}-2}}^{\mathcal Q,j} \oplus X_{j,s}^{\mathcal Q}, \qquad (14)$$

where the pairs $(l_t, j_t)$, for $t \in [g_s^{\mathcal Q}-2]$, indicate the occurrences of the symbol s in $A_{\mathcal Q}$ other than (i, k) and the occurrence in column j, i.e., $a_{l_t,j_t} = s$. Notice that the sub-IVAs on the right-hand side of (14) have been computed by node k during the map phase: by the PDA properties, $a_{l_t,j_t} = a_{i,k} = s$ and $j_t \neq k$ imply $l_t \neq i$ and $a_{l_t,k} = *$. Therefore, $\mathcal U_{i,k}^{\mathcal Q,j}$ can be decoded from (14). If $s \notin \mathcal A_k^{\mathcal Q}$, then $s \in \mathcal I^{\mathcal Q}$ by (10). There thus exists an index $j \in \mathcal Q\setminus\{k\}$ such that $s \in \mathcal I_j^{\mathcal Q}$, and therefore, by (11), the set $\mathcal U_{i,k}^{\mathcal Q}$ can be recovered from the signal $X_{j,s}^{\mathcal Q}$ sent by node j.

Remark 3. It is worth pointing out that the storage design $\{\mathcal M_k\}_{k=1}^K$ only depends on the positions of the “∗” symbols in A, but not on the parameter Q (see (9)). This indicates that, in practice, the map phase can be carried out even without knowing how many nodes will participate in the shuffle and reduce phases.
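The XOR cancellation behind (13) and (14) can be illustrated on a toy example with hypothetical two-byte sub-IVA blocks (the names `U_k`, `U_a`, `U_b` and the block contents are ours, chosen only for illustration):

```python
def xor(*blocks):
    """Bitwise XOR of equal-length byte strings."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# A symbol s occurring three times in A_Q: node k is missing the sub-IVA U_k,
# but has mapped U_a and U_b itself (guaranteed by the '*'s of the PDA).
U_k, U_a, U_b = b"\x12\x34", b"\x56\x78", b"\x9a\xbc"
X = xor(U_k, U_a, U_b)            # multicast signal, cf. (13)
assert xor(X, U_a, U_b) == U_k    # node k cancels the blocks it knows, cf. (14)
```

This is why the schemes operate over the binary field: decoding is a plain XOR of locally available blocks with the received signal.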
B. Performance Analysis

We analyzed the storage and communication loads in the no-stragglers setup in [8]. For the scheme in the preceding subsection, the analysis of the storage load follows the same lines as in [8]. When computing the communication load defined in (3), we additionally have to average over all realizations of the active set Q.
1) Storage Load:
Since the Comp-PDA A contains T “∗” symbols, and each “∗” symbol indicates that a batch of η = N/F files is stored at a given node (see (9)), the storage load of the proposed scheme is

$$r = \frac{\sum_{k=1}^K |\mathcal M_k|}{N} = \frac{T\cdot\eta}{N} = \frac{T}{F}.$$
2) Communication Load:
We first analyze the length of the signals sent for a given realization $\mathcal Q$ of the active set. For any s ∈ [S], let $g_s$ be the number of occurrences of s in A, and $g_s^{\mathcal Q}$ its number of occurrences in the columns in $\mathcal Q$. By (11) and (13), the total length of the signals associated to symbol s is

$$l_s^{\mathcal Q} = \begin{cases} 0, & \text{if } g_s^{\mathcal Q} = 0,\\ \frac{NDV}{FQ}, & \text{if } g_s^{\mathcal Q} = 1,\\ \frac{g_s^{\mathcal Q}}{g_s^{\mathcal Q}-1}\cdot\frac{NDV}{FQ}, & \text{if } g_s^{\mathcal Q} \geq 2, \end{cases} \qquad (15)$$

when $\mathcal Q$ is the active set. The total length of all the signals is thus

$$\sum_{k\in\mathcal Q} |X_k^{\mathcal Q}| = \sum_{k\in\mathcal Q}\sum_{s\in[S]:\, g_s^{\mathcal Q}>0} |X_{k,s}^{\mathcal Q}| = \sum_{s\in[S]} l_s^{\mathcal Q}. \qquad (16)$$

We now compute the communication load as defined in (3), averaging over all realizations of the active set:

$$L_A = \mathbb E\Bigg[\frac{\sum_{k\in Q}|X_k^{Q}|}{NDV}\Bigg] = \frac{1}{NDV}\cdot\frac{1}{|\Omega_K^Q|}\sum_{\mathcal Q\in\Omega_K^Q}\sum_{k\in\mathcal Q}|X_k^{\mathcal Q}| \overset{(a)}{=} \frac{1}{NDV\cdot C_K^Q}\sum_{\mathcal Q\in\Omega_K^Q}\sum_{s\in[S]} l_s^{\mathcal Q}$$

$$\overset{(b)}{=} \frac{1}{NDV\cdot C_K^Q}\sum_{s\in[S]}\sum_{\mathcal Q\in\Omega_K^Q} l_s^{\mathcal Q}\cdot\Bigg(\sum_{l=0}^{Q}\mathbb 1(g_s^{\mathcal Q}=l)\Bigg)\cdot\Bigg(\sum_{g=1}^{K}\mathbb 1(g_s=g)\Bigg)$$

$$\overset{(c)}{=} \frac{1}{FQ\cdot C_K^Q}\sum_{g=1}^{K}\sum_{s\in[S]}\Bigg(\sum_{\mathcal Q\in\Omega_K^Q}\mathbb 1(g_s^{\mathcal Q}=1) + \sum_{l=2}^{Q}\frac{l}{l-1}\sum_{\mathcal Q\in\Omega_K^Q}\mathbb 1(g_s^{\mathcal Q}=l)\Bigg)\cdot\mathbb 1(g_s=g)$$

$$\overset{(d)}{=} \frac{1}{FK\cdot C_{K-1}^{Q-1}}\sum_{g=1}^{K}\sum_{s\in[S]}\Bigg(C_g^1\, C_{K-g}^{Q-1} + \sum_{l=2}^{Q}\frac{l}{l-1}\, C_g^l\, C_{K-g}^{Q-l}\Bigg)\cdot\mathbb 1(g_s=g)$$

$$\overset{(e)}{=} \frac{1}{FK\cdot C_{K-1}^{Q-1}}\sum_{g=1}^{K} S_g\Bigg(C_g^1\, C_{K-g}^{Q-1} + \sum_{l=2}^{Q}\frac{l}{l-1}\, C_g^l\, C_{K-g}^{Q-l}\Bigg)$$

$$\overset{(f)}{=} \frac{1}{FK\cdot C_{K-1}^{Q-1}}\sum_{g=1}^{K} S_g\, g\Bigg(C_{K-g}^{Q-1} + \sum_{l=1}^{Q-1}\frac{1}{l}\, C_{g-1}^{l}\, C_{K-g}^{Q-l-1}\Bigg) \qquad (17)$$

$$\overset{(g)}{=} \frac{1}{FK\cdot C_{K-1}^{Q-1}}\sum_{g=1}^{K} S_g\, g\Bigg(C_{K-g}^{Q-1} + \sum_{l=\max\{1,\,g-K+Q-1\}}^{\min\{g,Q\}-1}\frac{1}{l}\, C_{g-1}^{l}\, C_{K-g}^{Q-l-1}\Bigg)$$

$$\overset{(h)}{=} \Big(1-\frac{T}{FK}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\sum_{g=1}^{K} \theta_g\Bigg(C_{K-g}^{Q-1} + \sum_{l=\max\{1,\,g-K+Q-1\}}^{\min\{g,Q\}-1}\frac{1}{l}\, C_{g-1}^{l}\, C_{K-g}^{Q-l-1}\Bigg),$$

where (a) holds by (16); (b) holds since for each s ∈ [S], $\sum_{l=0}^{Q}\mathbb 1(g_s^{\mathcal Q}=l)=1$ and $\sum_{g=1}^{K}\mathbb 1(g_s=g)=1$; (c) follows from (15); (d) holds since each symbol occurring g times in A occurs l times in exactly $C_g^l\cdot C_{K-g}^{Q-l}$ subsets of [K] of cardinality Q, together with the identity $Q\, C_K^Q = K\, C_{K-1}^{Q-1}$; in (e), we defined $S_g$ as the number of ordinary symbols occurring g times, for each g ∈ [K]; in (f), we used the identity $C_g^l = \frac{g}{l}\, C_{g-1}^{l-1}$ followed by the change of variable l′ = l−1; in (g), we eliminated the indices of zero terms in the summation of (17); and (h) follows from the definition of the symbol frequencies, $\theta_g = S_g\, g/(FK-T)$.
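The per-symbol accounting (15)-(16) can be verified exhaustively in code. The sketch below (ours, not from the paper) averages the signal lengths over all active sets for a MAN-PDA and recovers the closed-form load, e.g., the value $L^*_{5,4}(2) = 0.45$ given by (7) for $P_2$ with K = 5:

```python
from itertools import combinations
from math import comb

def man_pda(K, i):
    """MAN-PDA P_i of Definition 9 (rows: i-subsets, symbols: (i+1)-subsets)."""
    kappa = {T: n for n, T in enumerate(combinations(range(K), i + 1))}
    return [['*' if k in T else kappa[tuple(sorted(set(T) | {k}))]
             for k in range(K)] for T in combinations(range(K), i)]

def load_by_counting(P, Q):
    """Average communication load of the scheme built from Comp-PDA P, in
    units of NDV, by summing the per-symbol signal lengths (15) over all
    active sets of size Q and normalizing by C_K^Q."""
    F, K = len(P), len(P[0])
    symbols = {e for row in P for e in row if e != '*'}
    total = 0.0
    for active in combinations(range(K), Q):
        for s in symbols:
            g = sum(row[k] == s for row in P for k in active)
            if g == 1:
                total += 1 / (F * Q)
            elif g >= 2:
                total += g / ((g - 1) * F * Q)
    return total / comb(K, Q)
```

For `man_pda(5, 2)` and Q = 4 this returns 0.45, matching (7), and for K = 3, Q = 2 it returns 1/3.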
C. File Complexity of the Proposed Schemes

The analysis of the file complexity is similar to the no-straggler setup in [8]. The files are partitioned into F batches so that each batch contains η = N/F > 0 files, and it is assumed that η is a positive integer. The smallest number of files N for which this assumption can be met is F. Therefore, the file complexity of the scheme is F.

VI. THE FUNDAMENTAL STORAGE-COMMUNICATION TRADEOFF (PROOF OF THEOREM 2)

By Corollary 2, the points $(r, L^*_{K,Q}(r))$, r ∈ [K−Q+1 : K], can be achieved by the MAN-PDAs $P_r$. For a general non-integer r ∈ [K−Q+1, K], the lower convex envelope of these points can be achieved by memory- and time-sharing. It remains to prove the converse in Theorem 2.

Let $Z_K^Q(x)$ be the piecewise linear function connecting the points $(u, Z_K^Q(u))$ sequentially over the interval [K−Q+1, K], with

$$Z_K^Q(u) \triangleq \sum_{l=u+Q-K}^{\min\{u,Q\}} \frac{Q-l}{Ql}\, C_u^l\, C_{K-u}^{Q-l}, \qquad u \in [K-Q+1 : K]. \qquad (18)$$

We shall need the following lemma, proved in Appendix A. Lemma 1.
The sequence $Z_K^Q(u)$ is strictly convex and decreasing in u ∈ [K−Q+1 : K], and the function $Z_K^Q(x)$ is convex and decreasing over [K−Q+1, K].

Let $M \triangleq \{\mathcal M_k\}_{k=1}^K$ be a storage design and (r, L) an SC pair achieved based on $\{\mathcal M_k\}_{k=1}^K$. For each u ∈ [K−Q+1 : K], define

$$a_{M,u} \triangleq \sum_{\mathcal I\subseteq\mathcal K :\, |\mathcal I|=u} \Bigg|\Big(\bigcap_{k\in\mathcal I}\mathcal M_k\Big)\setminus\Big(\bigcup_{\bar k\in\mathcal K\setminus\mathcal I}\mathcal M_{\bar k}\Big)\Bigg|, \qquad (19)$$

i.e., $a_{M,u}$ is the number of files stored exactly u times across all the nodes. By definition, $a_{M,u}$ satisfies

$$a_{M,u} \geq 0, \qquad \sum_{u=K-Q+1}^{K} a_{M,u} = N, \qquad \sum_{u=K-Q+1}^{K} u\, a_{M,u} = rN. \qquad (20)$$

For any $\mathcal Q\in\Omega_K^Q$ and any l ∈ [Q], define

$$b_{M,l}^{\mathcal Q} \triangleq \sum_{\mathcal I\subseteq\mathcal Q :\, |\mathcal I|=l} \Bigg|\Big(\bigcap_{k\in\mathcal I}\mathcal M_k\Big)\setminus\Big(\bigcup_{\bar k\in\mathcal Q\setminus\mathcal I}\mathcal M_{\bar k}\Big)\Bigg|,$$

i.e., $b_{M,l}^{\mathcal Q}$ is the number of files stored exactly l times at the nodes of the set $\mathcal Q$. Since any file that is stored u times across all the nodes has l occurrences in exactly $C_u^l\cdot C_{K-u}^{Q-l}$ subsets $\mathcal Q$ of size Q, we have, for every n ∈ [N],

$$\sum_{\mathcal Q\in\Omega_K^Q} \mathbb 1(w_n \text{ is stored at exactly } l \text{ nodes of } \mathcal Q) = \sum_{u=\max\{l,\,K-Q+1\}}^{K-Q+l} \mathbb 1(w_n \text{ is stored at exactly } u \text{ nodes of } \mathcal K)\cdot C_u^l\cdot C_{K-u}^{Q-l}.$$

Summing over n ∈ [N], we obtain

$$\sum_{\mathcal Q\in\Omega_K^Q} b_{M,l}^{\mathcal Q} = \sum_{u=\max\{l,\,K-Q+1\}}^{K-Q+l} a_{M,u}\cdot C_u^l\cdot C_{K-u}^{Q-l}. \qquad (21)$$

We can now apply [5, Lemma 1] for the system without stragglers to lower bound the communication load for any realization $\mathcal Q$ of the active set:

$$\frac{\sum_{k\in\mathcal Q}|X_k^{\mathcal Q}|}{NDV} \geq \sum_{l=1}^{Q}\frac{b_{M,l}^{\mathcal Q}}{N}\cdot\frac{Q-l}{Ql}.$$
The average communication load over the random realization of the active set Q is then bounded as

$$L = \mathbb E_Q\Bigg[\frac{\sum_{k\in Q}|X_k^Q|}{NDV}\Bigg] = \sum_{\mathcal Q\in\Omega_K^Q}\frac{\sum_{k\in\mathcal Q}|X_k^{\mathcal Q}|}{NDV}\cdot\Pr\{Q=\mathcal Q\} \geq \frac{1}{C_K^Q}\sum_{\mathcal Q\in\Omega_K^Q}\sum_{l=1}^{Q}\frac{b_{M,l}^{\mathcal Q}}{N}\cdot\frac{Q-l}{Ql}$$

$$= \frac{1}{C_K^Q}\sum_{l=1}^{Q}\sum_{\mathcal Q\in\Omega_K^Q}\frac{b_{M,l}^{\mathcal Q}}{N}\cdot\frac{Q-l}{Ql} \overset{(a)}{=} \frac{1}{C_K^Q}\sum_{l=1}^{Q}\sum_{u=\max\{l,\,K-Q+1\}}^{K-Q+l}\frac{a_{M,u}}{N}\, C_u^l\, C_{K-u}^{Q-l}\cdot\frac{Q-l}{Ql} \qquad (22)$$

$$\overset{(b)}{=} \frac{1}{C_K^Q}\sum_{u=K-Q+1}^{K}\frac{a_{M,u}}{N}\sum_{l=u+Q-K}^{\min\{u,Q\}} C_u^l\, C_{K-u}^{Q-l}\cdot\frac{Q-l}{Ql} \overset{(c)}{=} \frac{1}{C_K^Q}\sum_{u=K-Q+1}^{K}\frac{a_{M,u}}{N}\cdot Z_K^Q(u)$$

$$\overset{(d)}{\geq} \frac{1}{C_K^Q}\cdot Z_K^Q\Bigg(\sum_{u=K-Q+1}^{K}\frac{u\, a_{M,u}}{N}\Bigg) \qquad (23)$$

$$\overset{(e)}{=} \frac{Z_K^Q(r)}{C_K^Q} \overset{(f)}{\geq} \frac{Z_K^Q(r+\epsilon)}{C_K^Q},$$

where (a) follows from (21); (b) holds because the inner summation in (22) only includes indices u ∈ [K−Q+1 : K], and it includes the index u if, and only if, the outer summation index l satisfies l ≤ u and l ≥ u+Q−K; (c) follows from (18); (d) follows from the convexity of $Z_K^Q$ (Lemma 1) and Jensen's inequality; (e) follows from (20); and (f) follows from the fact that r ≤ r+ε and $Z_K^Q$ is decreasing. Since ε can be arbitrarily close to zero, we conclude

$$L \geq \frac{Z_K^Q(r)}{C_K^Q}.$$

In particular, when r ∈ [K−Q+1 : K], by (18),

$$L \geq \sum_{l=r+Q-K}^{\min\{r,Q\}}\frac{Q-l}{Ql}\cdot\frac{C_r^l\, C_{K-r}^{Q-l}}{C_K^Q} \overset{(a)}{=} \sum_{l=r+Q-K}^{\min\{r,\,Q-1\}}\frac{Q-l}{Ql}\cdot\frac{C_r^l\, C_{K-r}^{Q-l}}{C_K^Q} \overset{(b)}{=} \Big(1-\frac{r}{K}\Big)\sum_{l=r+Q-K}^{\min\{r,\,Q-1\}}\frac{1}{l}\cdot\frac{C_r^l\, C_{K-r-1}^{Q-l-1}}{C_{K-1}^{Q-1}},$$

where (a) holds since the term with l = Q is zero, and (b) follows by expanding the binomial coefficients into factorials and simplifying, using $C_{K-r}^{Q-l} = \frac{K-r}{Q-l}\, C_{K-r-1}^{Q-l-1}$ and $C_K^Q = \frac{K}{Q}\, C_{K-1}^{Q-1}$. This establishes the desired converse.
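The chain above can be checked numerically. Implementing $Z_K^Q(u)$ from (18) and the closed form (7) (our sketch, not from the paper), one observes that $Z_K^Q(r)/C_K^Q = L^*_{K,Q}(r)$ at every integer r, together with the monotonicity and strict convexity asserted by Lemma 1:

```python
from math import comb

def Z(K, Q, u):
    """Z_K^Q(u) as defined in (18)."""
    return sum((Q - l) / (Q * l) * comb(u, l) * comb(K - u, Q - l)
               for l in range(max(1, u + Q - K), min(u, Q) + 1))

def L_star(K, Q, r):
    """Closed form (7) of the fundamental SC tradeoff at integer r."""
    return (1 - r / K) * sum(comb(r, l) * comb(K - r - 1, Q - l - 1) / l
                             for l in range(r + Q - K, min(r, Q - 1) + 1)) \
           / comb(K - 1, Q - 1)
```

For K = 10 and Q = 7, `Z(10, 7, r) / comb(10, 7)` coincides with `L_star(10, 7, r)` for every r ∈ [4 : 10], and the sequence Z is strictly decreasing and strictly convex on that range.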
VII. OPTIMALITY OF FILE COMPLEXITY (PROOF OF THEOREM 3)
A. Preliminaries
Lemma 2.
If a coded computing scheme achieves the fundamental SC tradeoff pair $(r, L^*_{K,Q}(r))$ for an integer r ∈ [K−Q+1 : K], then each file is stored exactly r times across the nodes.

Proof: According to Lemma 1, the sequence $\{Z_K^Q(u)\}_{u=K-Q+1}^{K}$ is strictly convex. Thus, for the integer $r = \sum_{u=K-Q+1}^{K} u\, a_{M,u}/N$, the equality in (23) holds if, and only if,

$$\frac{a_{M,r}}{N} = 1, \qquad \frac{a_{M,u}}{N} = 0, \quad u \in [K-Q+1:K]\setminus\{r\}.$$

Therefore, by the definition of $a_{M,u}$ in (19), each file is stored exactly r times across the system. Lemma 3.
In a g-regular (K, F, T, S) PDA with K ≥ g ≥ 2, if there are exactly g−1 “∗”s in each row, then $F \geq C_K^{g-1}$.

Proof: With Definition 5 (the definition of PDAs), the conclusion follows directly from [33, Lemma 2]. □
For each u ∈ [K], define

$$U_K^Q(u) \triangleq C_{K-u}^{Q-1} + \sum_{l=\max\{1,\,u-K+Q-1\}}^{\min\{u,Q\}-1} \frac{1}{l}\, C_{u-1}^{l}\cdot C_{K-u}^{Q-l-1}. \qquad (24)$$

Lemma 4.
When Q ≥ 3, the subsequence $\{U_K^Q(u)\}_{u=2}^{K}$ is strictly decreasing in u ∈ [2 : K].

Proof: For each u ∈ [2 : K−1], by (24) and the Pascal step $C_{K-u-1}^{Q-1} - C_{K-u}^{Q-1} = -C_{K-u-1}^{Q-2}$,

$$U_K^Q(u+1) - U_K^Q(u) = -C_{K-u-1}^{Q-2} + \sum_{l=\max\{1,\,u-K+Q\}}^{\min\{u+1,Q\}-1}\frac{1}{l}\big(C_{u-1}^l + C_{u-1}^{l-1}\big)\, C_{K-u-1}^{Q-l-1} - \sum_{l=\max\{1,\,u-K+Q-1\}}^{\min\{u,Q\}-1}\frac{1}{l}\, C_{u-1}^l\,\big(C_{K-u-1}^{Q-l-1} + C_{K-u-1}^{Q-l-2}\big), \qquad (25)$$

where we applied the identity (33) to $C_u^l$ and $C_{K-u}^{Q-l-1}$. Separating the summations in (25), eliminating the indices of zero terms, cancelling the two equal sums, and applying the change of variable l′ = l+1 to the sum containing $C_{K-u-1}^{Q-l-2}$, the l = 1 term of the remaining sum cancels $-C_{K-u-1}^{Q-2}$, and we are left with

$$U_K^Q(u+1) - U_K^Q(u) = -\sum_{l=\max\{2,\,u-K+Q\}}^{\min\{u,\,Q-1\}} \frac{1}{l(l-1)}\, C_{u-1}^{l-1}\, C_{K-u-1}^{Q-l-1} \leq 0. \qquad (26)$$

Moreover, if u ≥ 2 and Q ≥ 3, the summation range in (26) is non-empty and every term in it is strictly positive, so that $U_K^Q(u+1) - U_K^Q(u) < 0$, i.e., $U_K^Q(u)$ is strictly decreasing for u ∈ [2 : K].
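A quick numerical check of Lemma 4 (our sketch, not from the paper): implementing (24) shows that $U_K^Q(u)$ is strictly decreasing on [2 : K] for Q ≥ 3, while for Q = 2 the sequence is constant (equal to K−1), which is exactly why the lemma requires Q ≥ 3:

```python
from math import comb

def U(K, Q, u):
    """U_K^Q(u) as defined in (24)."""
    s = comb(K - u, Q - 1)
    for l in range(max(1, u - K + Q - 1), min(u, Q)):
        s += comb(u - 1, l) * comb(K - u, Q - l - 1) / l
    return s
```

For K = 10 and Q = 4 the sequence strictly decreases from u = 2 onward, with $U(1) = U(2) = C_9^3$ as used in the proof of Theorem 3; for Q = 2, $U_{10}^{2}(u) = 9$ for all u ∈ [2 : 10].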
B. Proof of Theorem 3

Define the set $\mathcal E \triangleq \{(Q,r) : Q \in [K],\, r \in [K-Q+1 : K]\}$, and partition it into the three subsets

$$\mathcal E_1 \triangleq \{(Q,r) : Q \in [K],\, r = K\},$$
$$\mathcal E_2 \triangleq \{(Q,r) : Q \in \{1, 2, K\},\, r = K-Q+1\},$$
$$\mathcal E_3 \triangleq \{(Q,r) : Q \in [3:K],\, r \in [\max\{K-Q+1,\, 2\} : K-1]\}.$$

Notice that if $(Q,r)\in\mathcal E_1$, the bound $F \geq C_K^K = 1$ is trivial, and the case $(Q,r)\in\mathcal E_2$ is excluded by the hypothesis of the theorem. Therefore, in the rest of the proof, we assume $(Q,r)\in\mathcal E_3$, i.e., Q ∈ [3 : K] and r ∈ [max{K−Q+1, 2} : K−1].

Let A be a (K, F, T, S) Comp-PDA that achieves the optimal tradeoff point $(r, L^*_{K,Q}(r))$. Recall that each row of a Comp-PDA is associated to a file batch, and a “∗” symbol in that row and column k indicates that the file batch is stored at node k. According to Lemma 2, each file is stored exactly r times across the nodes, i.e., there are exactly r “∗” symbols in each row of A.

Let $\theta_{g'}$ be the fraction of ordinary entries occurring g′ times in the Comp-PDA A, for all g′ ∈ [K]. Since there are r “∗” symbols in each row, by the PDA properties a) and b) in Definition 5, no ordinary symbol can appear more than r+1 times, i.e.,

$$\theta_{g'} = 0, \qquad \forall\, g' \in [r+2 : K]. \qquad (27)$$

Therefore,

$$\sum_{g'=1}^{r+1} \theta_{g'} = 1. \qquad (28)$$

From (6), (24), and (27), the communication load of A has the form

$$L_A = \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot\sum_{g'=1}^{r+1}\theta_{g'}\cdot U_K^Q(g') \overset{(a)}{\geq} \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot\sum_{g'=1}^{r+1}\theta_{g'}\cdot U_K^Q(r+1) \qquad (29)$$

$$\overset{(b)}{=} \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot U_K^Q(r+1) \overset{(c)}{=} L^*_{K,Q}(r),$$

where (a) follows since, by Lemma 4, the sequence $\{U_K^Q(u)\}_{u=2}^K$ is decreasing and $U_K^Q(1) = U_K^Q(2) = C_{K-1}^{Q-1}$; (b) follows from (28); and (c) follows from Theorem 2 and (24). By our assumption $L_A = L^*_{K,Q}(r)$, equality must hold in (29).
Since r+1 ≥ 3 and the sequence $\{U_K^Q(u)\}_{u=2}^K$ is strictly decreasing, equality in (29) implies that

$$\theta_{g'} = 0, \qquad \forall\, g' \in [r]. \qquad (30)$$

Combining (27) and (30), we conclude that A is an (r+1)-regular PDA in which each row has exactly r “∗” symbols. Applying Lemma 3 with g = r+1 completes the proof.

VIII. CONCLUSION
In this work, we have explained how to convert any Comp-PDA with at least K−Q+1 “∗” symbols in each row into a coded computing scheme for a MapReduce system with Q non-straggling nodes. We have further characterized the optimal storage-communication (SC) tradeoff for this system. The Comp-PDA framework allows us to design universal coded computing schemes with small file complexities compared to the schemes (the MAN-PDAs) achieving the fundamental SC tradeoff.

In our setup, for a given integer storage load r, the size Q of the active set has to be no less than K−r+1, since we exclude outage events (see Footnote 1). Given a Comp-PDA, the key to obtaining a coded computing scheme for a given active set is that the subarray formed by the columns corresponding to the active set is still a Comp-PDA. In fact, for the constructions in P1) and P2), this allows constructing coded computing schemes for some (but not all) active sets when the active set size Q satisfies ⌈K/r⌉ ≤ Q ≤ K−r.

APPENDIX A
PROOF OF LEMMA 1

We first prove that the sequence $\{Z_K^Q(u)\}_{u=K-Q+1}^{K}$ is strictly convex and decreasing, i.e.,

$$Z_K^Q(u+1) - Z_K^Q(u) < 0, \qquad \forall\, u \in [K-Q+1 : K-1],$$
$$Z_K^Q(u+1) - Z_K^Q(u) > Z_K^Q(u) - Z_K^Q(u-1), \qquad \forall\, u \in [K-Q+2 : K-1].$$

The second statement of the lemma, on the piecewise linear function, is an immediate consequence of the first one. By (18),

$$Z_K^Q(u) = \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{Q-l}{Ql}\, C_u^l\, C_{K-u}^{Q-l} = \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{C_u^l\, C_{K-u}^{Q-l}}{l} - \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{C_u^l\, C_{K-u}^{Q-l}}{Q} \overset{(a)}{=} \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{C_u^l\, C_{K-u}^{Q-l}}{l} - \frac{C_K^Q}{Q},$$

where in (a) we used the Vandermonde identity $\sum_{l=u+Q-K}^{\min\{u,Q\}} C_u^l\, C_{K-u}^{Q-l} = C_K^Q$.
Then,

$$Z_K^Q(u+1) - Z_K^Q(u) = \sum_{l=u+1+Q-K}^{\min\{u+1,Q\}}\frac{C_{u+1}^l\, C_{K-u-1}^{Q-l}}{l} - \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{C_u^l\, C_{K-u}^{Q-l}}{l}$$

$$\overset{(a)}{=} \sum_{l=u+1+Q-K}^{\min\{u+1,Q\}}\frac{\big(C_u^l + C_u^{l-1}\big)\, C_{K-u-1}^{Q-l}}{l} - \sum_{l=u+Q-K}^{\min\{u,Q\}}\frac{C_u^l\,\big(C_{K-u-1}^{Q-l} + C_{K-u-1}^{Q-l-1}\big)}{l} \qquad (31)$$

$$\overset{(b)}{=} \sum_{l=u+1+Q-K}^{\min\{u+1,Q\}}\frac{C_u^{l-1}\, C_{K-u-1}^{Q-l}}{l} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_u^l\, C_{K-u-1}^{Q-l-1}}{l}$$

$$\overset{(c)}{=} \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_u^{l}\, C_{K-u-1}^{Q-l-1}}{l+1} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_u^l\, C_{K-u-1}^{Q-l-1}}{l} = -\sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_u^l\, C_{K-u-1}^{Q-l-1}}{l(l+1)} \qquad (32)$$

$$< 0,$$

where in (a) we applied the identity

$$C_{n+1}^{m+1} = C_n^{m+1} + C_n^m; \qquad (33)$$

in (b), we separated the two summations of (31) into four summations, eliminated the indices of zero terms, and cancelled the two equal sums; and in (c), we used the change of variable l′ = l−1 in the first summation.
Finally, from (32), for u ∈ [K−Q+2 : K−1] we have

$$\big(Z_K^Q(u+1) - Z_K^Q(u)\big) - \big(Z_K^Q(u) - Z_K^Q(u-1)\big) = \sum_{l=u-1+Q-K}^{\min\{u-1,\,Q-1\}}\frac{C_{u-1}^l\, C_{K-u}^{Q-l-1}}{l(l+1)} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_u^l\, C_{K-u-1}^{Q-l-1}}{l(l+1)}$$

$$\overset{(a)}{=} \sum_{l=u-1+Q-K}^{\min\{u-1,\,Q-1\}}\frac{C_{u-1}^l\,\big(C_{K-u-1}^{Q-l-1} + C_{K-u-1}^{Q-l-2}\big)}{l(l+1)} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{\big(C_{u-1}^l + C_{u-1}^{l-1}\big)\, C_{K-u-1}^{Q-l-1}}{l(l+1)} \qquad (34)$$

$$\overset{(b)}{=} \sum_{l=u-1+Q-K}^{\min\{u-1,\,Q-2\}}\frac{C_{u-1}^l\, C_{K-u-1}^{Q-l-2}}{l(l+1)} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_{u-1}^{l-1}\, C_{K-u-1}^{Q-l-1}}{l(l+1)}$$

$$\overset{(c)}{=} \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_{u-1}^{l-1}\, C_{K-u-1}^{Q-l-1}}{(l-1)\,l} - \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{C_{u-1}^{l-1}\, C_{K-u-1}^{Q-l-1}}{l(l+1)} = \sum_{l=u+Q-K}^{\min\{u,\,Q-1\}}\frac{2\, C_{u-1}^{l-1}\, C_{K-u-1}^{Q-l-1}}{(l-1)\,l\,(l+1)} > 0,$$

where in (a) we applied the identity (33); in (b), we separated the two summations in (34), eliminated the indices of zero terms, and cancelled the two equal sums; and in (c), we used the change of variable l′ = l+1 in the first summation. (Note that l−1 ≥ 1 over the summation range since u ≥ K−Q+2.)

APPENDIX B
PROOF OF PROPOSITION 1

By Corollary 3, Theorem 2, and (24),

$$L_{K,Q}(r) = \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot U_K^Q(r), \qquad L^*_{K,Q}(r) = \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot U_K^Q(r+1).$$
Combining these equalities with (26) (evaluated at u = r), we obtain

$$L_{K,Q}(r) - L^*_{K,Q}(r) = -\Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot\big(U_K^Q(r+1) - U_K^Q(r)\big) = \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{C_{r-1}^{l-1}\, C_{K-r-1}^{Q-l-1}}{l(l-1)}$$

$$\overset{(a)}{=} \Big(1-\frac{r}{K}\Big)\cdot\frac{1}{C_{K-1}^{Q-1}}\cdot\frac{1}{r}\cdot\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{C_{r}^{l}\, C_{K-r-1}^{Q-l-1}}{l-1}, \qquad (35)$$

where in (a) we used the identity $C_{r-1}^{l-1} = \frac{l}{r}\, C_r^l$. Therefore, with (7) and (35),

$$\frac{L_{K,Q}(r) - L^*_{K,Q}(r)}{L^*_{K,Q}(r)} = \frac{1}{r}\cdot\frac{\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{1}{l-1}\, C_r^l\, C_{K-r-1}^{Q-l-1}}{\sum_{l=r+Q-K}^{\min\{r,\,Q-1\}}\frac{1}{l}\, C_r^l\, C_{K-r-1}^{Q-l-1}} \leq \frac{1}{r}\cdot\frac{\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{1}{l-1}\, C_r^l\, C_{K-r-1}^{Q-l-1}}{\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{1}{l}\, C_r^l\, C_{K-r-1}^{Q-l-1}}$$

$$= \frac{1}{r}\cdot\frac{\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{l}{l-1}\cdot\frac{1}{l}\, C_r^l\, C_{K-r-1}^{Q-l-1}}{\sum_{l=\max\{2,\,r+Q-K\}}^{\min\{r,\,Q-1\}}\frac{1}{l}\, C_r^l\, C_{K-r-1}^{Q-l-1}} \overset{(a)}{\leq} \frac{2}{r},$$

where in (a) we used the fact that $\frac{l}{l-1} \leq 2$ for any l ≥ 2. This proves the first part with α ∈ [0, 2].

To prove the second part, first note that, by Corollary 3, the number of batches required by constructions P1) and P2) is

$$F = c\cdot q^{K/q}. \qquad (36)$$

On the other hand, to achieve the fundamental SC tradeoff, the number of required batches is

$$F^* = C_K^r = \frac{K!}{r!\,(K-r)!} \overset{(a)}{\geq} \frac{\sqrt{2\pi K}\, K^{K}\, e^{-K}}{e\sqrt{2\pi r}\, r^{r}\, e^{-r}\cdot e\sqrt{2\pi (K-r)}\,(K-r)^{K-r}\, e^{-(K-r)}} = \frac{1}{e^2}\cdot\sqrt{\frac{K}{2\pi\, r\,(K-r)}}\cdot\Big(\frac{K}{r}\Big)^r\Big(\frac{K}{K-r}\Big)^{K-r}$$

$$= \frac{1}{e^2}\cdot\frac{q}{\sqrt{2\pi(q-1)K}}\cdot q^{K/q}\cdot\Big(\frac{q}{q-1}\Big)^{K(1-\frac1q)}, \qquad (37)$$

where (a) follows by applying the Stirling bounds $\sqrt{2\pi n}\, n^n e^{-n} \leq n! \leq e\sqrt{2\pi n}\, n^n e^{-n}$ to both the numerator and the denominator, and the last equality uses $r(K-r) = \frac{q-1}{q^2}K^2$ together with $\big(\frac{K}{r}\big)^r\big(\frac{K}{K-r}\big)^{K-r} = q^{K/q}\big(\frac{q}{q-1}\big)^{K(1-1/q)}$ for $c \in \{\frac1q, \frac{q-1}{q}\}$. Taking the ratio F/F* using (36) and (37), we complete the proof of the second part.

REFERENCES

[1] Q. Yan, M. Wigger, S. Yang, and X. Tang, “A fundamental storage-communication tradeoff in distributed computing with straggling nodes,” in