Cascaded Coded Distributed Computing Schemes Based on Placement Delivery Arrays
Jing Jiang and Lingxiao Qu
Abstract—Li et al. introduced coded distributed computing (CDC) to reduce the communication load in general distributed computing frameworks such as MapReduce. They also proposed cascaded CDC schemes, in which each output function is computed multiple times. Such schemes achieve the fundamental trade-off between computation load and communication load on homogeneous computing networks, where all the nodes have the same storage, computing and communication capabilities. However, these schemes require exponentially many input files and output functions when the number of computing nodes gets large. In this paper, we give a construction of cascaded CDC schemes based on placement delivery arrays (PDAs), which were introduced to study coded caching schemes. Consequently, based on known results on PDAs, several infinite classes of cascaded CDC schemes can be obtained. We show that, in many cases, $\frac{L}{L_{Li}} \le 2$, where $L$ and $L_{Li}$ are the communication loads of our new schemes and of the schemes derived by Li et al., respectively. Most importantly, the number of output functions in each of the new schemes is only a factor of the number of computing nodes, and the number of input files in our new schemes is much smaller than the number of input files in the CDC schemes derived by Li et al.

Index Terms
Coded distributed computing, MapReduce, placement delivery array
I. INTRODUCTION

With the amount of data being generated increasing rapidly, large-scale distributed computing is becoming very relevant. MapReduce [4] and Hadoop [1] are among the most popular distributed computing frameworks and have wide application in many areas, for instance [5], [7], [8], [10], [11], [13], [14], [20]. In such frameworks, the computation of the output functions consists of three phases: the map phase, the shuffle phase and the reduce phase. In the map phase, distributed computing nodes process parts of the input data files locally, generating intermediate values according to their designed map functions. In the shuffle phase, the nodes exchange the computed intermediate values among each other, in order to obtain enough values to calculate the final output results using their designed reduce functions. In the reduce phase, each node computes its designed reduce functions using the intermediate values derived in the map phase and the shuffle phase.

Coded distributed computing (CDC), introduced in [9], is an efficient approach to reducing the communication load in distributed computing frameworks. The authors of [9] characterized a fundamental tradeoff between the "computation load" in the map phase and the "communication load" in the shuffle phase on homogeneous computing networks, i.e., networks in which all the nodes have the same storage, computing and communication capabilities. All the schemes mentioned in this paper are based on homogeneous networks, unless otherwise specified. Furthermore, [9] proposed a CDC scheme with $N=\binom{K}{r}$ input files and $Q=\binom{K}{s}$ output functions, where $K$ is the total number of computing nodes, $r$ is the average number of nodes that map each file, and $s$ is the number of nodes that compute each reduce function. Obviously, the number of input files $N=\binom{K}{r}$ and the number of output functions $Q=\binom{K}{s}$ grow too quickly with $K$ to be used in practice when $K$ is large.
There are a few works that pay attention to reducing the values of $N$ and $Q$, for instance [6], [16], [18]. However, the number $s$ of nodes that compute each reduce function in most of the known schemes is restricted to particular values, for example, $s=1$ in the schemes of [6] and [18], and $s=1$ or $s=t$ where $t \mid K$ (i.e., $t$ is a factor of $K$) in the schemes of [16]. We list these known CDC schemes in Table I.

J. Jiang and L. Qu are with Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China (e-mail: [email protected], [email protected]).

TABLE I: Some known CDC schemes
Schemes and Parameters | Number of Nodes $K$ | Computation Load $r$ | Replication Factor $s$ | Number of Files $N$ | Number of Reduce Functions $Q$ | Communication Load $L$
[9], $K,r,s \in \mathbb{N}^+$ with $1 \le r,s \le K$ | $K$ | $r$ | $s$ | $\binom{K}{r}$ | $\binom{K}{s}$ | $\sum_{l=\max\{r+1,s\}}^{\min\{r+s,K\}} \frac{\binom{K-r}{K-l}\binom{r}{l-s}}{\binom{K}{s}}\cdot\frac{l-r}{l-1}$
[6], $K,t \in \mathbb{N}^+$ with $t \ge 2$ and $t \mid K$ | $K$ | $t$ | $1$ | $(\frac{K}{r})^{r-1}$ | $K$ | $\frac{1}{r-1}\big(1-\frac{r}{K}\big)$
[18], $K,t \in \mathbb{N}^+$ with $t \ge 2$ and $t \mid K$ | $K$ | $K-t$ | $1$ | $(\frac{K}{t}-1)(\frac{K}{t})^{t-1}$ | $K$ | $\frac{1}{r-1}\big(1-\frac{r}{K}\big)$
[16], $K,t \in \mathbb{N}^+$ with $t \ge 2$ and $t \mid K$ | $K$ | $t$ | $t$ | $(\frac{K}{r})^{r-1}$ | $r(\frac{K}{r})^{r-1}$ | (see [16])

TABLE II: New CDC schemes
Schemes and Parameters | Number of Nodes $K$ | Computation Load $r$ | Replication Factor $s$ | Number of Files $N$ | Number of Reduce Functions $Q$ | Communication Load $L$
Scheme 1, $K,r,s \in \mathbb{N}^+$ with $1 \le r,s \le K$ | $K$ | $r$ | $s$ | $\binom{K}{r}$ | $\frac{K}{\gcd(K,s)}$ | $\frac{s}{r}\big(1-\frac{r}{K}\big)$
Scheme 2, $K,t,s \in \mathbb{N}^+$ with $t \ge 2$, $t \mid K$ and $s \le K$ | $K$ | $t$ | $s$ | $(\frac{K}{t})^{t-1}$ | $\frac{K}{\gcd(K,s)}$ | $\frac{s}{r-1}\big(1-\frac{r}{K}\big)$
Scheme 3, $K,t,s \in \mathbb{N}^+$ with $t \ge 2$, $t \mid K$ and $s \le K$ | $K$ | $K-t$ | $s$ | $(\frac{K}{t}-1)(\frac{K}{t})^{t-1}$ | $\frac{K}{\gcd(K,s)}$ | $\frac{s}{r-1}\big(1-\frac{r}{K}\big)$

However, in practice, the reduce functions are desired to be computed by multiple nodes, which allows for consecutive MapReduce procedures, since the reduce function outputs can act as the input files for the next MapReduce procedure [21]. Such CDC schemes are usually called cascaded
CDC schemes. In this paper, we focus on cascaded CDC schemes with smaller numbers of input files and output functions.
1) Based on placement delivery arrays (PDAs), which were introduced to construct coded caching schemes in [17], we propose a construction of cascaded CDC schemes. That is, given a known PDA, one can obtain a class of cascaded CDC schemes. We list three classes of new schemes in Table II.
2) The number of output functions in each of the new schemes is only a factor of the number of computing nodes $K$, and the number of input files in our new schemes is exponentially smaller in $K$ than that of the scheme in [9].
3) From the construction in this paper, we can obtain the schemes in [18], i.e., our new schemes include the schemes in [18] as a special case. In addition, our new schemes include some schemes in [6], [9] and [16] as special cases.
The rest of this paper is organized as follows. In Section II, we formulate a general distributed computing framework. In Section III, a construction of cascaded CDC schemes is proposed. In Section IV, we analyse the performance of the new schemes obtained from the construction in Section III. Finally, a conclusion is drawn in Section V.

II. PRELIMINARIES
In this section, we give a formulation of our problem. In the system there are $K$ distributed computing nodes $\mathcal{K}=\{0,1,\ldots,K-1\}$, $N$ files $\mathcal{W}=\{w_0,w_1,\ldots,w_{N-1}\}$, where $w_n \in \mathbb{F}_2^D$ has length $D$ for a given positive integer $D$ and any $n \in \{0,1,\ldots,N-1\}$, and $Q$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\ldots,\phi_{Q-1}\}$, where $\phi_q: (\mathbb{F}_2^D)^N \rightarrow \mathbb{F}_2^B$ for any $q \in \{0,1,\ldots,Q-1\}$ maps all the files to a bit stream $u_q=\phi_q(w_0,w_1,\ldots,w_{N-1}) \in \mathbb{F}_2^B$ of length $B$ for a given positive integer $B$. Node $k$ ($k \in \mathcal{K}$) is responsible for computing a subset of output functions, denoted by a set $\mathcal{Q}_k \subseteq \mathcal{Q}$.

As illustrated in Fig. 1 from [9], we assume that each output function $\phi_q$, $q \in \{0,1,\ldots,Q-1\}$, decomposes as
$$\phi_q(w_0,w_1,\ldots,w_{N-1}) = h_q\big(g_{q,0}(w_0), g_{q,1}(w_1), \ldots, g_{q,N-1}(w_{N-1})\big),$$
where
1) the "map" function $g_{q,n}: \mathbb{F}_2^D \rightarrow \mathbb{F}_2^T$, $n \in \{0,1,\ldots,N-1\}$, maps the file $w_n$ into the intermediate value (IVA) $v_{q,n} \triangleq g_{q,n}(w_n) \in \mathbb{F}_2^T$ for a given positive integer $T$;
2) the "reduce" function $h_q: (\mathbb{F}_2^T)^N \rightarrow \mathbb{F}_2^B$ maps the IVAs in $\{v_{q,n} \mid n \in \{0,1,\ldots,N-1\}\}$ into the output value $u_q=h_q(v_{q,0},v_{q,1},\ldots,v_{q,N-1})$.
Following the above decomposition, the computation proceeds in the following three phases.

Fig. 1: Illustration of a two-stage distributed computing framework

• Map Phase. For each $k \in \mathcal{K}$, node $k$ computes $g_{q,n}$ for all $q \in \{0,1,\ldots,Q-1\}$ and $w_n \in \mathcal{W}_k$, where $\mathcal{W}_k \subseteq \mathcal{W}$ is the subset of files stored by node $k$; i.e., node $k$ locally computes the subset of IVAs $\mathcal{C}_k = \{v_{q,n} \mid \phi_q \in \mathcal{Q}, w_n \in \mathcal{W}_k\}$.
• Shuffle Phase. Denote by $\mathcal{Q}_k$, $k \in \mathcal{K}$, the subset of output functions to be computed by node $k$. In order to obtain the output value of each $\phi_q \in \mathcal{Q}_k$, node $k$ needs to compute $h_q(v_{q,0},v_{q,1},\ldots,v_{q,N-1})$, i.e., it needs the IVAs that are not computed locally in the map phase.
Hence, in this phase, the $K$ nodes exchange some of their computed IVAs. Suppose that node $k$ creates a message $X_k=\varphi_k(\mathcal{C}_k)$ of some length $l_k \in \mathbb{N}$, using a function $\varphi_k: \mathbb{F}_2^{|\mathcal{C}_k|T} \rightarrow \mathbb{F}_2^{l_k}$. It then multicasts this message to all the other nodes, and each node receives it error-free.
• Reduce Phase. Using the shuffled messages $X_0,X_1,\ldots,X_{K-1}$ and the IVAs in $\mathcal{C}_k$ computed locally in the map phase, node $k$ can derive
$$(v_{q,0},v_{q,1},\ldots,v_{q,N-1}) = \psi_q(X_0,X_1,\ldots,X_{K-1},\mathcal{C}_k)$$
for some function $\psi_q: \mathbb{F}_2^{l_0} \times \mathbb{F}_2^{l_1} \times \cdots \times \mathbb{F}_2^{l_{K-1}} \times \mathbb{F}_2^{|\mathcal{C}_k|T} \rightarrow \mathbb{F}_2^{NT}$, where $\phi_q \in \mathcal{Q}_k$. More specifically, node $k$ can derive the IVAs $\{v_{q,n} \mid \phi_q \in \mathcal{Q}_k, n \in \{0,1,\ldots,N-1\}\}$, which is enough to compute the output value $u_q=h_q(v_{q,0},v_{q,1},\ldots,v_{q,N-1})$.

Define the computation load as $r=\frac{\sum_{k=0}^{K-1}|\mathcal{W}_k|}{N}$ and the communication load as $L=\frac{\sum_{k=0}^{K-1} l_k}{QNT}$; i.e., $r$ is the average number of nodes that map each file and $L$ is the ratio of the total number of bits transmitted in the shuffle phase to $QNT$. Li et al. [9] gave the following optimal computation-communication function:
$$L^*(r,s) = \sum_{l=\max\{r+1,s\}}^{\min\{r+s,K\}} \frac{\binom{K-r}{K-l}\binom{r}{l-s}}{\binom{K}{s}} \cdot \frac{l-r}{l-1}, \quad (1)$$
where $K$ is the number of nodes, $r$ is the computation load and $s$ is the number of nodes that compute each reduce function. Moreover, the authors proposed schemes achieving the above optimal computation-communication function.

Lemma 1: ([9]) Suppose that $K$, $r$, and $s$ are positive integers.
Then there exists a CDC scheme with $K$ nodes, $N=\binom{K}{r}$ files and $Q=\binom{K}{s}$ output functions, such that the communication load is
$$L_{Li}(r,s) = \sum_{l=\max\{r+1,s\}}^{\min\{r+s,K\}} \frac{\binom{K-r}{K-l}\binom{r}{l-s}}{\binom{K}{s}} \cdot \frac{l-r}{l-1},$$
where $r$ is the computation load and $s$ is the number of nodes that compute each reduce function.

For convenience, the above CDC schemes are called Li-CDC schemes in this paper.

III. A NEW CONSTRUCTION OF CDC SCHEMES
In this section, we give a construction of cascaded CDC schemes by using placement delivery arrays, which were introduced by Yan et al. [17] to study coded caching schemes.
Definition 1: ([17]) Suppose that $K$, $N$, $Z$ and $S$ are positive integers. An $N \times K$ array $\mathbf{P}=(p_{i,k})$, $i \in \{0,1,\ldots,N-1\}$, $k \in \{0,1,\ldots,K-1\}$, composed of a specific symbol "$*$" and the integers $0,1,\ldots,S-1$, is a $(K,N,Z,S)$ placement delivery array (PDA for short) if
1) the symbol "$*$" occurs exactly $Z$ times in each column;
2) each integer appears at least once in the array;
3) for any two distinct entries $p_{i_1,k_1}$ and $p_{i_2,k_2}$, $p_{i_1,k_1}=p_{i_2,k_2}=u$ is an integer only if
a. $i_1 \neq i_2$, $k_1 \neq k_2$, i.e., they lie in distinct rows and distinct columns; and
b. $p_{i_1,k_2}=p_{i_2,k_1}=*$, i.e., the corresponding $2 \times 2$ subarray formed by rows $i_1,i_2$ and columns $k_1,k_2$ must be of the following form
$$\begin{pmatrix} u & * \\ * & u \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} * & u \\ u & * \end{pmatrix}.$$
A $(K,N,Z,S)$ PDA $\mathbf{P}$ is $g$-regular, denoted $g$-$(K,N,Z,S)$ PDA, if each integer in $\{0,1,\ldots,S-1\}$ occurs exactly $g$ times in $\mathbf{P}$.

Example 1:
We can directly check that the following array is a $3$-$(6,4,2,4)$ PDA:
$$\mathbf{P}_{4\times 6} = \begin{pmatrix} * & * & * & 0 & 1 & 2 \\ * & 0 & 1 & * & * & 3 \\ 0 & * & 2 & * & 3 & * \\ 1 & 2 & * & 3 & * & * \end{pmatrix}$$
The concept of a PDA is a useful tool for constructing coded caching schemes, for instance [2], [3], [15], [17]. Recently, Yan et al. [18], [19] used PDAs to construct CDC schemes with $s=1$, where $s$ is the number of nodes that compute each reduce function.

Lemma 2: ([18]) Suppose that there exists a $g$-$(K,N,Z,S)$ PDA with $g \ge 2$. Then there exists a CDC scheme satisfying the following properties:
1) it consists of $K$ distributed computing nodes $\mathcal{K}=\{0,1,\ldots,K-1\}$, $N$ files and $Q=K$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\ldots,\phi_{Q-1}\}$;
2) node $k$, where $k \in \mathcal{K}$, is responsible for computing $\phi_k$;
3) the computation load is $r=\frac{KZ}{N}$ and the number of IVAs multicasted by all the nodes is $\frac{gS}{g-1}$.
In the above scheme, the numbers of nodes and files correspond to the numbers of columns and rows of the PDA, respectively. Furthermore, node $k$, $k \in \{0,1,\ldots,K-1\}$, is responsible for computing $\phi_k$, i.e., distinct nodes are responsible for computing distinct output functions. So the number $Q$ of output functions equals the number of nodes, i.e., $Q=K$. In order to obtain the main results of this section, the property that several nodes may be responsible for computing the same output function is needed.
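The defining conditions of a PDA can be checked mechanically. As an illustrative sketch (not part of the paper), the following Python snippet verifies the conditions of Definition 1 for the array in Example 1:

```python
from itertools import combinations

def is_pda(P, g=None):
    """Check the conditions of Definition 1 for an array P whose entries
    are '*' or integers 0..S-1; if g is given, also check g-regularity."""
    N, K = len(P), len(P[0])
    ints = [P[i][k] for i in range(N) for k in range(K) if P[i][k] != '*']
    S = max(ints) + 1
    # C1: '*' occurs the same number of times (Z) in every column
    star_counts = {sum(P[i][k] == '*' for i in range(N)) for k in range(K)}
    if len(star_counts) != 1:
        return False
    # C2: each integer 0..S-1 appears at least once
    if set(ints) != set(range(S)):
        return False
    # C3: equal integer entries lie in distinct rows and columns,
    # and the two "crossing" entries are '*'
    cells = [(i, k) for i in range(N) for k in range(K) if P[i][k] != '*']
    for (i1, k1), (i2, k2) in combinations(cells, 2):
        if P[i1][k1] == P[i2][k2]:
            if i1 == i2 or k1 == k2:
                return False
            if P[i1][k2] != '*' or P[i2][k1] != '*':
                return False
    # g-regularity: every integer occurs exactly g times
    if g is not None and any(ints.count(u) != g for u in range(S)):
        return False
    return True

# The 3-(6,4,2,4) PDA of Example 1
P = [['*', '*', '*', 0,   1,   2  ],
     ['*', 0,   1,   '*', '*', 3  ],
     [0,   '*', 2,   '*', 3,   '*'],
     [1,   2,   '*', 3,   '*', '*']]
print(is_pda(P, g=3))  # True
```

Changing any single entry of the array breaks condition C3 or the regularity, so the check is a useful sanity test when constructing new PDAs by hand.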
Example 2: Consider the $3$-$(6,4,2,4)$ PDA $\mathbf{P}_{4\times 6}$ in Example 1. Using $\mathbf{P}_{4\times 6}$, we construct a CDC scheme with $K=6$ nodes $\{0,1,2,3,4,5\}$, $N=4$ files $\{w_0,w_1,w_2,w_3\}$ and $Q=2$ output functions $\{\phi_0,\phi_1\}$ (note that $Q=K=6$ in Lemma 2).
• Map phase. For each $k \in \{0,1,2,3,4,5\}$, node $k$ stores the files in
$$\mathcal{W}_k = \{w_n \mid n \in \{0,1,2,3\}, p_{n,k}=*\}, \quad (2)$$
i.e., $\mathcal{W}_0=\{w_0,w_1\}$, $\mathcal{W}_1=\{w_0,w_2\}$, $\mathcal{W}_2=\{w_0,w_3\}$, $\mathcal{W}_3=\{w_1,w_2\}$, $\mathcal{W}_4=\{w_1,w_3\}$, $\mathcal{W}_5=\{w_2,w_3\}$.
• Shuffle phase. For each $k \in \{0,1,2,3\}$, node $k$ is responsible for computing $\phi_0$, i.e., it needs to obtain the output value $u_0=h_0(v_{0,0},v_{0,1},v_{0,2},v_{0,3})$. For each $k \in \{4,5\}$, node $k$ is responsible for computing $\phi_1$. That is, node $k$ is responsible for computing $\phi_{q_k}$, where $(q_0,q_1,q_2,q_3,q_4,q_5)=(0,0,0,0,1,1)$. We take node $0$ as an example: in order to compute the output value $u_0$, node $0$ should obtain all the IVAs in $\{v_{0,n} \mid n \in \{0,1,2,3\}\}$. Since node $0$ stores $w_0$ and $w_1$, it can locally compute the IVAs in
$$\{v_{0,n} \mid n \in \{0,1\}\}. \quad (3)$$
Since $p_{2,0}=0$, node $0$ can derive $v_{0,2}$ from the subarray $\mathbf{P}^{(0)}$ formed by the rows $i_1,i_2,i_3$ and columns $k_1,k_2,k_3$ where $p_{i_1,k_1}=p_{i_2,k_2}=p_{i_3,k_3}=0$, i.e., rows $0,1,2$ and columns $0,1,3$:
$$\mathbf{P}^{(0)} = \begin{pmatrix} * & * & 0 \\ * & 0 & * \\ 0 & * & * \end{pmatrix}.$$
Observe $\mathbf{P}^{(0)}$. From (2), $p_{n,k}=*$ means that node $k$ can locally compute $v_{q_k,n}$. So, from $\mathbf{P}^{(0)}$, we know that nodes $3$, $1$ and $0$ do not know $v_{q_3,0}=v_{0,0}$, $v_{q_1,1}=v_{0,1}$ and $v_{q_0,2}=v_{0,2}$, respectively. Divide each of $v_{0,0}$, $v_{0,1}$ and $v_{0,2}$ into two disjoint segments, i.e., $v_{0,0}=(v_{0,0}^{0},v_{0,0}^{1})$, $v_{0,1}=(v_{0,1}^{0},v_{0,1}^{3})$, $v_{0,2}=(v_{0,2}^{1},v_{0,2}^{3})$. For a segment $v_{q,n}^{k}$, the superscript $k$ represents the node that can locally compute this IVA, which implies that the segment $v_{q,n}^{k}$ will be transmitted by node $k$. That is,
– node $0$ multicasts the message $v_{0,0}^{0} \oplus v_{0,1}^{0}$ to nodes $1$ and $3$,
– node $1$ multicasts the message $v_{0,0}^{1} \oplus v_{0,2}^{1}$ to nodes $0$ and $3$,
– node $3$ multicasts the message $v_{0,1}^{3} \oplus v_{0,2}^{3}$ to nodes $0$ and $1$.
Since node $0$ can compute $v_{0,0}$ and $v_{0,1}$ locally, it can derive $v_{0,2}^{1}$ and $v_{0,2}^{3}$ from the message $v_{0,0}^{1} \oplus v_{0,2}^{1}$ multicasted by node $1$ and the message $v_{0,1}^{3} \oplus v_{0,2}^{3}$ multicasted by node $3$, respectively. This implies that node $0$ can obtain $v_{0,2}=(v_{0,2}^{1},v_{0,2}^{3})$. Using a similar method, node $0$ can obtain $v_{0,3}$. Together with the known IVAs in (3), node $0$ can obtain all the IVAs in $\{v_{0,n} \mid n \in \{0,1,2,3\}\}$. Similarly, node $k$, $k \in \{1,2,3\}$, can obtain all the IVAs in $\{v_{0,n} \mid n \in \{0,1,2,3\}\}$, and node $k$, $k \in \{4,5\}$, can obtain all the IVAs in $\{v_{1,n} \mid n \in \{0,1,2,3\}\}$.
• Reduce phase. Using the IVAs derived in the map phase and the shuffle phase, each node $k_1 \in \{0,1,2,3\}$ and each node $k_2 \in \{4,5\}$ can compute the output values $u_0$ and $u_1$, respectively. Since each node stores $2$ files, the computation load is $r=\frac{6 \times 2}{N}=\frac{6 \times 2}{4}=3$. For each integer $u \in \{0,1,2,3\}$, the number of IVAs multicasted by the nodes from $\mathbf{P}^{(u)}$ is $3 \times \frac{1}{2}$. So the total number of IVAs multicasted by all the nodes is $4 \times 3 \times \frac{1}{2} = 6$.
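The coded exchange for the integer $u=0$ can be made concrete with a short simulation. The following sketch (illustrative, with an assumed IVA length of 8 bytes) generates the three IVAs missing at nodes 3, 1 and 0, forms the three XOR-coded multicast messages, and checks that node 0 recovers $v_{0,2}$:

```python
import random

random.seed(1)
T = 8  # IVA length in bytes (an assumption for this sketch)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def halves(v):
    return v[:len(v) // 2], v[len(v) // 2:]

# IVAs for phi_0 that are missing at nodes 3, 1 and 0 respectively
v00, v01, v02 = (bytes(random.getrandbits(8) for _ in range(T)) for _ in range(3))
# Split each IVA into two segments, labelled by the transmitting node
v00_0, v00_1 = halves(v00)  # v_{0,0} can be computed by nodes 0 and 1
v01_0, v01_3 = halves(v01)  # v_{0,1} can be computed by nodes 0 and 3
v02_1, v02_3 = halves(v02)  # v_{0,2} can be computed by nodes 1 and 3

msg_node0 = xor(v00_0, v01_0)  # multicast to nodes 1 and 3
msg_node1 = xor(v00_1, v02_1)  # multicast to nodes 0 and 3
msg_node3 = xor(v01_3, v02_3)  # multicast to nodes 0 and 1

# Node 0 knows v00 and v01 locally, so it peels them off the received messages
rec_v02 = xor(msg_node1, v00_1) + xor(msg_node3, v01_3)
print(rec_v02 == v02)  # True
```

Each of the three messages has length $T/2$, so the three nodes together transmit $3 \times \frac{1}{2}$ IVAs for this integer, matching the count in the reduce-phase discussion above.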
Lemma 3: Suppose that there exists a $g$-$(K,N,Z,S)$ PDA with $g \ge 2$. Then there exists a CDC scheme satisfying the following properties:
1) it contains $K$ distributed computing nodes $\mathcal{K}=\{0,1,\ldots,K-1\}$, $N$ files and $Q$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\ldots,\phi_{Q-1}\}$, where $Q \le K$;
2) node $k$ is responsible for computing $\phi_{q_k}$, where $\phi_{q_k} \in \mathcal{Q}$;
3) the computation load is $r=\frac{KZ}{N}$ and the number of IVAs multicasted by all the nodes is $\frac{gS}{g-1}$.

We now turn to the proof of Lemma 3. Given a $g$-$(K,N,Z,S)$ PDA $\mathbf{P}=(p_{n,k})$, $n \in \{0,1,\ldots,N-1\}$, $k \in \{0,1,\ldots,K-1\}$, we construct a CDC scheme with $K$ nodes $\mathcal{K}=\{0,1,\ldots,K-1\}$, $N$ files $\mathcal{W}=\{w_0,w_1,\ldots,w_{N-1}\}$ and $Q$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\ldots,\phi_{Q-1}\}$, where node $k \in \mathcal{K}$ is responsible for computing $\phi_{q_k}$ such that $\phi_{q_k} \in \mathcal{Q}$ and $\cup_{k \in \mathcal{K}} \{\phi_{q_k}\} = \mathcal{Q}$.
• Map phase. Node $k$, where $k \in \{0,1,\ldots,K-1\}$, stores the files in
$$\mathcal{W}_k = \{w_n \mid n \in \{0,1,\ldots,N-1\}, p_{n,k}=*\}. \quad (4)$$
Hence node $k$ can compute the IVAs in the set
$$\bigcup_{p_{n,k}=*} \{v_{q_k,n}\}. \quad (5)$$
We also obtain the computation load $r=\frac{\sum_{k=0}^{K-1}|\mathcal{W}_k|}{N}=\frac{\sum_{k=0}^{K-1} Z}{N}=\frac{KZ}{N}$.
• Shuffle phase. Note that node $k$, where $k \in \{0,1,\ldots,K-1\}$, is responsible for computing the output function $\phi_{q_k}$. By the definition of $g$-regularity, each integer $u \in \{0,1,\ldots,S-1\}$ occurs $g$ times in $\mathbf{P}$. Suppose that
$$p_{i_1,k_1}=p_{i_2,k_2}=\ldots=p_{i_g,k_g}=u.$$
From condition 3) of Definition 1, the subarray of $\mathbf{P}$ formed by the rows $i_1,i_2,\ldots,i_g$ and columns $k_1,k_2,\ldots,k_g$ has the following form:
$$\mathbf{P}^{(u)} = \begin{pmatrix} u & * & \cdots & * \\ * & u & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ * & * & \cdots & u \end{pmatrix}. \quad (6)$$
Suppose that node $k_j$, $j \in \{1,2,\ldots,g\}$, is responsible for computing $\phi_{q_{k_j}}$. Divide $v_{q_{k_j},i_j}$ into $g-1$ segments, i.e.,
$$v_{q_{k_j},i_j} = \big(v_{q_{k_j},i_j}^{k_1}, \ldots, v_{q_{k_j},i_j}^{k_{j-1}}, v_{q_{k_j},i_j}^{k_{j+1}}, \ldots, v_{q_{k_j},i_j}^{k_g}\big). \quad (7)$$
The superscript $k_l$, $l \in \{1,2,\ldots,g\}$, of a segment means that the segment will be transmitted by node $k_l$. For any $l \in \{1,2,\ldots,g\}$, node $k_l$ multicasts the following message:
$$\bigoplus_{t \in \{1,2,\ldots,g\} \setminus \{l\}} v_{q_{k_t},i_t}^{k_l}. \quad (8)$$
Hence the number of IVAs multicasted by the nodes $k_1,k_2,\ldots,k_g$ is $\frac{g}{g-1}$ for the integer $u$. Since there are $S$ integers in $\{0,1,\ldots,S-1\}$, the total number of IVAs multicasted by all the nodes is $\frac{gS}{g-1}$.
In order to show the correctness of the scheme in the reduce phase, we now prove that for any $j \in \{1,2,\ldots,g\}$, node $k_j$ can obtain the IVA $v_{q_{k_j},i_j}$ from $\mathbf{P}^{(u)}$ in (6), where $p_{i_j,k_j}=u$.
1) Since $p_{i_1,k_1}=p_{i_2,k_2}=\ldots=p_{i_g,k_g}=u$, according to the definition of a PDA, for any $j \in \{1,2,\ldots,g\}$ we have $p_{i_t,k_j}=*$ for any $t \in \{1,2,\ldots,g\} \setminus \{j\}$. Then node $k_j$ stores the file $w_{i_t}$ by (4), which implies that it can locally compute $v_{q_{k_l},i_t}$ for any $l \in \{1,2,\ldots,g\}$. So $k_j$ can compute $v_{q_{k_t},i_t}$ for any $t \in \{1,2,\ldots,g\} \setminus \{j\}$.
2) We take $k_1$ as an example, i.e., we show that node $k_1$ can obtain the IVA $v_{q_{k_1},i_1}$. According to (7), it needs the segments $v_{q_{k_1},i_1}^{k_2}, v_{q_{k_1},i_1}^{k_3}, \ldots, v_{q_{k_1},i_1}^{k_g}$. For the segment $v_{q_{k_1},i_1}^{k_2}$: by (8), node $k_2$ multicasts the message $\bigoplus_{t \in \{1,3,4,\ldots,g\}} v_{q_{k_t},i_t}^{k_2}$. From 1), $k_1$ can compute $v_{q_{k_t},i_t}$ for any $t \in \{2,3,\ldots,g\}$, which implies that it can locally compute $v_{q_{k_t},i_t}^{k_2}$ for any $t \in \{3,4,\ldots,g\}$. So node $k_1$ can obtain the segment $v_{q_{k_1},i_1}^{k_2}$ from the message $\bigoplus_{t \in \{1,3,4,\ldots,g\}} v_{q_{k_t},i_t}^{k_2}$ multicasted by $k_2$. Similarly, node $k_1$ can obtain the segment $v_{q_{k_1},i_1}^{k_l}$ for any $l \in \{2,3,\ldots,g\}$ from the message $\bigoplus_{t \in \{1,2,\ldots,g\} \setminus \{l\}} v_{q_{k_t},i_t}^{k_l}$ multicasted by node $k_l$. Now node $k_1$ recovers the IVA $v_{q_{k_1},i_1}=\big(v_{q_{k_1},i_1}^{k_2}, v_{q_{k_1},i_1}^{k_3}, \ldots, v_{q_{k_1},i_1}^{k_g}\big)$. Similarly, for any $j \in \{1,2,\ldots,g\}$, node $k_j$ can recover $v_{q_{k_j},i_j}$ from $\mathbf{P}^{(u)}$ in (6).
• Reduce phase. Consider node $k$, where $k \in \{0,1,\ldots,K-1\}$. Since node $k$ is responsible for computing $\phi_{q_k}$, it needs to know the IVAs in the set
$$\{v_{q_k,n} \mid n \in \{0,1,\ldots,N-1\}\} = \Big(\bigcup_{p_{n,k}=*} \{v_{q_k,n}\}\Big) \cup \Big(\bigcup_{p_{n,k}\neq*} \{v_{q_k,n}\}\Big).$$
From (5), node $k$ can locally compute $\bigcup_{p_{n,k}=*} \{v_{q_k,n}\}$. Hence it only needs to derive $\bigcup_{p_{n,k}\neq*} \{v_{q_k,n}\}$. For any $p_{n,k} \neq *$, there exists an integer $u \in \{0,1,\ldots,S-1\}$ such that $p_{n,k}=u$. From the shuffle phase, node $k$ can obtain the IVA $v_{q_k,n}$ from $\mathbf{P}^{(u)}$ in (6). That is, node $k$ can derive all the IVAs in $\bigcup_{p_{n,k}\neq*} \{v_{q_k,n}\}$.
Next, we will use Lemma 3 to derive cascaded CDC schemes in which the parameter $s$ is a positive integer with $s > 1$.
Theorem 1: Suppose that there exists a $g$-$(K,N,Z,S)$ PDA with $g \ge 2$. Then for any positive integer $s$ with $s \le K$, there exists a cascaded CDC scheme consisting of $K$ distributed computing nodes, $N$ files and $Q=\frac{K}{\gcd(K,s)}$ output functions, such that the computation load is $r=\frac{KZ}{N}$ and the communication load is $L=\frac{gsS}{(g-1)KN}$.

Remark 1: Applying Theorem 1 with $s=1$, the resulting scheme is the same as the scheme in Lemma 2. That is, the schemes in Theorem 1 include the schemes in [18] as a special case.

The rest of the section is devoted to the proof of Theorem 1. Given a $g$-$(K,N,Z,S)$ PDA $\mathbf{P}=(p_{i,k})$, $i \in \{0,1,\ldots,N-1\}$, $k \in \{0,1,\ldots,K-1\}$, we construct a CDC scheme with $K$ nodes $\mathcal{K}=\{0,1,\ldots,K-1\}$, $N$ files $\mathcal{W}=\{w_0,w_1,\ldots,w_{N-1}\}$ and $Q=\frac{K}{\gcd(K,s)}$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\ldots,\phi_{Q-1}\}$.
• Map phase. Node $k$, where $k \in \{0,1,\ldots,K-1\}$, stores the files in
$$\mathcal{W}_k = \{w_n \mid n \in \{0,1,\ldots,N-1\}, p_{n,k}=*\}. \quad (9)$$
Hence node $k$ can compute the IVAs in the set $\bigcup_{p_{n,k}=*} \{v_{q_k,n}\}$. We also obtain the computation load $r=\frac{\sum_{k=0}^{K-1}|\mathcal{W}_k|}{N}=\frac{\sum_{k=0}^{K-1} Z}{N}=\frac{KZ}{N}$.
• Shuffle phase. Node $k$, where $k \in \{0,1,\ldots,K-1\}$, is responsible for computing the subset of output functions
$$\mathcal{Q}_k = \{\phi_q \mid q \equiv k+j \pmod{Q},\ j \in \{0,1,\ldots,e-1\}\}, \quad (10)$$
where $e=\frac{sQ}{K}$.
Example 3:
Consider a $g$-regular PDA with $K=10$ columns, $N=5$ rows and $Z=3$ star entries in each column. From Theorem 1, for $s=4$, we can construct a CDC scheme with $K=10$ nodes $\mathcal{K}=\{0,1,\ldots,9\}$, $N=5$ files $\mathcal{W}=\{w_0,w_1,w_2,w_3,w_4\}$ and $Q=\frac{K}{\gcd(K,s)}=5$ output functions $\mathcal{Q}=\{\phi_0,\phi_1,\phi_2,\phi_3,\phi_4\}$. Then, according to (9) and (10), each node stores $Z=3$ of the five files, $e=\frac{sQ}{K}=2$, and each node is responsible for computing $e=2$ output functions, namely $\mathcal{Q}_k=\{\phi_{k \bmod 5},\phi_{(k+1) \bmod 5}\}$ for $k \in \{0,1,\ldots,9\}$. We divide the process into $e=2$ steps.
• In the first step, node $k$, where $k \in \{0,1,\ldots,9\}$, is responsible for computing $\phi_{k \bmod 5}$; in the second step, it is responsible for computing $\phi_{(k+1) \bmod 5}$.

IV. PERFORMANCE
From Theorem 1, we can directly obtain CDC schemes from known PDAs. We list some known results on $g$-$(K,N,Z,S)$ PDAs in Table V, where $t \mid K$ means that $t$ is a factor of $K$. The interested reader is referred to [2], [3], [15], [17] for more known results about PDAs.

A. The first new scheme
Let $\mathbf{P}$ be a $(t+1)$-$\big(K,\binom{K}{t},\binom{K-1}{t-1},\binom{K}{t+1}\big)$ PDA from [12] (the PDA in the second row of Table V). From Theorem 1, for any positive integer $s \le K$, one can obtain a CDC scheme, say Scheme 1, with $K$ nodes, $N=\binom{K}{t}$ files and $Q=\frac{K}{\gcd(K,s)}$ output functions, where $s$ corresponds to the number of nodes that compute each output function. Furthermore, the computation load is
$$r = \frac{KZ}{N} = \frac{K\binom{K-1}{t-1}}{\binom{K}{t}} = t,$$

TABLE V: Some known results on PDAs
References and Parameters | $g$ | $K$ | $N$ | $Z$ | $S$
[12], $K,t \in \mathbb{N}^+$ with $1 \le t \le K-1$ | $t+1$ | $K$ | $\binom{K}{t}$ | $\binom{K-1}{t-1}$ | $\binom{K}{t+1}$
[17], $K,t \in \mathbb{N}^+$ with $t \ge 2$ and $t \mid K$ | $t$ | $K$ | $(\frac{K}{t})^{t-1}$ | $(\frac{K}{t})^{t-2}$ | $(\frac{K}{t})^{t}-(\frac{K}{t})^{t-1}$
[17], $K,t \in \mathbb{N}^+$ with $t \ge 2$ and $t \mid K$ | $K-t$ | $K$ | $(\frac{K}{t}-1)(\frac{K}{t})^{t-1}$ | $(\frac{K}{t}-1)^2(\frac{K}{t})^{t-2}$ | $(\frac{K}{t})^{t-1}$

and the communication load is
$$L_1(r,s) = \frac{sgS}{(g-1)KN} = \frac{s(t+1)\binom{K}{t+1}}{\big((t+1)-1\big)K\binom{K}{t}} = \frac{s}{t}\Big(1-\frac{t}{K}\Big) = \frac{s}{r}\Big(1-\frac{r}{K}\Big).$$
Note that if $s \ge \frac{rK}{K-r}$, we have $L_1(r,s)=\frac{s}{r}(1-\frac{r}{K}) \ge 1$. In this case, the nodes do not need to transmit coded messages. Instead, each IVA can be multicasted by one node which can compute the IVA locally. By this method, the communication load is $L_1(r,s)=1$. So the communication load is
$$L_1(r,s) = \min\Big\{\frac{s}{r}\Big(1-\frac{r}{K}\Big),\ 1\Big\}.$$
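The load of Scheme 1 can be checked numerically against the optimal load (1) of the Li-CDC scheme. The following sketch (illustrative, not from the paper) implements both formulas with standard-library binomials:

```python
from math import comb, gcd

def L_star(K, r, s):
    """Optimal computation-communication function of Li et al., eq. (1)."""
    total = 0.0
    for l in range(max(r + 1, s), min(r + s, K) + 1):
        total += comb(K - r, K - l) * comb(r, l - s) / comb(K, s) * (l - r) / (l - 1)
    return total

def L_scheme1(K, r, s):
    """Communication load of Scheme 1."""
    return min(s / r * (1 - r / K), 1.0)

K, r = 16, 4
# With s = 1 the two loads coincide (see the optimality discussion below).
print(abs(L_scheme1(K, r, 1) - L_star(K, r, 1)) < 1e-12)  # True
# For s > 1, Scheme 1 pays a modest load penalty but needs far fewer
# output functions: K/gcd(K, s) instead of C(K, s).
s = 4
print(L_scheme1(K, r, s) / L_star(K, r, s))  # ratio H(r, s), here below 2
print(comb(K, s), K // gcd(K, s))  # 1820 4
```

This kind of check is how the ratio $H(r,s)$ discussed in the comparison subsection below can be tabulated for small $K$.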
1) Optimality:
When $s=1$ and $1 \le r < K-1$, (1) can be written as
$$L^*(r,1) = \sum_{l=\max\{r+1,1\}}^{\min\{r+1,K\}} \frac{\binom{K-r}{K-l}\binom{r}{l-1}}{\binom{K}{1}}\cdot\frac{l-r}{l-1} = \frac{\binom{K-r}{K-r-1}\binom{r}{r}}{\binom{K}{1}}\cdot\frac{1}{r} = \frac{1}{r}\Big(1-\frac{r}{K}\Big).$$
Obviously, the communication load $L_1(r,1)=\frac{1}{r}(1-\frac{r}{K})$ of Scheme 1 achieves the optimal computation-communication trade-off.

Remark 2: In this case, one can directly check that our scheme is the same as the scheme with $s=1$ proposed in [9].

When $1 \le s \le K$ and $r=K-1$, from (1) we have
$$L^*(K-1,s) = \sum_{l=\max\{K,s\}}^{\min\{K-1+s,K\}} \frac{\binom{K-(K-1)}{K-l}\binom{K-1}{l-s}}{\binom{K}{s}}\cdot\frac{l-(K-1)}{l-1} = \frac{\binom{1}{0}\binom{K-1}{K-s}}{\binom{K}{s}}\cdot\frac{1}{K-1} = \frac{s}{K(K-1)} = \frac{s}{r}\Big(1-\frac{r}{K}\Big).$$
So in this case the communication load of Scheme 1 also achieves the optimal computation-communication trade-off.
Remark 3:
One can show that the number of output functions in Scheme 1 is much smaller than the number of output functions in the Li-CDC scheme. The number of output functions in the Li-CDC scheme is $Q_{Li}=\binom{K}{s}$, while the number of output functions in Scheme 1 is $Q=\frac{K}{\gcd(K,s)}$. For example, if $s=K/w$ where $w$ is a factor of $K$, then $Q_{Li}=\binom{K}{K/w}$ and $Q=\frac{K}{\gcd(K,K/w)}=w$, respectively. We list the cases $w=2$ and $w=3$ in Table VI and Table VII, respectively.

TABLE VI: The numbers of output functions in the Li-CDC scheme and Scheme 1 when $K$ is even and $s=K/2$: here $Q_{Li}=\binom{K}{K/2}$ grows exponentially in $K$ (e.g., $Q_{Li}=70$ for $K=8$ and $Q_{Li}=12870$ for $K=16$), while $Q=2$ for every even $K$.
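The entries of Tables VI and VII can be regenerated with a few lines of Python (an illustrative sketch, not part of the paper):

```python
from math import comb, gcd

def output_function_counts(K, s):
    """Q_Li = C(K, s) for the Li-CDC scheme versus Q = K/gcd(K, s) for Scheme 1."""
    return comb(K, s), K // gcd(K, s)

# w = 2: s = K/2 for even K
for K in range(4, 17, 4):
    q_li, q = output_function_counts(K, K // 2)
    print(K, q_li, q)
# e.g. K = 16 gives Q_Li = 12870 while Q = 2

# w = 3: s = K/3 for K divisible by 3
for K in range(6, 19, 6):
    q_li, q = output_function_counts(K, K // 3)
    print(K, q_li, q)
# e.g. K = 18 gives Q_Li = 18564 while Q = 3
```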
2) Comparison:
For the other values of $K$, $r$ and $s$, we conjecture that $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$. Unfortunately, the structure of the formula $L^*(r,s)=\sum_{l=\max\{r+1,s\}}^{\min\{r+s,K\}} \frac{\binom{K-r}{K-l}\binom{r}{l-s}}{\binom{K}{s}}\cdot\frac{l-r}{l-1}$ is too complex to prove this conjecture. However, we can find values of $K$, $r$ and $s$ satisfying $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$.

TABLE VII: The numbers of reduce functions in the Li-CDC scheme and Scheme 1 when $3 \mid K$ and $s=K/3$: here $Q_{Li}=\binom{K}{K/3}$ again grows exponentially in $K$ (e.g., $Q_{Li}=18564$ for $K=18$), while $Q=3$ whenever $3 \mid K$.
Theorem 2: For any positive integers $K$, $r$ and $s$ with $r,s \le K$, where $K$ does not exceed a computer-verified bound, $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$.

Proof. With the aid of a computer, one can verify that $H(r,s) \le 2$ holds for all such positive integers $K$, $r$ and $s$. Here we only list some cases in Table VIII; the other cases are omitted, and interested readers may contact the authors for a copy.

TABLE VIII: The ratio of $L_1(r,s)$ to $L^*(r,s)$ with $K=16$ (sample values computed from the two load formulas)
Computation Load $r$ | Replication Factor $s$ | Communication Load $L^*(r,s)$ | Communication Load $L_1(r,s)$ | $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)}$
$2$ | $2$ | $0.6222$ | $0.8750$ | $1.4063$
$4$ | $4$ | $0.4908$ | $0.7500$ | $1.5281$
$8$ | $8$ | $0.3579$ | $0.5000$ | $1.3971$
Remark 4: For the parameters $K$, $r$ and $s$ satisfying the conditions in Theorem 2, the communication loads of Scheme 1 are slightly larger than those of the Li-CDC scheme. However, similar to Remark 3, one can show that the number of output functions in Scheme 1 is much smaller than the number of output functions in the Li-CDC scheme.
We can also prove that there exist values of $K$, $r$ and $s$ such that $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$ by theoretical analysis.
Lemma 4: For any positive integers $K$, $r$ and $s$ with $r \ge s$ and $K \ge \frac{rs(7r-s+1)}{8(r-s+1)}$, $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$.
The proof of Lemma 4 can be found in Appendix A.
By Lemma 4, we can provide a method to show $H(r,s) \le 2$ for some positive integers $s$, $r$ and $K$. More specifically, given positive integers $r$ and $s$, one can find an integer $K(r,s)$ such that $H(r,s) \le 2$ whenever $K \ge K(r,s)$. For $K < K(r,s)$, with the aid of a computer, one may check whether $H(r,s) \le 2$ holds. We take the following result as an example.
Theorem 3: Suppose that $K$, $r$ and $s$ are positive integers. If $s \le r \le 8$ and $r \le K$, then $H(r,s)=\frac{L_1(r,s)}{L^*(r,s)} \le 2$.

Proof. We divide the proof into two parts.
1) $(r,s) \neq (8,8)$. We take $(r,s)=(2,2)$ as an example. According to Lemma 4, $H(r,s) \le 2$ holds for any positive integer $K$ at least the threshold $K(r,s)$ of Lemma 4, which is small in this case. For the finitely many smaller positive integers $K$, we obtain $H(r,s) \le 2$ from Theorem 2. Similarly, we can prove that $H(r,s) \le 2$ holds for the other pairs $(r,s)$.
2) $(r,s)=(8,8)$. According to Lemma 4, $H(r,s) \le 2$ holds for any positive integer $K \ge 1176$. For the values of $K$ covered by Theorem 2, the claim follows from Theorem 2. For the remaining values of $K$ below $1176$, one can verify with the aid of a computer that $H(r,s) \le 2$ holds.
This completes the proof.
Remark 5: It is easy to see that for all the parameters satisfying the conditions in Theorem 3, the communication loads of Scheme 1 are slightly larger than those of the Li-CDC scheme, while the number of output functions in Scheme 1 is much smaller than the number of output functions in the Li-CDC scheme. Here we only list some cases in Table IX, where $Q_{Li}$ and $Q$ are the numbers of output functions in the Li-CDC scheme and Scheme 1, respectively.

TABLE IX: The numbers of output functions in the Li-CDC scheme and Scheme 1
Number of Nodes $K$ | Computation Load $r$ | Replication Factor $s$ | $Q_{Li}=\binom{K}{s}$ | $Q=\frac{K}{\gcd(K,s)}$

B. The second new scheme
Let $\mathbf{P}$ be a $t$-$\big(K,(\frac{K}{t})^{t-1},(\frac{K}{t})^{t-2},(\frac{K}{t})^{t}-(\frac{K}{t})^{t-1}\big)$ PDA from [17] (the PDA in the third row of Table V). From Theorem 1, for any positive integer $s \le K$, one can obtain a CDC scheme, say Scheme 2, with $K$ nodes, $N=(\frac{K}{t})^{t-1}$ files and $Q=\frac{K}{\gcd(K,s)}$ output functions, where $s$ corresponds to the number of nodes that compute each output function. Furthermore, the computation load is
$$r = \frac{KZ}{N} = \frac{K(\frac{K}{t})^{t-2}}{(\frac{K}{t})^{t-1}} = t,$$
and the communication load is
$$L_2(r,s) = \frac{sgS}{(g-1)KN} = \frac{st\big((\frac{K}{t})^{t}-(\frac{K}{t})^{t-1}\big)}{(t-1)K(\frac{K}{t})^{t-1}} = \frac{s(K-t)}{(t-1)K} = \frac{s}{r-1}\Big(1-\frac{r}{K}\Big).$$
Similar to the communication load of Scheme 1, it is possible that $L_2=\frac{s}{r-1}(1-\frac{r}{K}) > 1$. By a similar method, we obtain $L_2(r,s)=\min\{\frac{s}{r-1}(1-\frac{r}{K}),1\}$. Obviously, the communication load of Scheme 2 is slightly larger than that of Scheme 1, i.e., $L_2=\frac{s}{r-1}(1-\frac{r}{K}) > \frac{s}{r}(1-\frac{r}{K})=L_1$. Hence, similar to Scheme 1, we can also prove the following results.
Theorem 4: Suppose that $K$, $r$ and $s$ are positive integers satisfying $s \le K$ and that $r \ge 2$ is a factor of $K$. Then $H_2(r,s)=\frac{L_2(r,s)}{L^*(r,s)} \le 2$ holds if one of the following conditions is satisfied:
1) $K$ does not exceed a computer-verified bound;
2) $r \ge s+2$ and $K \ge \frac{(111r-s-1)rs}{r-s-1}$;
3) $s+2 \le r \le 8$.
The proof of Theorem 4 is included in Appendix B.
Furthermore, we can show that the number of files in Scheme 2 is much smaller than the number of files in the Li-CDC scheme. To do this, we need the following lemma.

Lemma 5: ([17]) For a fixed rational number $a \in (0,1)$, let $K \in \mathbb{N}^+$ be such that $aK \in \mathbb{N}^+$. Then, as $K \to \infty$,
$$\binom{K}{aK} \sim \frac{e^{-K(a\ln a+(1-a)\ln(1-a))}}{\sqrt{2\pi K a(1-a)}}.$$
From Lemma 1, the number of files in the Li-CDC scheme is
$\binom{K}{aK}$, where $aK=r$. So, according to Lemma 5, the number of files in the Li-CDC scheme satisfies
$$\binom{K}{aK} \sim \frac{e^{-K(a\ln a+(1-a)\ln(1-a))}}{\sqrt{2\pi K a(1-a)}}$$
when $K \to \infty$, i.e., it grows exponentially in $K$. On the other hand, the number of files in Scheme 2 is $(\frac{K}{r})^{r-1}$, which is exponentially smaller in $K$ than the number of files in the Li-CDC scheme.
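The gap between the two file counts is easy to see numerically; the following sketch (illustrative, not from the paper) evaluates both counts for a few parameter choices with fixed ratio $r/K = 1/4$:

```python
from math import comb

def files_li(K, r):
    """Number of input files required by the Li-CDC scheme."""
    return comb(K, r)

def files_scheme2(K, r):
    """Number of input files required by Scheme 2 (requires r | K)."""
    assert K % r == 0
    return (K // r) ** (r - 1)

for K, r in [(16, 4), (32, 8), (64, 16)]:
    print(K, r, files_li(K, r), files_scheme2(K, r))
# e.g. K = 16, r = 4 gives 1820 files for Li-CDC but only 64 for Scheme 2
```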
Remark 6: From Theorem 4, the communication load of Scheme 2 is slightly larger than that of the Li-CDC scheme. However, the number $\frac{K}{\gcd(K,s)}$ of output functions in Scheme 2 is much smaller than the number $\binom{K}{s}$ of output functions in the Li-CDC scheme. Furthermore, according to the above discussion, the number of files in Scheme 2 is exponentially smaller in $K$ than the number of files in the Li-CDC scheme.

C. The third new scheme
Let $\mathbf{P}$ be a $(K-t)$-$\left(K, \left(\frac{K}{t}-1\right)\left(\frac{K}{t}\right)^{t-1}, \left(\frac{K}{t}-1\right)^2\left(\frac{K}{t}\right)^{t-2}, \left(\frac{K}{t}\right)^{t-1}\right)$ PDA from [17] (the PDA in the fourth row of Table V), where $t\ge 2$ is a factor of $K$. From Theorem 1, for any positive integer $s\le K$, one can obtain a CDC scheme, say Scheme 3, with $K$ nodes, $N=\left(\frac{K}{t}-1\right)\left(\frac{K}{t}\right)^{t-1}$ files and $Q=\frac{K}{\gcd(K,s)}$ output functions, where $s$ corresponds to the number of nodes that compute each reduce function. Furthermore, the computation load is
$$r=\frac{KZ}{N}=\frac{K\left(\frac{K}{t}-1\right)^2\left(\frac{K}{t}\right)^{t-2}}{\left(\frac{K}{t}-1\right)\left(\frac{K}{t}\right)^{t-1}}=K-t,$$
and the communication load is
$$L_3(r,s)=\frac{sgS}{(g-1)KN}=\frac{s(K-t)\left(\frac{K}{t}\right)^{t-1}}{(K-t-1)K\left(\frac{K}{t}-1\right)\left(\frac{K}{t}\right)^{t-1}}=\frac{st}{(K-t-1)K}=\frac{s(K-r)}{(r-1)K}=\frac{s}{r-1}\left(1-\frac{r}{K}\right).$$
We note that the communication load of Scheme 3 is equal to that of Scheme 2, i.e., $L_3=\frac{s}{r-1}\left(1-\frac{r}{K}\right)=L_2$, which implies that $H_3(r,s)=\frac{L_3(r,s)}{L^*(r,s)}=\frac{L_2(r,s)}{L^*(r,s)}=H_2(r,s)$. So we can prove the following results by using the same method as in the proof of Theorem 4.
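The parameters of Scheme 3 can likewise be evaluated numerically; the sketch below is a hypothetical helper under the formulas above (for equal computation load $r$, its load expression coincides with Scheme 2's):

```python
from math import gcd

def scheme3_params(K: int, t: int, s: int):
    """Evaluate Scheme 3 built from a (K-t)-regular
    (K, (K/t - 1)(K/t)^(t-1), (K/t - 1)^2 (K/t)^(t-2), (K/t)^(t-1)) PDA,
    assuming t divides K, t >= 2 and 1 <= s <= K."""
    assert K % t == 0 and t >= 2 and 1 <= s <= K
    q = K // t
    N = (q - 1) * q ** (t - 1)                  # number of input files
    Z = (q - 1) ** 2 * q ** (t - 2)             # files stored per node
    Q = K // gcd(K, s)                          # number of output functions
    r = K * Z // N                              # computation load, equals K - t
    L3 = min(s * (K - r) / ((r - 1) * K), 1)    # load s/(r-1) * (1 - r/K), capped at 1
    return N, Z, Q, r, L3

# Example: K = 12 nodes, t = 3, s = 2 gives computation load r = K - t = 9
N, Z, Q, r, L3 = scheme3_params(12, 3, 2)
```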
Theorem 5: Suppose that $K$, $r$ and $s$ are positive integers satisfying $s\le K$, $r\le K-2$ and that $K-r$ is a factor of $K$. Then $H_3(r,s)=\frac{L_3(r,s)}{L^*(r,s)}\le 2$ holds if one of the following conditions is satisfied:
1) $K\le 46$;
2) $r\ge s+2$ and $K\ge\frac{111r^2-3s-12rs}{36(r-s-1)}$.
Since the proof of Theorem 5 is the same as the proofs of parts 1) and 2) of Theorem 4, we omit it here. The difference between Theorem 4 and Theorem 5 is the range of the computation load $r$: in Theorem 4, $r$ is a factor of $K$, while in Theorem 5, $K-r$ is a factor of $K$.
Remark 7: Similar to the discussion of Scheme 2, the communication load of Scheme 3 is slightly larger than that of the Li-CDC scheme. However, the number of output functions in Scheme 3 is much smaller than the number of output functions in the Li-CDC scheme, and the number of files in Scheme 3 is exponentially smaller in $K$ than the number of files in the Li-CDC scheme.

V. CONCLUSION
In this paper, we focused on cascaded CDC schemes over homogeneous computing networks. We showed that, compared with the well-known Li-CDC scheme, our new schemes have not only a smaller number of input files but also a smaller number of output functions. It is worth noting that the number of output functions in each of our new schemes is only a factor of the number of computing nodes.

REFERENCES
[1] "Apache Hadoop." [Online]. Available: http://hadoop.apache.org/
[2] M. Cheng, J. Jiang, Q. Wang, and Y. Yao, "A generalized grouping scheme in coded caching," IEEE Trans. Commun., vol. 67, no. 5, pp. 3422-3430, May 2019.
[3] M. Cheng, J. Jiang, Q. Yan, and X. Tang, "Coded caching schemes for flexible memory sizes," IEEE Trans. Commun., vol. 67, no. 6, pp. 4166-4176, Jun. 2019.
[4] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107-113, Jan. 2008.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770-778.
[6] K. Konstantinidis and A. Ramamoorthy, "Resolvable designs for speeding up distributed computing," arXiv:1908.05666v1, 2019.
[7] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, "Speeding up distributed machine learning using codes," IEEE Trans. Inform. Theory, vol. 64, no. 3, pp. 1514-1529, Mar. 2018.
[8] K. Lee, C. Suh, and K. Ramchandran, "High-dimensional coded matrix multiplication," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, Jun. 2017, pp. 2418-2422.
[9] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, "A fundamental tradeoff between computation and communication in distributed computing," IEEE Trans. Inform. Theory, vol. 64, no. 1, pp. 109-128, Jan. 2018.
[10] S. Li, Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, "A scalable framework for wireless distributed computing," IEEE/ACM Trans. Netw., vol. 25, no. 5, pp. 2643-2654, Oct. 2017.
[11] S. Li, Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, "Edge-facilitated wireless distributed computing," in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Washington, DC, USA, Dec. 2016.
[12] M. A. Maddah-Ali and U. Niesen, "Fundamental limits of caching," IEEE Trans. Inform. Theory, vol. 60, no. 5, pp. 2856-2867, May 2014.
[13] H. Park, K. Lee, J. Sohn, C. Suh, and J. Moon, "Hierarchical coding for distributed computing," arXiv:1801.04686.
[14] S. Prakash, A. Reisizadeh, R. Pedarsani, and A. S. Avestimehr, "Coded computing for distributed graph analytics," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018, pp. 1221-1225.
[15] C. Shangguan, Y. Zhang, and G. Ge, "Centralized coded caching schemes: A hypergraph theoretical approach," IEEE Trans. Inform. Theory, vol. 64, no. 8, pp. 5755-5766, Aug. 2018.
[16] N. Woolsey, R. Chen, and M. Ji, "A new combinatorial design of coded distributed computing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, Jun. 2018, pp. 726-730.
[17] Q. Yan, M. Cheng, X. Tang, and Q. Chen, "On the placement delivery array design for centralized coded caching scheme," IEEE Trans. Inform. Theory, vol. 63, no. 9, pp. 5821-5833, Sep. 2017.
[18] Q. Yan, S. Sheng, and M. Wigger, "Storage, computation, and communication: A fundamental tradeoff in distributed computing," arXiv:1806.07565.
[19] Q. Yan, S. Yang, and M. Wigger, "Storage, computation, and communication: A fundamental tradeoff in distributed computing," in Proc. IEEE Inf. Theory Workshop (ITW), Guangzhou, China, Nov. 2018.
[20] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, "Polynomial codes: An optimal design for high-dimensional coded matrix multiplication," in Proc. 31st Annual Conf. Neural Inf. Processing Systems (NIPS), Long Beach, CA, USA, Dec. 2017.
[21] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, "Spark: Cluster computing with working sets," in Proc. USENIX Conf. Hot Topics in Cloud Computing (HotCloud), 2010.

APPENDIX A: PROOF OF LEMMA 4

Suppose that the positive integers $K$, $r$ and $s$ satisfy $r\ge s$ and $K\ge\frac{3rs(7r-s+1)}{8(r-s+1)}$. Firstly, consider the communication load $L^*(r,s)$ of the Li-CDC scheme. Since
$$K\ge\frac{3rs(7r-s+1)}{8(r-s+1)}\ge 2rs>r+s, \quad (11)$$
we have $\min\{r+s,K\}=r+s$, and hence
$$L^*(r,s)=\sum_{l=\max\{r+1,s\}}^{\min\{r+s,K\}}\frac{\binom{K-r}{l-r}\binom{r}{l-s}}{\binom{K}{s}}\cdot\frac{l-r}{l-1}=\frac{1}{\binom{K}{s}}\sum_{l=\max\{r+1,s\}}^{r+s}\binom{K-r}{l-r}\binom{r}{l-s}\cdot\frac{l-r}{l-1}\ge\frac{1}{\binom{K}{s}}\binom{K-r}{s}\binom{r}{r}\cdot\frac{s}{r+s-1}=\frac{s(K-r)(K-r-1)\cdots(K-r-s+1)}{(r+s-1)K(K-1)\cdots(K-s+1)},$$
where the inequality follows by keeping only the term $l=r+s$. Then, since $L_1(r,s)=\frac{s}{r}\left(1-\frac{r}{K}\right)=\frac{s(K-r)}{rK}$, we have
$$H_1(r,s)=\frac{L_1(r,s)}{L^*(r,s)}\le\frac{s(K-r)}{rK}\cdot\frac{(r+s-1)K(K-1)\cdots(K-s+1)}{s(K-r)(K-r-1)\cdots(K-r-s+1)}=\frac{r+s-1}{r}\cdot\frac{(K-1)(K-2)\cdots(K-(s-1))}{(K-(r+1))(K-(r+2))\cdots(K-(r+s-1))}. \quad (12)$$
In order to evaluate the above value of $H_1(r,s)$, the following lemma is needed.
Lemma 6: Suppose that $K,a_1,a_2,\ldots,a_n$ are positive integers with $K>a_i$, $1\le i\le n$, and
$$(K-a_1)(K-a_2)\cdots(K-a_n)=K^n-b_1K^{n-1}+b_2K^{n-2}-\cdots+(-1)^{n-1}b_{n-1}K+(-1)^nb_n.$$
For any $0\le h\le n-1$ (with $b_0=1$),
1) $b_{h+1}\le\frac{\sum_{1\le i\le n}a_i}{h+1}\,b_h$;
2) $b_{h+1}K^{n-h-1}\le b_hK^{n-h}$ holds if $K\ge\frac{\sum_{1\le i\le n}a_i}{h+1}$.
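Inequality 1) of Lemma 6 can be spot-checked numerically before reading the proof; the snippet below is illustrative only and simply compares consecutive elementary symmetric coefficients:

```python
from itertools import combinations
from math import prod

def elem_sym(vals, h):
    """h-th elementary symmetric polynomial b_h of vals (b_0 = 1)."""
    return sum(prod(c) for c in combinations(vals, h)) if h else 1

a = [2, 3, 5, 7]          # sample a_i (any K > 7 works for the lemma's setting)
n = len(a)
for h in range(n):        # check b_{h+1} <= (sum a_i / (h+1)) * b_h
    assert elem_sym(a, h + 1) <= sum(a) / (h + 1) * elem_sym(a, h)
```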
Proof. Firstly, we will prove the first result. It is not difficult to see that $b_h$ is the $h$-th elementary symmetric function of $a_1,a_2,\ldots,a_n$, i.e.,
$$b_h=\sum_{1\le i_1<i_2<\cdots<i_h\le n}a_{i_1}a_{i_2}\cdots a_{i_h}.$$
Every term of $b_{h+1}$ arises exactly $h+1$ times in the product $\left(\sum_{1\le i\le n}a_i\right)b_h$, so $(h+1)b_{h+1}\le\left(\sum_{1\le i\le n}a_i\right)b_h$, which proves 1). The second result follows immediately from 1).

Applying Lemma 6 to the numerator and the denominator of (12) and using (11), we can bound $H_1(r,s)$. Since $2rs<K$ from (11), $s(s-1)\le r^2s<rsK$. Hence
$$H_1(r,s)<\frac{r+s-1}{r}\left(1+\frac{8rs}{K}+\frac{rs}{K-2rs}\right)\le\frac{r+s-1}{r}\left(1+\frac{9rs}{K-2rs}\right)\le 2,$$
where the last inequality holds since $K\ge\frac{3rs(7r-s+1)}{8(r-s+1)}$. This completes the proof.

APPENDIX B: PROOF OF THEOREM 4

It can be verified directly that $H_2(r,s)\le 2$ holds for all the positive integers satisfying condition 1) of Theorem 4. Similar to the process of obtaining (15), one can obtain
$$H_2(r,s)\le\frac{r+s-1}{r-1}\left(1+\frac{r(s-1)}{K}+\frac{s(s-1)}{K-(r+s)(s-1)}\right).$$
Since $r\ge s+2$ and $K\ge\frac{111r^2-3s-12rs}{36(r-s-1)}$, we have $2rs<K$. Then $s(s-1)\le r^2s<rsK$. We also obtain that $\frac{r(s-1)}{K}<\frac{rs}{K}$ and $1-\frac{(r+s)(s-1)}{K}>1-\frac{(r+r)s}{K}=1-\frac{2rs}{K}$. Hence
$$H_2(r,s)<\frac{r+s-1}{r-1}\left(1+\frac{8rs}{K}+\frac{rs}{K-2rs}\right)\le\frac{r+s-1}{r-1}\left(1+\frac{9rs}{K-2rs}\right)\le 2,$$
where the last inequality holds since $K\ge\frac{111r^2-3s-12rs}{36(r-s-1)}$. So $H_2(r,s)\le 2$ holds for all the positive integers satisfying condition 2) of Theorem 4.

We now consider condition 3) of Theorem 4. We take $(r,s)=(4,2)$ as an example. According to Theorem 4-2), for any positive integer $K\ge\frac{111r^2-3s-12rs}{36(r-s-1)}=46.5$, $H_2(r,s)\le 2$ holds. For any positive integer $K\le 46$, we can obtain $H_2(r,s)\le 2$ from Theorem 4-1). Similarly, we can prove that $H_2(r,s)\le 2$ holds for any other pair $(r,s)$ satisfying condition 3) of Theorem 4.
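The $(r,s)=(4,2)$ example above can also be checked by direct computation; the script below is an illustrative sketch that evaluates $L^*(r,s)$ via the summation restated in Appendix A and the Scheme 2 load derived in Section IV (both function names are ours):

```python
from math import comb

def li_cdc_load(K: int, r: int, s: int) -> float:
    """Optimal cascaded communication load L*(r, s) of the Li-CDC scheme [9]."""
    total = 0.0
    for l in range(max(r + 1, s), min(r + s, K) + 1):
        total += comb(K - r, l - r) * comb(r, l - s) / comb(K, s) * (l - r) / (l - 1)
    return total

def scheme2_load(K: int, r: int, s: int) -> float:
    """Communication load L2(r, s) = min(s/(r-1) * (1 - r/K), 1) of Scheme 2."""
    return min(s / (r - 1) * (1 - r / K), 1.0)

# Ratio H2 = L2 / L* for (r, s) = (4, 2) over a range of admissible K (4 | K)
ratios = [scheme2_load(K, 4, 2) / li_cdc_load(K, 4, 2) for K in range(8, 401, 4)]
```

In this range the ratio stays between $1$ and $2$, consistent with Theorem 4 for the pair $(4,2)$.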