Private and Rateless Adaptive Coded Matrix-Vector Multiplication
Rawad Bitar, Yuxuan Xing, Yasaman Keshtkarjahromi, Venkat Dasari, Salim El Rouayheb, and Hulya Seferoglu
Abstract—Edge computing is emerging as a new paradigm to allow processing data near the edge of the network, where the data is typically generated and collected. This enables critical computations at the edge in applications such as Internet of Things (IoT), in which an increasing number of devices (sensors, cameras, health monitoring devices, etc.) collect data that needs to be processed through computationally intensive algorithms with stringent reliability, security and latency constraints. Our key tool is the theory of coded computation, which advocates mixing data in computationally intensive tasks by employing erasure codes and offloading these tasks to other devices for computation. Coded computation is recently gaining interest, thanks to its higher reliability, smaller delay, and lower communication costs. In this paper, we develop a private and rateless adaptive coded computation (PRAC) algorithm for distributed matrix-vector multiplication by taking into account (i) the privacy requirements of IoT applications and devices, and (ii) the heterogeneous and time-varying resources of edge devices. We show that PRAC outperforms known secure coded computing methods when resources are heterogeneous. We provide theoretical guarantees on the performance of PRAC and its comparison to baselines. Moreover, we confirm our theoretical results through simulations and implementations on Android-based smartphones.
I. INTRODUCTION
Edge computing is emerging as a new paradigm to allow processing data near the edge of the network, where the data is typically generated and collected. This enables computation at the edge in applications such as Internet of Things (IoT), in which an increasing number of devices (sensors, cameras, health monitoring devices, etc.) collect data that needs to be processed through computationally intensive algorithms with stringent reliability, security and latency constraints.

One of the promising solutions to handle computationally intensive tasks is computation offloading, which advocates offloading tasks to remote servers or the cloud. Yet, offloading tasks to remote servers or the cloud could be a luxury that cannot be afforded by most edge applications, where connectivity to remote servers can be lost or compromised, which makes edge computing crucial.
The preliminary results of this paper were presented in part at SPIE Defense + Commercial Sensing, Baltimore, MD, April 2019 [1]. R. Bitar and S. El Rouayheb are with the Department of Electrical and Computer Engineering at Rutgers, the State University of New Jersey, Piscataway, New Jersey, 08854. E-mails: [email protected], [email protected]. Y. Xing and H. Seferoglu are with the Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL, 60607. E-mails: [email protected], [email protected]. Y. Keshtkarjahromi is with the Storage Research Group at Seagate Technology, Shakopee, MN, 55379. E-mail: [email protected]. V. Dasari is with the US Army Research Lab, Aberdeen Proving Ground, MD.

Edge computing advocates that computationally intensive tasks in a device (master) could be offloaded to other edge or end devices (workers) in close proximity. However, offloading tasks to other devices leaves the IoT and the applications it is supporting at the complete mercy of an attacker. Furthermore, exploiting the potential of edge computing is challenging mainly due to the heterogeneous and time-varying nature of the devices at the edge. Thus, our goal is to develop a private, dynamic, adaptive, and heterogeneity-aware cooperative computation framework that provides both privacy and computation efficiency guarantees. Note that the application of this work can be extended to cloud computing at remote data-centers. However, we focus on edge computing as heterogeneity and time-varying resources are more prevalent at the edge as compared to data-centers.

Our key tool is the theory of coded computation, which advocates mixing data in computationally intensive tasks by employing erasure codes and offloading these tasks to other devices for computation [2]–[14]. The following canonical example demonstrates the effectiveness of coded computation.
Example 1.
Consider the setup where a master device wishes to offload a task to 3 workers. The master has a large data matrix A and wants to compute the matrix-vector product Ax. The master device divides the matrix A row-wise equally into two smaller matrices A_1 and A_2, which are then encoded using a (3, 2) Maximum Distance Separable (MDS) code to give B_1 = A_1, B_2 = A_2 and B_3 = A_1 + A_2, and sends each to a different worker. (An (n, k) MDS code divides the master's data into k chunks and encodes it into n chunks, n > k, such that any k chunks out of n are sufficient to recover the original data.) Also, the master device sends x to the workers and asks them to compute B_i x, i ∈ {1, 2, 3}. When the master receives the computed values (i.e., B_i x) from at least two out of three workers, it can decode its desired task, which is the computation of Ax. The power of coded computations is that it makes B_3 = A_1 + A_2 act as a "joker" redundant task that can replace any of the other two tasks if they end up straggling or failing. ✷

The above example demonstrates the benefit of coding for edge computing. However, the very nature of task offloading from a master to worker devices makes the computation framework vulnerable to attacks. One of the attacks, which is also the focus of this work, is the eavesdropper adversary, where one or more of the workers can behave as an eavesdropper and spy on the coded data sent to these devices for computations. (This work focuses specifically on the eavesdropper adversary although there are other types of attacks; for example, the Byzantine adversary, which is out of the scope of this work.) For example, B_3 = A_1 + A_2 in Example 1 can be processed and spied on by worker 3. Even though A_1 + A_2 is coded, the attacker can infer some information from this coded task. Privacy against eavesdropper attacks is extremely important in edge computing [15]–[17]. Thus, it is crucial to develop a private coded computation mechanism against an eavesdropper adversary who can gain access to offloaded tasks.

In this paper, we develop a private and rateless adaptive coded computation (PRAC) mechanism. PRAC is (i) private as it is secure against an eavesdropper adversary, (ii) rateless, because it uses Fountain codes [18]–[20] instead of Maximum Distance Separable (MDS) codes [21], [22], and (iii) adaptive as the master device offloads tasks to workers by taking into account their heterogeneous and time-varying resources. (Fountain codes are desirable here for two properties: (i) they provide a fluid abstraction of the coded packets, so the master can always decode with high probability as long as it collects enough packets; (ii) they have low decoding complexity.) Next, we illustrate the main idea of PRAC through an illustrative example.

Example 2.
We consider the same setup as in Example 1, where a master device offloads a task to 3 workers. The master has a large data matrix A and wants to compute the matrix-vector product Ax. The master device divides matrix A row-wise into sub-matrices A_1, A_2, A_3, and encodes these matrices using a Fountain code [18]–[20]. An example set of coded packets is A_2, A_3, A_1 + A_2, and A_1 + A_3. However, prior to sending a coded packet to a worker, the master generates a random key matrix R_1 with the same dimensions as the A_i's and with entries drawn uniformly from the same alphabet as the entries of A. The key matrix is added to the coded packets to provide privacy, as shown in Table I. In particular, a key matrix R_1 is created at the start of time slot 1, combined with A_1 + A_2 and A_3, and transmitted to workers 1 and 2, respectively. R_1 is also transmitted to worker 3 in order to obtain R_1 x, which will help the master in the decoding process. The computation of (A_1 + A_2 + R_1)x is completed at the end of time slot 1. Thus, at that time slot the master generates a new key matrix, R_2, and sends it to worker 1. At the end of time slot 2, worker 2 finishes its computation, therefore the master adds R_2 to A_1 + A_3 and sends it to worker 2. A similar process is repeated when worker 3 finishes computing R_1 x and receives A_2 + R_2. Now the master waits for worker 1 to return R_2 x and for any other worker to return its uncompleted task in order to decode Ax. Thanks to using the key matrices R_1 and R_2, and assuming that the workers do not collude, privacy is guaranteed. On a high level, privacy is guaranteed because the observation of the workers is statistically independent from the data A. ✷

TABLE I: Example PRAC operation in a heterogeneous and time-varying setup.

Time | Worker 1        | Worker 2        | Worker 3
1    | A_1 + A_2 + R_1 | A_3 + R_1       | R_1
2    | R_2             | A_1 + A_3 + R_2 | A_2 + R_2

This example shows that PRAC can take advantage of coding for computation, and provide privacy.
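The masking in Example 2 is a one-time pad over a finite field: each key is uniform and used once, so a masked packet is statistically independent of the data. The following minimal sketch (Python with NumPy; the field size q and all dimensions are toy choices of our own, not the paper's) mimics the round-1 packets of Table I and shows how the master strips the key contribution using worker 3's result:

```python
import numpy as np

q = 65537                                # small prime field (assumption)
rng = np.random.default_rng(0)

m, ell, b = 6, 4, 3                      # A is m x ell, split into b row blocks
A = rng.integers(0, q, size=(m, ell))
x = rng.integers(0, q, size=(ell, 1))
A1, A2, A3 = np.split(A, b)

R1 = rng.integers(0, q, size=(m // b, ell))   # fresh uniform key

sent = {
    "w1": (A1 + A2 + R1) % q,            # coded packet masked with R1
    "w2": (A3 + R1) % q,
    "w3": R1,                            # worker 3 only computes the key product
}
results = {w: (p @ x) % q for w, p in sent.items()}

# The master removes the key's contribution using worker 3's result R1 x
assert np.array_equal((results["w1"] - results["w3"]) % q,
                      ((A1 + A2) @ x) % q)
assert np.array_equal((results["w2"] - results["w3"]) % q,
                      (A3 @ x) % q)
```

The same subtraction applies round by round with fresh keys; PRAC automates the bookkeeping of which keys pad which packets.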
Contributions.
We design PRAC for heterogeneous and time-varying private coded computing with colluding workers. In particular, PRAC codes sub-tasks using Fountain codes, and determines how many coded packets and keys each worker should compute dynamically over time. We provide theoretical analysis of PRAC and show that it (i) guarantees the privacy conditions, (ii) uses the minimum number of keys to satisfy the privacy requirements, and (iii) maintains the desired rateless property of non-private Fountain codes. Furthermore, we provide a closed-form task completion delay analysis of PRAC. Finally, we evaluate the performance of PRAC via simulations as well as in a testbed consisting of real Android-based smartphones, as compared to baselines.

The use of Fountain codes in encoding the sub-tasks provides PRAC flexibility in the number of stragglers and in the computing capacity of workers, reflected by the number of sub-tasks assigned to each worker. In contrast, existing solutions for secure coded computing require the master to set a threshold on the number of stragglers that it can tolerate and pre-assign the sub-tasks to the workers based on this threshold.
Organization.
The structure of the rest of this paper is as follows. We start by presenting the system model in Section II. Section III presents the design of private and rateless adaptive coded computation (PRAC). We characterize and analyze PRAC in Section IV. We present evaluation results in Section V. Section VI presents related work. Section VII concludes the paper.

II. SYSTEM MODEL
Setup.
We consider a master/workers setup at the edge of the network, where the master device M offloads its computationally intensive tasks to workers w_i, i ∈ N (where |N| = n), via device-to-device (D2D) links such as Wi-Fi Direct and/or Bluetooth. The master device divides a task into smaller sub-tasks, and offloads them to workers that process these sub-tasks in parallel.

Task Model.
We focus on the computation of linear functions, i.e., matrix-vector multiplication. We suppose the master wants to compute the matrix-vector product Ax, where A ∈ F_q^{m×ℓ} can be thought of as the data matrix and x ∈ F_q^ℓ can be thought of as an attribute vector. We assume that the entries of A and x are drawn independently and uniformly at random from F_q. (We abuse notation and denote both the random matrix representing the data and its realization by A. We do the same for x.) The motivation stems from machine learning applications where computing linear functions is a building block of several iterative algorithms [23], [24]. For instance, the main computation of a gradient descent algorithm with squared error loss function is

x⁺ = x − αA^T(Ax − y),   (1)

where x is the value of the attribute vector at a given iteration, x⁺ is the updated value of x at this iteration, and the learning rate α is a parameter of the algorithm. Equation (1) consists of computing two linear functions, Ax and A^T w, where w ≜ Ax − y.
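To make the role of matrix-vector multiplication concrete, the sketch below (an illustration of our own; sizes and the learning rate are arbitrary) runs the update of Equation (1), which decomposes into exactly the two products Ax and A^T w that a master would offload:

```python
import numpy as np

rng = np.random.default_rng(1)
m, ell = 200, 10
A = rng.standard_normal((m, ell))   # data matrix
y = rng.standard_normal(m)          # labels
x = np.zeros(ell)                   # attribute vector
alpha = 1e-3                        # learning rate (assumed)

# Each iteration of Eq. (1) reduces to two matrix-vector products,
# which are the offloadable primitives considered in this paper.
for _ in range(500):
    w = A @ x - y                   # first product:  A x (shifted by y)
    x = x - alpha * (A.T @ w)       # second product: A^T w

print("squared error:", np.sum((A @ x - y) ** 2))
```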
Worker and Attack Model. The workers incur random delays while executing the tasks assigned to them by the master device. The workers have different computation and communication specifications, resulting in a heterogeneous environment which includes workers that are significantly slower than others, known as stragglers. Moreover, the workers cannot be trusted with the master's data. We consider an eavesdropper adversary in this paper, where one or more of the workers can be eavesdroppers and can spy on the coded data sent to these devices for computations. We assume that up to z, z < n, workers can collude, i.e., z workers can share the data they received from the master in order to obtain information about A. The parameter z can be chosen based on the desired privacy level; a larger z means a higher privacy level and vice versa. One would want to set z to the largest possible value, z = n − 1, for maximum security. However, this has the drawback of increasing the complexity and the runtime of the algorithm. In our setup we assume that z is a fixed and given system parameter.

Coding & Secret Keys.
The matrix A can be divided into b row blocks (we assume that b divides m; otherwise all-zero rows can be added to the matrix to satisfy this property) denoted by A_i, i = 1, ..., b. The master applies Fountain coding [18]–[20] across the row blocks to create information packets ν_j ≜ Σ_{i=1}^{b} c_{i,j} A_i, j = 1, 2, ..., where c_{i,j} ∈ {0, 1}. Note that an information packet is a matrix of dimension m/b × ℓ, i.e., ν_j ∈ F_q^{m/b×ℓ}. Such rateless coding is compatible with our goal to create an adaptive coded cooperative computation framework. In order to maintain the privacy of the data, the master device generates random matrices R_i of dimension m/b × ℓ, called keys. The entries of the R_i matrices are drawn uniformly at random from the same field as the entries of A. Each information packet ν_j is padded with a linear combination of z keys f_j(R_{i,1}, ..., R_{i,z}) to create a secure packet s_j ∈ F_q^{m/b×ℓ} defined as s_j ≜ ν_j + f_j(R_{i,1}, ..., R_{i,z}). The master device sends x to all workers, then it sends the keys and the s_j's to the workers according to our PRAC scheme described later. Each worker multiplies the received packet by x and sends the result back to the master. Since the encoding is rateless, the master keeps sending packets to the workers until it can decode Ax. The master then sends a stop message to all the workers.
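A minimal sketch of this packet construction, assuming a toy field size and a naive uniform choice of the coefficients c_{i,j} (a real deployment would use an LT/Raptor degree distribution [18]–[20]); all helper names are our own:

```python
import numpy as np

q = 65537                                   # prime field size (assumption)
rng = np.random.default_rng(2)
m, ell, b, z = 8, 4, 4, 2                   # illustrative sizes
A_blocks = np.split(rng.integers(0, q, (m, ell)), b)

def information_packet():
    """Fountain-style packet: sum of a random subset of the row blocks."""
    c = rng.integers(0, 2, b)               # c_{i,j} in {0, 1}
    if not c.any():
        c[rng.integers(b)] = 1              # avoid the empty combination
    return sum(int(ci) * Ai for ci, Ai in zip(c, A_blocks)) % q

def secure_packet(keys, g):
    """Pad an information packet with the combination f_j = sum_i g_i R_i."""
    pad = sum(int(gi) * Ri for gi, Ri in zip(g, keys)) % q
    return (information_packet() + pad) % q

keys = [rng.integers(0, q, (m // b, ell)) for _ in range(z)]
s = secure_packet(keys, g=[1, 2])           # g would be a row of an MDS generator
```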
TABLE II: Summary of notations.

Symbol   | Meaning
M        | master
w_i      | worker i
n        | number of workers
A        | m × ℓ data matrix
x        | ℓ × 1 attribute vector
z        | number of colluding workers
m        | number of rows in A
ε        | overhead of Fountain codes
A_i      | i-th row block of the data matrix A
R        | random key matrix
RTT_i    | average round-trip time to send and receive packet p_i
β_{t,i}  | computation time of the t-th packet at w_i
ν        | Fountain coded packet of the A_i's
s        | secure Fountain coded packet
T_i      | time to compute a packet at w_i
T_(d)    | d-th order statistic of the T_i's
T        | time spent by M to decode Ax

Privacy Conditions. Our primary requirement is that any collection of z (or fewer) workers will not be able to obtain any information about A, in an information theoretic sense. In particular, let P_i, i = 1, ..., n, denote the collection of packets sent to worker w_i. For any set B ⊆ {1, ..., n}, let P_B ≜ {P_i, i ∈ B} denote the collection of packets given to worker w_i for all i ∈ B. The privacy requirement can be expressed as

H(A | P_Z) = H(A),  ∀Z ⊆ {1, ..., n} s.t. |Z| ≤ z,   (2)

where H(A) denotes the entropy, or uncertainty, about A and H(A | P_Z) denotes the uncertainty about A after observing P_Z. (In some cases the vector x may contain information about A and therefore must not be revealed to the workers. We explain in Appendix A how to generalize our scheme to account for such cases.)
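The intuition behind Equation (2) can be checked numerically: for a uniform one-time key R, the masked value (ν + R) mod q has the same uniform distribution whatever the data is, so observing it reveals nothing. A small empirical sanity check, with an arbitrary toy field size:

```python
import numpy as np

# For each fixed "data" value nu, the masked observation s = (nu + R) mod q
# is (empirically) uniform, i.e., independent of nu.
q, trials = 17, 200_000
rng = np.random.default_rng(5)
for nu in (0, 3, 11):
    s = (nu + rng.integers(0, q, trials)) % q
    counts = np.bincount(s, minlength=q) / trials
    assert np.allclose(counts, 1 / q, atol=0.01)   # close to uniform
```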
Delay Model. Each packet transmitted from the master to a worker w_i, i = 1, 2, ..., n, experiences the following delays: (i) transmission delay for sending the packet from the master to the worker, (ii) computation delay for computing the multiplication of the packet by the vector x, and (iii) transmission delay for sending the computed packet from the worker w_i back to the master. We denote by β_{t,i} the computation time of the t-th packet at worker w_i, and by RTT_i the average round-trip time spent to send and receive a packet from worker w_i. The time spent by the master is equal to the time taken by the (z+1)-st fastest worker to finish its assigned tasks.

III. DESIGN OF PRAC
A. Overview
We present the detailed explanation of PRAC. Let p_{t,i} ∈ F_q^{m/b×ℓ} be the t-th packet sent to worker w_i. This packet can be either a key or a secure packet. For each value of t, the master sends z keys denoted by R_{t,1}, ..., R_{t,z} to z different workers and up to n − z secure packets s_{t,1}, ..., s_{t,n−z} to the remaining workers. The master needs the results of b + ε information packets, i.e., ν_{t,i} x, to decode the final result Ax, where ε is the overhead required by Fountain coding. (The overhead required by Fountain coding is typically as low as 5% [20], i.e., ε = 0.05b.) To obtain the results of b + ε information packets, the master needs the results of b + ε secure packets, s_{t,i} x = (ν_{t,i} + f_i(R_{t,1}, ..., R_{t,z})) x, together with all the corresponding R_{t,i} x, i = 1, ..., z. (Recall that f_j(R_{t,1}, ..., R_{t,z}) is a linear function, thus it is easy to extract R_{t,i} x, i = 1, ..., z, from (f_j(R_{t,1}, ..., R_{t,z})) x.) Therefore, only the results of the s_{t,i} x for which all the computed keys R_{t,i} x, i = 1, ..., z, are received by the master can account for the total of b + ε information packets.

TABLE III: Depiction of PRAC in the presence of stragglers. The master keeps generating packets using Fountain codes until it can decode Ax. The master estimates the average task completion time of each worker and sends a new packet to avoid idle time. Each new packet sent to a worker must be secured with a new random key. The master can decode A_1 x, ..., A_6 x after receiving all the packets not having R_{4,1} or R_{4,2} in them.

Time | Worker 1 | Worker 2                      | Worker 3              | Worker 4
1    | R_{1,1}  | R_{1,2}                       | A_1 + R_{1,1} + R_{1,2} | A_1 + A_2 + A_3 + R_{1,1} + 2R_{1,2}
2    | R_{2,1}  | R_{2,2}                       | A_2 + R_{2,1} + R_{2,2} | A_3 + A_4 + R_{2,1} + 2R_{2,2}
3    | R_{3,1}  | A_5 + R_{3,1} + R_{3,2}       | R_{3,2}               | A_6 + R_{3,1} + 2R_{3,2}
4    | R_{4,1}  | A_2 + A_4 + R_{4,1} + R_{4,2} | R_{4,2}               | (straggling)
B. Dynamic Rate Adaptation

The dynamic rate adaptation part of PRAC is based on [2]. In particular, the master offloads coded packets gradually to workers and receives two ACKs for each transmitted packet: one confirming the receipt of the packet by the worker, and the second one (piggybacked to the computed packet) showing that the packet has been computed by the worker. Then, based on the frequency of the received ACKs, the master decides to transmit more or fewer coded packets to that worker. In particular, each packet p_{t,i} is transmitted to each worker w_i before or right after the computed packet p_{t−1,i} x is received at the master. For this purpose, the average per-packet computing time E[β_{t,i}] is calculated for each worker w_i dynamically based on the previously received ACKs. Each packet p_{t,i} is transmitted after waiting E[β_{t,i}] from the time p_{t−1,i} is sent, or right after packet p_{t−1,i} x is received at the master, thus reducing the idle time at the workers. This policy is shown to approach the optimal task completion delay, maximizes the workers' efficiency, and is shown to improve task completion time significantly compared with the literature [2].
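The following schematic captures this send-timing rule, assuming an exponential moving average as the estimator of E[β_{t,i}]; the class structure and smoothing constant are our own illustration, not specified in [2]:

```python
class WorkerEstimate:
    """Per-worker timing state driven by computation ACKs (illustrative)."""

    def __init__(self, alpha=0.3):      # smoothing constant is an assumption
        self.alpha = alpha
        self.avg_compute = None         # running estimate of E[beta_{t,i}]
        self.last_sent = None           # send time of the in-flight packet

    def on_send(self, now):
        self.last_sent = now

    def on_compute_ack(self, now):
        sample = now - self.last_sent   # observed per-packet service time
        self.avg_compute = (sample if self.avg_compute is None else
                            (1 - self.alpha) * self.avg_compute
                            + self.alpha * sample)

    def next_send_due(self, now):
        """True once E[beta_{t,i}] has elapsed since the last send, so the
        next packet p_{t,i} can be dispatched before the ACK arrives."""
        if self.last_sent is None or self.avg_compute is None:
            return True                 # bootstrap: send immediately
        return now - self.last_sent >= self.avg_compute
```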
C. Coding

We explain the coding scheme used in PRAC. We start with an example to build intuition and illustrate the scheme before going into the details.
Example 3.
Assume there are n = 4 workers out of which any z = 2 can collude. Let A and x be the data owned by the master and the vector to be multiplied by A, respectively. The master sends x to all the workers. For the sake of simplicity, assume A can be divided into b = 6 row blocks, i.e., A = [A_1^T A_2^T ... A_6^T]^T. The master encodes the A_i's using a Fountain code. We denote by round the event when the master sends a new packet to a worker. For example, we say that a worker is at round 3 if it has received 3 packets so far. For every round t, the master generates z = 2 random matrices R_{t,1}, R_{t,2} (with the same size as the A_i's) and encodes them using an (n, z) = (4, 2) systematic maximum distance separable (MDS) code by multiplying [R_{t,1}^T R_{t,2}^T]^T by a generator matrix G as follows:

G [R_{t,1}; R_{t,2}] = [R_{t,1}; R_{t,2}; R_{t,1} + R_{t,2}; R_{t,1} + 2R_{t,2}].   (3)

This results in the encoded matrices R_{t,1}, R_{t,2}, R_{t,1} + R_{t,2}, and R_{t,1} + 2R_{t,2}. Now let us assume that workers can be stragglers. At the beginning, the master initializes all the workers at round 1. Afterwards, when a worker w_i finishes its task, the master checks how many packets this worker has received so far and how many other workers are at this round. If this worker w_i is the first or second to be at round t, the master generates R_{t,1} or R_{t,2}, respectively, and sends it to w_i. Otherwise, if w_i is the j-th worker (j > 2) to be at round t, the master multiplies [R_{t,1}^T R_{t,2}^T]^T by the j-th row of G, adds it to a generated Fountain coded packet, and sends it to w_i. The master keeps sending packets to the workers until it can decode Ax. We illustrate the idea in Table III.
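A small numerical sketch of this key encoding, using the systematic (4, 2) generator implied by Equation (3) over a toy prime field; any two coded keys suffice to recover R_{t,1} and R_{t,2}:

```python
import numpy as np

q = 65537                                  # toy prime field (assumption)
rng = np.random.default_rng(3)
n, z = 4, 2
shape = (2, 4)                             # m/b x ell key dimensions (toy)

# Systematic (4, 2) MDS generator: identity on top, then rows (1,1), (1,2)
G = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 2]])

R = [rng.integers(0, q, shape) for _ in range(z)]   # R_{t,1}, R_{t,2}

# Row i of G yields the coded key g_i R attached to worker i's packet
coded_keys = [sum(int(G[i, j]) * R[j] for j in range(z)) % q
              for i in range(n)]
# coded_keys = [R1, R2, R1 + R2, R1 + 2*R2] (mod q); any z = 2 of them
# recover both keys, which is what the privacy proof relies on.
```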
We now explain the details of PRAC in the presence of z colluding workers.

1) Initialization: The master divides A into b row blocks A_1, ..., A_b and sends the vector x to the workers. Let G ∈ F_q^{n×z}, q > n, be the generator matrix of an (n, z) systematic MDS code. For example, one may use systematic Reed-Solomon codes that use a Vandermonde matrix as generator matrix; see for example [25]. The master generates z random matrices R_{1,1}, ..., R_{1,z} and encodes them using G. Each coded key can be denoted by g_i R, where g_i is the i-th row of G and R ≜ [R_{1,1}^T ... R_{1,z}^T]^T. The master sends the z keys R_{1,1}, ..., R_{1,z} to the first z workers, generates n − z Fountain coded packets of the A_i's, adds to each packet an encoded random key g_i R, i = z+1, ..., n, and sends them to the remaining n − z workers.

2) Encoding and adaptivity:
When the master wants to send a new packet to a worker (noting that a packet p_{t,i} is transmitted to worker w_i before, or right after, the computed packet p_{t−1,i} x is received at the master according to the strategy described in Section III-B), it checks at which round this worker is, i.e., how many packets this worker has received so far, and checks how many other workers are at least at this round. Assume worker w_i is at round t and j − 1 other workers are at least at this round. If j ≤ z, the master generates and sends R_{t,j} to the worker. However, if j > z, the master generates a Fountain coded packet of the A_i's (e.g., A_1 + A_2), adds to it g_j R, and sends the packet (A_1 + A_2 + g_j R) to the worker. Each worker computes the multiplication of the received packet by the vector x and sends the result to the master. A compact sketch of this selection rule is given below.
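The sketch uses hypothetical helper functions of our own naming; `keygen(t)` must return the same z keys every time it is queried for round t (i.e., it caches per round):

```python
def next_packet(worker, rounds, z, G, keygen, fountain_packet, q=65537):
    """Pick the packet for `worker` per PRAC's round rule (illustrative).

    rounds[i] = packets received so far by worker i; keygen(t) returns the
    z fresh keys R_{t,1..z} of round t (cached per t); fountain_packet()
    draws a new Fountain-coded information packet.
    """
    t = rounds[worker] + 1                  # the worker moves to round t
    j = sum(r >= t for r in rounds) + 1     # it is the j-th worker at round t
    rounds[worker] = t
    keys = keygen(t)
    if j <= z:
        return keys[j - 1]                  # first z workers get plain keys
    pad = sum(int(G[j - 1][k]) * keys[k] for k in range(z)) % q
    return (fountain_packet() + pad) % q    # secure packet nu + g_j R
```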
3) Decoding and speed: Let τ_i denote the number of packets sent to worker i. We define τ_max ≜ max_i τ_i such that at the end of the process the master has R_{t,i} x for all t = 1, ..., τ_max and all i = 1, ..., z. The master can therefore subtract R_{t,i} x, t = 1, ..., τ_max and i = 1, ..., z, from all received secure information packets, and thus can decode the A_i's using the Fountain code decoding process. The number of secure packets that can be used to decode the A_i's is dictated by the (z+1)-st fastest worker, i.e., the master can only use the results of secure information packets computed at a given round if at least z + 1 workers have completed that round. If, for example, the z fastest workers have completed round 6 and the (z+1)-st fastest worker has completed round 4, the master can only use the packets belonging to the first 4 rounds. The reason is that the master needs all the keys corresponding to a given round in order to use the secure information packet for decoding. In Lemma 2 we prove that this scheme is optimal, i.e., in private coded computing the master cannot use the packets computed at rounds finished by fewer than z + 1 workers, irrespective of the coding scheme.
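The round-counting rule of the decoding step can be written in a few lines; the sketch below assumes the master tracks, for each worker, how many rounds it has fully returned, and that there are more than z workers:

```python
def decodable_rounds(completed, z):
    """Number of rounds whose secure packets the master can use: a round
    counts only once at least z + 1 workers have finished it, since all of
    that round's keys are needed (see Lemma 2).
    completed[i] = rounds fully returned by worker i; requires len > z."""
    return sorted(completed, reverse=True)[z]   # (z+1)-st fastest worker
```

For instance, with z = 2 and completed = [6, 6, 4, 1], the function returns 4, matching the example above.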
IV. PERFORMANCE ANALYSIS OF PRAC
A. Privacy
In this section, we provide theoretical analysis of PRAC by particularly focusing on its privacy properties.
Theorem 1.
PRAC is a rateless real-time adaptive coded computing scheme that allows a master device to run distributed linear computation on private data A via n workers while satisfying the privacy constraint given in (2) for a given z < n.

Proof. Since the random keys are generated independently at each round, it is sufficient to study the privacy of the data on one round; the privacy then generalizes to the whole algorithm. We show that for any subset Z ⊂ {1, ..., n}, |Z| = z, the collection of packets p_Z ≜ {p_{t,i}, i ∈ Z} sent at round t reveals no information about the data A as given in (2), i.e., H(A) = H(A | p_Z). Let K denote the random variable representing all the keys generated at round t; then it is enough to show that H(K | A, p_Z) = 0, as detailed in Appendix B. Therefore, we need to show that given A as side information, any z workers can decode the random keys R_{t,1}, ..., R_{t,z}. Without loss of generality assume the workers are ordered from fastest to slowest, i.e., worker w_1 is the fastest at the considered round t. Since the master sends z random keys to the fastest z workers, then p_{t,i} = R_{t,i}, i = 1, ..., z. The remaining n − z packets are secure information packets sent to the remaining n − z workers, i.e., p_{t,i} = s_{t,i} = ν_{t,i} + f(R_{t,1}, ..., R_{t,z}), where ν_{t,i} is a linear combination of row blocks of A and f(R_{t,1}, ..., R_{t,z}) is a linear combination of the random keys generated at round t. Given the data A as side information, any collection of z packets can be expressed as z codewords of the (n, z) MDS code encoding the random keys. Thus, given the matrix A, any collection of z packets is enough to decode all the keys and H(K | A, p_Z) = 0, which concludes the proof.

Remark 1.
PRAC requires the master to wait for the (z+1)-st fastest worker in order to be able to decode Ax. We show in Lemma 2 that this limitation is a byproduct of all private coded computing schemes.

Remark 2.
PRAC uses the minimum number of keys required to guarantee the privacy constraints. At each round, PRAC uses exactly z random keys, which is the minimum number of required keys (cf. Equation (12) in Appendix B).

Lemma 2.
Any private coded computing scheme for distributed linear computation limits the master to the speed of the (z+1)-st fastest worker.

Proof. The proof of Lemma 2 is provided in Appendix C.
B. Task Completion Delay
In this section, we characterize the task completion delay of PRAC and compare it with Staircase codes [3], which are secure against eavesdropping attacks in a coded computation setup with homogeneous resources. First, we start with the task completion delay characterization of PRAC.
Theorem 3.
Let b be the number of row blocks in A, let β_{t,i} denote the computation time of the t-th packet at worker w_i, and let RTT_i denote the average round-trip time spent to send and receive a packet from worker i. The task completion time of PRAC is approximated as

T_PRAC ≈ max_{i ∈ {1,...,n}} RTT_i + (b + ε) / (Σ_{i=z+1}^{n} 1/E[β_{t,i}])   (4)
       ≈ (b + ε) / (Σ_{i=z+1}^{n} 1/E[β_{t,i}]),   (5)

where the w_i's are ordered indices of the workers from fastest to slowest, i.e., w_1 = arg min_i E[β_{t,i}].

Proof. The proof of Theorem 3 is provided in Appendix D.

Now that we have characterized the task completion delay of PRAC, we can compare it with the state of the art. Secure coded computing schemes that exist in the literature usually use static task allocation, where tasks are assigned to workers a priori. The most recent work in the area is Staircase codes, which are shown to outperform all existing schemes that use threshold secret sharing [3]. However, Staircase codes are static; they allocate a fixed amount of tasks to workers a priori. Thus, Staircase codes cannot leverage the heterogeneity of the system, nor can they adapt to a system that is changing in time. On the other hand, our solution PRAC adaptively offloads tasks to workers by taking into account the heterogeneity and time-varying nature of resources at workers.
Therefore, we restrict our focus to comparing PRAC to Staircase codes. Staircase codes assign a task of size b/(k − z) row blocks to each worker. Let T_i be the time spent at worker i to compute the whole assigned task. Denote by T_(i) the i-th order statistic of the T_i's and by T_SC(n, k, z) the task completion time, i.e., the time the master waits until it can decode Ax, when using Staircase codes. In order to decode Ax the master needs to receive a fraction equal to (k − z)/(d − z) of the task assigned to each worker from any d workers, where k ≤ d ≤ n. The task completion time of the master can then be expressed as [3]

T_SC(n, k, z) = min_{d ∈ {k,...,n}} { ((k − z)/(d − z)) T_(d) }.   (6)

Theorem 4.
The gap between the completion time of PRAC and coded computation using Staircase codes is lower bounded by

E[T_SC] − E[T_PRAC] ≥ (bx − εy) / (y(x + y)),   (7)

where x = (n − d*)/E[β_{t,n}], y = (d* − z)/E[β_{t,d*}], and d* is the value of d that minimizes Equation (6).

Proof. The proof of Theorem 4 is provided in Appendix E.

Theorem 4 shows that the lower bound on the gap between secure coded computation using Staircase codes and PRAC is in the order of the number of row blocks of A. Hence, the gap between secure coded computation using Staircase codes and PRAC increases linearly with the number of row blocks of A. Note that ε, the overhead required by the Fountain coding used in PRAC, becomes negligible as b increases. Thus, PRAC outperforms secure coded computation using Staircase codes in heterogeneous systems. The more heterogeneous the workers are, the more improvement is obtained by using PRAC. However, Staircase codes can slightly outperform PRAC in the case where the slowest n − z workers are homogeneous, i.e., have similar compute service times T_i. In this case both algorithms are restricted to the slowest n − z workers (see Lemma 2), but PRAC incurs an overhead of ε tasks (due to using Fountain codes) which is not needed for Staircase codes. In particular, from (5) and (6), when the n − z slowest workers are homogeneous, the task completion times of PRAC and Staircase codes are equal to ((b + ε)/(n − z)) E[β_{t,n}] and (b/(n − z)) E[β_{t,n}], respectively.
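The two completion-time expressions can be compared numerically. The sketch below evaluates Equation (5) directly and estimates Equation (6) by Monte Carlo, using illustrative parameters of our own choosing and plain exponential task times as a crude stand-in for the shifted-exponential model of Section V:

```python
import numpy as np

rng = np.random.default_rng(4)
n, z, b = 10, 3, 600
eps = 0.05 * b                                 # assumed 5% Fountain overhead

lam = np.sort(rng.uniform(0.5, 9.5, n))[::-1]  # toy per-packet rates, fastest first
E_beta = 1.0 / lam                             # E[beta_{t,i}], ascending

# Eq. (5): PRAC's throughput is the aggregate speed of workers z+1, ..., n
t_prac = (b + eps) / np.sum(1.0 / E_beta[z:])

# Eq. (6): Monte Carlo estimate over the order statistics T_(d)
def t_sc(k, trials=2000):
    mean_task = (b / (k - z)) * E_beta         # mean time for b/(k-z) packets
    T = rng.exponential(mean_task, (trials, n))
    T.sort(axis=1)
    d = np.arange(k, n + 1)
    return np.mean(np.min((k - z) / (d - z) * T[:, d - 1], axis=1))

t_stair = min(t_sc(k) for k in range(z + 1, n + 1))   # best choice of k
print(f"PRAC: {t_prac:.1f}  Staircase (best k): {t_stair:.1f}")
```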
V. PERFORMANCE EVALUATION

A. Simulations
In this section, we present simulations run on MATLAB, and compare PRAC with the following baselines: (i) Staircase codes [3], (ii) C3P [2] (which is not secure, as it is not designed to be secure), and (iii) Genie C3P (GC3P), which extends C3P by assuming knowledge of the identity of the eavesdroppers and ignoring them. We note that GC3P serves as a lower bound on private coded computing schemes for heterogeneous systems for the following reason: for a given number z of colluding workers, the ideal coded computing scheme knows which workers are eavesdroppers and ignores them, using the remaining workers without any need for randomness. If the identity of the colluding workers is unknown, coded computing schemes require randomness and become limited to the (z+1)-st fastest worker (Lemma 2). GC3P and other coded computing schemes have similar performance if the z colluding workers are the fastest workers. If the z colluding workers are the slowest, then GC3P outperforms any coded computing scheme. Note that our solution PRAC considers the scenario of unknown eavesdroppers. Comparing PRAC with GC3P shows how good PRAC is as compared to the best possible solution for heterogeneous systems. In terms of comparing PRAC to solutions designed for the homogeneous setting, we restrict our attention to Staircase codes, which are a class of secret sharing schemes that enjoy flexibility in the number of workers needed to decode the matrix-vector multiplication. Staircase codes are shown to outperform any coded computing scheme that requires a threshold on the number of stragglers [3]. (Note that in addition to n and z, all threshold secret sharing based schemes require a parameter k, z < k < n, which is the minimum number of non-stragglers that the master has to wait for before decoding Ax.)

In our simulations, we model the computation time of each worker w_i by an independent shifted exponential random variable with rate λ_i and shift c_i, i.e., Pr(T_i ≤ t) = 1 − exp(−λ_i(t − c_i)) for t ≥ c_i. We take c_i = 1/λ_i and consider three different scenarios for choosing the values of the λ_i's for the workers, as follows:

• Scenario 1: we assign λ_i = 3 for half of the workers, then we assign λ_i = 1 for one quarter of the workers and λ_i = 9 for the remaining workers.
• Scenario 2: we assign λ_i = 1 for one third of the workers, the second third have λ_i = 3, and the remaining workers have λ_i = 9.
• Scenario 3: we draw the λ_i's independently and uniformly at random from a fixed interval.

When running Staircase codes, we choose the parameter k that minimizes the task completion time for the desired n and z. We do so by simulating Staircase codes for all possible values of k, z ≤ k ≤ n, and choosing the one with the minimum completion time.

We take b = m, i.e., each row block is simply a row of A. The size of each element of A and of the vector x is assumed to be 1 Byte (8 bits). Therefore, the size of each transmitted packet p_{t,i} is 8ℓ bits. For the simulation results, we assume that the matrix A is a square matrix, i.e., ℓ = m. We take m = 1000, unless explicitly stated otherwise. C_i denotes the average channel capacity of each worker w_i and is selected uniformly at random from a fixed interval starting at 10 Mbps. The rate of sending a packet to worker w_i is sampled from a Poisson distribution with mean C_i.

In Figure 1 we show the effect of the number of rows m on the completion time at the master.
We fix the number of workers to n = 50 and the number of colluding workers to z = 13, and plot the completion time for PRAC, C3P, GC3P and Staircase codes. (If the system is homogeneous, Staircase codes outperform GC3P, because pre-allocating tasks to the workers avoids the overhead needed by Fountain codes.)

Fig. 1: Comparison between PRAC and the baselines Staircase codes, GC3P, and C3P in different scenarios with n = 50 workers and z = 13 colluding eavesdroppers, for different values of the number of rows m. For each value of m we run experiments and average the results. (a) Scenario 1, with the fastest workers as eavesdroppers for GC3P 1 and the slowest workers as eavesdroppers for GC3P 2. (b) Scenario 2, with workers picked at random to be eavesdroppers. (c) Scenario 3, with workers picked at random to be eavesdroppers. When the eavesdroppers are chosen to be the fastest workers, PRAC has very similar performance to GC3P. When the eavesdroppers are picked randomly, the performance of PRAC becomes closer to that of GC3P when the non-adversarial workers are more heterogeneous.

Notice that PRAC and Staircase codes have close completion times in scenario 1 (Figure 3a), and this completion time is far from that of C3P. The reason is that in this scenario we pick exactly 13 workers to be fast (λ_i = 9) and the others to be significantly slower. Since PRAC assigns keys to the fastest z workers, the completion time is dictated by the slow workers. To compare PRAC with Staircase codes, notice that the majority of the remaining workers have λ_i = 3; therefore pre-allocating equal tasks to the workers is close to adaptively allocating the tasks.

In terms of the lower bound on PRAC, observe that when the fastest workers are assumed to be adversarial, GC3P and PRAC have very similar task completion times. However, when the slowest workers are assumed to be adversarial, the completion time of GC3P is very close to C3P and far from PRAC. This observation is in accordance with Lemma 2. In scenarios 2 and 3 we pick the adversarial workers uniformly at random and observe that the completion time of PRAC becomes closer to GC3P when the workers are more heterogeneous. For instance, in scenario 3, GC3P and PRAC have closer performance, since the workers' computing times are chosen uniformly at random.

Fig. 2: Comparison between PRAC, Staircase codes and GC3P in scenario 1 for different values of the number of workers and the number of colluding workers. We fix the number of rows to m = 1000. For each value of the x-axis we run experiments and average the results. (a) Task completion time as a function of the number of workers with z = n/4. (b) Task completion time as a function of the number of workers with z = 13. We observe that the difference between the completion time of PRAC and that of GC3P is large for small values of n − z and decreases as n − z increases.

In Figure 2, we plot the task completion time as a function of the number of workers n for a fixed number of rows m = 1000 and λ_i's assigned according to scenario 1. In Figure 2(a), we change the number of workers from 20 to 100 and keep the ratio z/n = 1/4 fixed. We notice that with the increase of n the completion time of PRAC becomes closer to GC3P. In Figure 2(b), we change the number of workers from 20 to 100 and keep z = 13 fixed. We notice that with the increase of n, the effect of the eavesdropper is amortized and the completion time of PRAC becomes closer to C3P. In this setting, PRAC always outperforms Staircase codes.

Fig. 3: Comparison between PRAC and Staircase codes: average completion time as a function of the number of colluding workers z. We fix the number of rows to m = 1000. (a) Task completion time as a function of the number of colluding workers for n = 50; computing times of the workers are chosen according to scenario 1. (b) Task completion time for n = 50 workers and variable z; computing times of the workers are chosen such that the n − z slowest workers are homogeneous. Both codes are affected by the increase of the number of colluding workers because their runtime is restricted to the slowest n − z workers. We observe that PRAC outperforms Staircase codes except when the n − z slowest workers are homogeneous.

In Figure 3, we plot the task completion time as a function of the number of colluding workers.
In Figure 3(a), we choose the computing times at the workers according to scenario 1. We vary z and observe that the completion time of PRAC deviates from that of GC3P as z increases. More importantly, we observe two inflection points of the average completion time of PRAC, at z = 13 and z = 37. Those inflection points are due to the fact that we have 13 fast workers (λ = 9) and 25 workers with medium speed (λ = 3) in the system. For z > 37, the completion time of Staircase codes becomes less than that of PRAC because the slowest workers are then homogeneous. Therefore, pre-allocating the tasks is better than using Fountain codes and paying for the ε overhead of computations. To confirm that Staircase codes always outperform PRAC when the slowest n − z workers are homogeneous, we run a simulation in which we divide the workers into three clusters. The first cluster consists of ⌊z/2⌋ fast workers (λ = 9), the second consists of ⌊z/2⌋ + 1 workers that are regular (λ = 3), and the remaining n − z workers are slow (λ = 1). In Figure 3(b) we fix n to 50 and vary z. We observe that Staircase codes always outperform PRAC in this setting. In contrast to non-secure C3P, Staircase codes and PRAC are always restricted to the slowest n − z workers and cannot leverage the increase of the number of fast workers. For GC3P, we assume that the fastest workers are eavesdroppers. We note that, as expected from Lemma 2, when the fastest workers are assumed to be eavesdroppers the performance of GC3P and PRAC becomes very close.

B. Experiments

Setup.
The master device is a Nexus 5 Android-based smartphone running Android 6.0.1. The worker devices are Nexus 6Ps running Android 8.1.0. The master device connects to the worker devices via Wi-Fi Direct links, and the master is the group owner of the Wi-Fi Direct group. The master device is required to complete one matrix-vector multiplication (y = Ax), where A is a square matrix and x is a column vector. We also take m = b, i.e., each packet is a row of A. We introduce an artificial delay at the workers following an exponential distribution. The introduced delays serve to emulate applications running in the background of the devices. A worker device sends the result to the master after it is done calculating and the introduced delay has passed. Furthermore, we assume that z = 1, i.e., there is one unknown worker that is adversarial among all the workers. The experiments are conducted in a lab environment where there are other Wi-Fi networks operating in the background.

Baselines.
Our PRAC algorithm is compared to three baseline algorithms: (i) Staircase codes, which preallocate the tasks based on n, the number of workers, k, the minimum number of workers required to reconstruct the information, and z, the number of colluding workers; (ii) GC3P, in which we assume the adversarial worker is known and excluded during the task allocation; and (iii) non-secure C3P, in which the security problem is ignored and the master device utilizes every resource without randomness. In this setup we run C3P on n − z workers.

Results.
Figure 4 presents the task completion time with an increasing number of workers for the homogeneous setup, i.e., when all the workers have similar computing times. The computing delay for each packet follows an exponential distribution with mean µ = 1/λ = 3 seconds at all workers. C3P performs the best in terms of completion time, but C3P does not provide any privacy guarantees. PRAC outperforms Staircase codes when the number of workers is 5. The reason is that PRAC performs better than Staircase codes in a heterogeneous setup, and when the number of workers increases, the system becomes a bit more heterogeneous. GC3P significantly outperforms PRAC in terms of completion time. Yet, it requires prior knowledge of which worker is adversarial, which is often not available in real-world scenarios.

Fig. 4: Completion time as a function of the number of workers in the homogeneous setup.

Now, we focus on the heterogeneous setup. We group the workers into two groups: fast workers, whose per-task delay follows an exponential distribution with a small mean, and slow workers, whose per-task delay follows an exponential distribution with a larger mean. Figure 5 presents the completion time as a function of the number of workers. In this setup, for the n-worker scenario, there are ⌈n/2⌉ fast and ⌊n/2⌋ slow workers. The difference between the setups of Figure 5(a) and Figure 5(b) is that we remove a fast worker (as adversarial) for GC3P in the former, whereas in the latter, we assume that the eavesdropper is a slow worker. As illustrated in Figure 5, for the 2-worker case, due to the overhead introduced by Fountain codes, PRAC performs worse than Staircase codes. However, PRAC outperforms Staircase codes in terms of completion time for the 3-, 4-, and 5-worker cases. This is due to the fact that PRAC can utilize results calculated by slow workers more effectively when the number of workers is large. On the other hand, the results computed by slow workers are often discarded in Staircase codes, which is a waste of computation resources. If a fast worker is removed as adversarial for GC3P, the difference between the performance of GC3P and PRAC becomes smaller. This result is intuitive as, in PRAC, the master has to wait for the (z+1)-st fastest worker to decode Ax, which is also the case for GC3P in this setting.

Fig. 5: Completion time as a function of the number of workers in the heterogeneous setup. (a) We assume a fast worker is adversarial for GC3P. (b) We assume a slow worker is adversarial for GC3P.

In Figure 6, we consider the same setup with the exception that for the n-worker scenario, there are ⌈n/2⌉ slow and ⌊n/2⌋ fast workers. Staircase codes perform more closely to PRAC in the 3-worker case as compared to Figure 5, since the setup of Fig. 6 assumes that the n − z = 2 slowest workers are homogeneous, whereas in Fig. 5 the n − z = 2 slowest workers are heterogeneous. Yet, for the 5-worker case, PRAC outperforms Staircase codes when comparing to Figure 5, since PRAC is adaptive to time-varying resources while Staircase codes assign tasks a priori in a static manner.

Fig. 6: Completion time as a function of the number of workers in the heterogeneous setup. (a) We assume a fast worker is adversarial for GC3P. (b) We assume a slow worker is adversarial for GC3P.

Note that in all experiments, when the n − z slowest workers are homogeneous, Staircase codes outperform GC3P and PRAC. This happens because pre-allocating the tasks to the workers avoids the overhead of sub-tasks required by Fountain codes and utilizes all the workers to their fullest capacity.

VI. RELATED WORK
Mobile cloud computing is a rapidly growing field with the aim of providing a better quality of experience and extensive computing resources to mobile devices [26], [27]. The main solution to mobile computing is to offload tasks to the cloud or to neighboring devices by exploiting the connectivity of the devices. With task offloading come several challenges such as heterogeneity of the devices, time-varying communication channels and energy efficiency; see e.g., [28]–[31]. We refer the interested reader to [2] and references therein for a detailed literature review on edge computing and mobile cloud computing.

The problem of stragglers in distributed systems was initially studied by the distributed computing community, see e.g., [32]–[35]. Research interest in using coding-theoretic techniques for straggler mitigation in distributed content download and distributed computing is rapidly growing. The early body of work focused on content download, see e.g., [36]–[40]. Using codes for straggler mitigation in distributed computing started in [12], where the authors proposed the use of MDS codes for distributed linear machine learning algorithms in a homogeneous workers setting.

Following the work of [12], coding schemes for straggler mitigation in distributed matrix-matrix multiplication, coded computing and machine learning algorithms were introduced, and the fundamental limits between the computation load and the communication cost were studied; see e.g., [8], [41] and references therein for matrix-matrix multiplication, [4], [7], [10]–[13], [42]–[49] for machine learning algorithms, and [5], [6], [9], [50] and references therein for other topics.

Codes for privacy and straggler mitigation in distributed computing were first introduced in [3], where the authors consider a homogeneous setting and focus on matrix-vector multiplication. Beyond matrix-vector multiplication, the problem of private distributed matrix-matrix multiplication and private polynomial computation with straggler tolerance has been studied in [51]–[56]. The former works are designed for the homogeneous static setting in which the master has prior knowledge of the computation capacities of the workers and pre-assigns the sub-tasks equally to them. In addition, the master sets a threshold on the number of stragglers that it can tolerate throughout the whole process. In contrast, PRAC is designed for the heterogeneous dynamic setting in which workers have different computation capacities that can change over time. PRAC assigns the sub-tasks to the workers in an adaptive manner based on the estimated computation capacity of each worker. Furthermore, PRAC can tolerate a varying number of stragglers as it uses an underlying rateless code, which gives the master higher flexibility in adaptively assigning the sub-tasks to the workers. These properties of PRAC allow a better use of the workers over the whole process. On the other hand, PRAC is restricted to matrix-vector multiplication. Although coded computation is designed for linear operations, there is a recent effort to apply coded computation to non-linear operations. For example, [57] applied coded computation to logistic regression, and the framework of gradient coding started in [10] generalizes to any gradient-descent algorithm. Our work is complementary with these works. For example, our work can be directly used as complementary to [57] to provide privacy and adaptive task offloading to logistic regression.

Secure multi-party communication (SMPC) [58] can be related to our work as follows.
The setting of secure multi-party computing schemes assumes the presence of several parties (masters in our terminology) who want to compute a function of all the data owned by the different parties without revealing any information about the individual data of each party. This setting is a generalized version of the master/worker setting that we consider. More precisely, an SMPC scheme reduces to our master/worker setting if we assume that only one party owns data and the others have no data to include in the function to be computed. SMPC schemes use threshold secret sharing schemes, therefore they restrict the master to a fixed number of stragglers. Thus, showing that PRAC outperforms Staircase codes (which are the best known family of threshold secret sharing schemes) implies that PRAC outperforms the use of SMPC schemes reduced to this setting. Works on privacy-preserving machine learning algorithms are also related to our work. However, the privacy constraint in this line of work is computational privacy, and the proposed solutions do not take stragglers into account, see e.g., [59]–[61].

We restrict the scope of this paper to eavesdropping attacks, which are important on their own merit. Privacy and security can be achieved by using Maximum Distance Separable (MDS)-like codes which restrict the master to a fixed maximum number of stragglers [52], [55]. Our solution, on the other hand, addresses the privacy problem in an adaptive coded computation setup without such a restriction. In this setup, security cannot be addressed by extending the results of [52], [55]. In fact, we developed a secure adaptive coded computation mechanism against Byzantine attacks in our recent paper [62]. The private and secure adaptive coded computation obtained by combining this paper and [62] is out of the scope of this paper.
VII. CONCLUSION
The focus of this paper is to develop a secure edge computing mechanism to mitigate the computational bottleneck of IoT devices by allowing these devices to help each other in their computations, with possible help from the cloud if available. Our key tool is the theory of coded computation, which advocates mixing data in computationally intensive tasks by employing erasure codes and offloading these tasks to other devices for computation. Focusing on eavesdropping attacks, we designed a private and rateless adaptive coded computation (PRAC) mechanism considering (i) the privacy requirements of IoT applications and devices, and (ii) the heterogeneous and time-varying resources of edge devices. Our proposed PRAC model can provide adequate security and latency guarantees to support real-time computation at the edge. We showed through analysis, MATLAB simulations, and experiments on Android-based smartphones that PRAC outperforms known secure coded computing methods when resources are heterogeneous.

VIII. ACKNOWLEDGEMENT
This work was supported in part by the Army Research Lab (ARL) under Grant W911NF-1820181, the National Science Foundation (NSF) under Grants CNS-1801708 and CNS-1801630, and the National Institute of Standards and Technology (NIST) under Grant 70NANB17H188.

REFERENCES
[1] R. Bitar, Y. Xing, Y. Keshtkarjahromi, V. Dasari, S. El Rouayheb, and H. Seferoglu, "PRAC: Private and rateless adaptive coded computation at the edge," in SPIE Defense + Commercial Sensing, vol. 11013, 2019.
[2] Y. Keshtkarjahromi, Y. Xing, and H. Seferoglu, "Dynamic heterogeneity-aware coded cooperative computation at the edge," in IEEE International Conference on Network Protocols (ICNP), Sept. 2018.
[3] R. Bitar, P. Parag, and S. El Rouayheb, "Minimizing latency for secure distributed computing," in IEEE International Symposium on Information Theory (ISIT), pp. 2900–2904, 2017.
[4] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, "A unified coding framework for distributed computing with straggling servers," in IEEE Globecom Workshops (GC Wkshps), pp. 1–6, 2016.
[5] S. Dutta, V. Cadambe, and P. Grover, "Coded convolution for parallel and distributed computing within a deadline," arXiv preprint arXiv:1705.03875, 2017.
[6] Y. Yang, P. Grover, and S. Kar, "Computing linear transformations with unreliable components," IEEE Transactions on Information Theory, 2017.
[7] W. Halbawi, N. Azizan-Ruhi, F. Salehi, and B. Hassibi, "Improving distributed gradient descent using Reed-Solomon codes," arXiv preprint arXiv:1706.05436, 2017.
[8] Q. Yu, M. Maddah-Ali, and S. Avestimehr, "Polynomial codes: An optimal design for high-dimensional coded matrix multiplication," in Advances in Neural Information Processing Systems, pp. 4403–4413, 2017.
[9] S. Dutta, V. Cadambe, and P. Grover, "Short-Dot: Computing large linear transforms distributedly using coded short dot products," in 30th Annual Conference on Neural Information Processing Systems (NIPS), pp. 2092–2100, 2016.
[10] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, "Gradient coding: Avoiding stragglers in distributed learning," in International Conference on Machine Learning, pp. 3368–3376, 2017.
[11] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, "Fundamental tradeoff between computation and communication in distributed computing," in IEEE International Symposium on Information Theory (ISIT), 2016.
[12] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, "Speeding up distributed machine learning using codes," IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
[13] C. Karakus, Y. Sun, S. Diggavi, and W. Yin, "Straggler mitigation in distributed optimization through data encoding," in Advances in Neural Information Processing Systems, pp. 5434–5442, 2017.
[14] M. F. Aktas, P. Peng, and E. Soljanin, "Effective straggler mitigation: Which clones should attack and when?," ACM SIGMETRICS Performance Evaluation Review, vol. 45, no. 2, pp. 12–14, 2017.
[15] S. N. Shirazi, A. Gouglidis, A. Farshad, and D. Hutchison, "The extended cloud: Review and analysis of mobile edge computing and fog from a security and resilience perspective," IEEE Journal on Selected Areas in Communications, vol. 35, no. 11, pp. 2586–2595, 2017.
[16] N. Abbas, Y. Zhang, A. Taherkordi, and T. Skeie, "Mobile edge computing: A survey," IEEE Internet of Things Journal, vol. 5, no. 1, pp. 450–465, 2017.
[17] R. Roman, J. Lopez, and M. Mambo, "Mobile edge computing, fog et al.: A survey and analysis of security threats and challenges," Future Generation Computer Systems, vol. 78, pp. 680–698, 2018.
[18] M. Luby, "LT codes," in IEEE Symposium on Foundations of Computer Science (FOCS), p. 271, 2002.
[19] A. Shokrollahi, "Raptor codes," IEEE/ACM Transactions on Networking, vol. 14, no. SI, pp. 2551–2567, 2006.
[20] D. J. MacKay, "Fountain codes," IEE Proceedings-Communications, vol. 152, no. 6, pp. 1062–1068, 2005.
[21] S. Lin and D. Costello, Error-Correcting Codes. Prentice-Hall, Inc., 1983.
[22] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Elsevier, 1977.
[23] G. A. Seber and A. J. Lee, Linear Regression Analysis, vol. 329. John Wiley & Sons, 2012.
[24] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
[25] J. Lacan, V. Roca, J. Peltotalo, and S. Peltotalo, "Reed-Solomon forward error correction (FEC) schemes," tech. rep., 2009.
[26] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, "A survey of mobile cloud computing: Architecture, applications, and approaches," Wireless Communications and Mobile Computing, vol. 13, no. 18, pp. 1587–1611, 2013.
[27] N. Fernando, S. W. Loke, and W. Rahayu, "Mobile cloud computing: A survey," Future Generation Computer Systems, vol. 29, no. 1, pp. 84–106, 2013.
[28] Z. Sanaei, S. Abolfazli, A. Gani, and R. Buyya, "Heterogeneity in mobile cloud computing: Taxonomy and open challenges," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 369–392, 2014.
[29] Y. Geng, W. Hu, Y. Yang, W. Gao, and G. Cao, "Energy-efficient computation offloading in cellular networks," in IEEE International Conference on Network Protocols (ICNP), 2015.
[30] R. K. Lomotey and R. Deters, "Architectural designs from mobile cloud computing to ubiquitous cloud computing - survey," in IEEE World Congress on Services, 2014.
[31] T. Penner, A. Johnson, B. V. Slyke, M. Guirguis, and Q. Gu, "Transient clouds: Assignment and collaborative execution of tasks on mobile devices," in IEEE Global Communications Conference, 2014.
[32] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[33] J. Dean and L. A. Barroso, "The tail at scale," Communications of the ACM, vol. 56, no. 2, pp. 74–80, 2013.
[34] B. Recht, C. Re, S. Wright, and F. Niu, "Hogwild: A lock-free approach to parallelizing stochastic gradient descent," in Advances in Neural Information Processing Systems, pp. 693–701, 2011.
[35] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, pp. 1223–1231, 2012.
[36] L. Huang, S. Pawar, H. Zhang, and K. Ramchandran, "Codes can reduce queueing delay in data centers," in IEEE International Symposium on Information Theory (ISIT), 2012.
[37] G. Joshi, Y. Liu, and E. Soljanin, "Coding for fast content download," in Allerton Conference on Communication, Control, and Computing, 2012.
[38] G. Liang and U. C. Kozat, "TOFEC: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes," in IEEE International Conference on Computer Communications, 2014.
[39] S. Kadhe, E. Soljanin, and A. Sprintson, "Analyzing the download time of availability codes," in IEEE International Symposium on Information Theory (ISIT), 2015.
[40] P. Peng and E. Soljanin, "On distributed storage allocations of large files for maximum service rate," arXiv preprint arXiv:1808.07545, 2018.
[41] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, "Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding," arXiv preprint arXiv:1801.07487, 2018.
[42] R. K. Maity, A. S. Rawat, and A. Mazumdar, "Robust gradient descent via moment encoding with LDPC codes," arXiv preprint arXiv:1805.08327, 2018.
[43] L. Chen, Z. Charles, D. Papailiopoulos, et al., "Draco: Robust distributed training via redundant gradients," arXiv preprint arXiv:1803.09877, 2018.
[44] S. Kiani, N. Ferdinand, and S. C. Draper, "Exploitation of stragglers in coded computation," in IEEE International Symposium on Information Theory (ISIT), pp. 1988–1992, 2018.
[45] E. Ozfaturay, D. Gunduz, and S. Ulukus, "Speeding up distributed gradient descent by utilizing non-persistent stragglers," arXiv preprint arXiv:1808.02240, 2018.
[46] M. Ye and E. Abbe, "Communication-computation efficient gradient coding," arXiv preprint arXiv:1802.03475, 2018.
[47] N. Ferdinand and S. Draper, "Anytime stochastic gradient descent: A time to hear from all the workers," arXiv preprint arXiv:1810.02976, 2018.
[48] S. Li, S. M. M. Kalan, A. S. Avestimehr, and M. Soltanolkotabi, "Near-optimal straggler mitigation for distributed gradient methods," pp. 857–866, IEEE, 2018.
[49] N. Raviv, I. Tamo, R. Tandon, and A. G. Dimakis, "Gradient coding from cyclic MDS codes and expander graphs," arXiv preprint arXiv:1707.03858, 2017.
[50] U. Sheth, S. Dutta, M. Chaudhari, H. Jeong, Y. Yang, J. Kohonen, T. Roos, and P. Grover, "An application of storage-optimal MatDot codes for coded matrix multiplication: Fast k-nearest neighbors estimation," arXiv preprint arXiv:1811.11811, 2018.
[51] R. G. D'Oliveira, S. El Rouayheb, and D. Karpuk, "GASP codes for secure distributed matrix multiplication," arXiv preprint arXiv:1812.09962, 2018.
[52] Q. Yu, S. Li, N. Raviv, S. M. M. Kalan, M. Soltanolkotabi, and S. Avestimehr, "Lagrange coded computing: Optimal design for resiliency, security and privacy," arXiv preprint arXiv:1806.00939v4, 2019.
[53] W.-T. Chang and R. Tandon, "On the capacity of secure distributed matrix multiplication," in IEEE Global Communications Conference (GLOBECOM), pp. 1–6, 2018.
[54] J. Kakar, S. Ebadifar, and A. Sezgin, "Rate-efficiency and straggler-robustness through partition in distributed two-sided secure matrix computation," arXiv preprint arXiv:1810.13006, 2018.
[55] H. Yang and J. Lee, "Secure distributed computing with straggling servers using polynomial codes,"
IEEE Transactions on InformationForensics and Security , vol. 14, no. 1, pp. 141–150, 2018.[56] Q. Yu, N. Raviv, J. So, and A. S. Avestimehr, “Lagrange codedcomputing: Optimal design for resiliency, security and privacy,” arXivpreprint, arXiv:1806.00939 , 2018.[57] J. So, B. Guler, A. S. Avestimehr, and P. Mohassel, “CodedPrivateML: Afast and privacy-preserving framework for distributed machine learning,” arXiv preprint arXiv:1902.00641 , 2019.[58] R. Cramer, I. B. Damgrd, and J. B. Nielsen,
Secure Multiparty Compu-tation and Secret Sharing . New York, NY, USA: Cambridge UniversityPress, 1st ed., 2015.[59] H. Takabi, E. Hesamifard, and M. Ghasemi, “Privacy preserving multi-party machine learning with homomorphic encryption,” in th AnnualConference on Neural Information Processing Systems (NIPS) , 2016.[60] R. Hall, S. E. Fienberg, and Y. Nardi, “Secure multiple linear regres-sion based on homomorphic encryption,” Journal of Official Statistics ,vol. 27, no. 4, p. 669, 2011.[61] S. Gade and N. H. Vaidya, “Private learning on networks: Part ii,” arXivpreprint arXiv:1703.09185 , 2017.[62] Y. Keshtkarjahromi, R. Bitar, V. Dasari, S. E. Rouayheb, and H. Sefer-oglu, “Secure coded cooperative computation at the heterogeneous edgeagainst byzantine attacks,” in
IEEE Global Communication Conference(GLOBECOM) , 2019.[63] N. B. Shah, K. V. Rashmi, and P. V. Kumar, “Information-theoreticallysecure regenerating codes for distributed storage,” in
Proc. IEEE GlobalCommunications Conference , 2011.[64] S. E. Rouayheb, E. Soljanin, and A. Sprintson, “Secure network codingfor wiretap networks of type II,”
IEEE Transactions on InformationTheory , vol. 58, pp. 1361–1371, March 2012.[65] R. Bitar and S. El Rouayheb, “Staircase codes for secret sharing withoptimal communication and read overheads,”
APPENDIX A
HIDING THE VECTOR x

In machine learning applications, the master runs iterative algorithms in which the vector $x$ contains information about $A$ and needs to be hidden from the workers. We describe how PRAC can be generalized to achieve privacy for both $A$ and $x$. The idea is to divide the $n$ workers into two disjoint groups and ask each group to privately multiply $A$ by a vector that is statistically independent of $x$. In addition, the master should be able to decode $Ax$ from the results of both multiplications. The scheme works as follows. The master divides the workers into two groups of cardinality $n_1$ and $n_2$ such that $n_1 + n_2 = n$ and chooses the security parameters $z_1 < n_1$ and $z_2 < n_2$. To hide $x$, the master generates a random vector $u$ of the same size as $x$ and sends $x + u$ to the first group and $u$ to the second group. Afterwards, the master applies PRAC on both groups. According to our scheme, the master decodes $A(x + u)$ and $Au$ after receiving enough responses from the workers of each group. Hence, the master can decode $Ax$. Note that no information about $x$ is revealed because it is one-time padded by $u$. Note that here we assume workers from group 1 do not collude with workers from group 2. The same idea can be generalized to the case where workers from different groups can collude by creating more groups and encoding $x$ using an appropriate secret sharing scheme. For instance, if the master divides the workers into three groups and workers from any two different groups can collude, the master encodes $x$ into $u_1$, $u_2$, and $u_1 + u_2 + x$ and sends each vector to a different group.
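For concreteness, the following is a minimal numerical sketch of the basic two-group masking step. The per-group PRAC encoding, key packets, and straggler handling are omitted, each group's private computation is modeled as a direct multiplication, and the field size and dimensions are illustrative assumptions.

```python
import numpy as np

# Toy parameters (illustrative assumptions, not values from the paper).
q = 257            # prime field size: all arithmetic is over F_q
m, k = 4, 3        # A is m x k, x has length k

rng = np.random.default_rng(0)
A = rng.integers(0, q, size=(m, k))
x = rng.integers(0, q, size=k)

# Master: one-time pad x with a uniformly random vector u.
u = rng.integers(0, q, size=k)
share_group1 = (x + u) % q   # sent to the first group of workers
share_group2 = u             # sent to the second group of workers

# Each group privately computes its matrix-vector product via PRAC
# (modeled here as one direct multiplication per group).
y1 = (A @ share_group1) % q  # = A(x + u)
y2 = (A @ share_group2) % q  # = Au

# Master: recover Ax by linearity, since A(x + u) - Au = Ax.
Ax = (y1 - y2) % q
assert np.array_equal(Ax, (A @ x) % q)
```

Each group observes a vector that is uniformly distributed and independent of $x$, so, absent cross-group collusion, the shares leak no information about $x$.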
APPENDIX B
EXTENSION OF THE PROOF OF PRIVACY (I.E., THEOREM 1) OF PRAC

Let $P_i$ denote the random variable representing packet $p_i$ sent to worker $w_i$. For any subset $Z \subset \{1, \ldots, n\}$, $|Z| = z$, denote by $P_Z$ the collection of packets indexed by $Z$, i.e., $P_Z = \{p_i; i \in Z\}$. We prove that the perfect secrecy constraint $H(A \mid P_Z) = H(A)$, given in (2), is equivalent to $H(K \mid P_Z, A) = 0$. The proof is standard [63]–[65], but we reproduce it here for completeness. In what follows, the logarithms in the entropy function are taken base $q$, where $q$ is a power of a prime for which all matrices can be defined over the finite field $\mathbb{F}_q$. We can write
\begin{align}
H(A \mid P_Z) &= H(A) - H(P_Z) + H(P_Z \mid A) \tag{8} \\
&= H(A) - H(P_Z) + H(P_Z \mid A) - H(P_Z \mid A, K) \tag{9} \\
&= H(A) - H(P_Z) + I(P_Z; K \mid A) \tag{10} \\
&= H(A) - H(P_Z) + H(K \mid A) - H(K \mid P_Z, A) \tag{11} \\
&= H(A) - H(P_Z) + H(K) - H(K \mid P_Z, A) \tag{12} \\
&= H(A) - z + z - H(K \mid P_Z, A) \tag{13} \\
&= H(A) - H(K \mid P_Z, A). \tag{14}
\end{align}
Equation (9) follows from the fact that given the data $A$ and the keys $R_1, \ldots, R_z$, all packets generated by the master can be decoded; in particular, the packets $P_Z$ received by any $z$ workers can be decoded, i.e., $H(P_Z \mid A, K) = 0$. Equation (12) follows because the random matrices are chosen independently of the data matrix $A$, and equation (13) follows because PRAC uses $z$ independent random matrices that are chosen uniformly at random from the field $\mathbb{F}_q$. Therefore, proving that $H(A \mid P_Z) = H(A)$ is equivalent to proving that $H(K \mid P_Z, A) = 0$. In other words, we need to prove that the random matrices can be decoded given the collection of packets sent to any $z$ workers and the data matrix $A$. This is the main reason behind encoding the random matrices using an $(n, z)$ MDS code. We formally prove that $H(K \mid P_Z, A) = 0$ in the proof of Theorem 1. Note from equation (12) that for any code to be information-theoretically private, $H(K)$ cannot be less than $H(P_Z) = z$. This means that a secure code must use at least $z$ independent random matrices.

APPENDIX C
PROOF OF LEMMA

Suppose, towards a contradiction, that a scheme is private against $z$ colluding workers and allows the master to decode $Ax$ using only the results of the fastest $z$ workers. Without loss of generality, assume that the workers are ordered from fastest to slowest, i.e., worker $w_1$ is the fastest and worker $w_n$ is the slowest. This assumption implies that the results sent by the first $z$ workers contain information about $Ax$; otherwise, the master would have to wait at least for the $(z+1)$st fastest worker to decode $Ax$. By linearity of the multiplication $Ax$, decoding information about $Ax$ from the results of $z$ workers implies decoding information about $A$ from the packets sent to those $z$ workers. Hence, there exists a subset $Z \subset \{1, \ldots, n\}$ of $z$ workers, with $P_Z$ denoting the tasks allocated to it, for which $H(A \mid P_Z) < H(A)$, violating the privacy constraint. Therefore, any private coded computing scheme for linear computations limits the master to the speed of the $(z+1)$st fastest worker in order to decode the wanted result.

APPENDIX D
PROOF OF THEOREM

The delay of receiving $\tau_i$ computed packets from worker $w_i$ is equal to
$$T_i \approx RTT_i + \tau_i \mathbb{E}[\beta_{t,i}] \approx \tau_i \mathbb{E}[\beta_{t,i}],$$
where $RTT_i$ is the average transmission delay for sending one packet to worker $w_i$ and receiving one computed packet from the worker, $\beta_{t,i}$ is the computation time spent on multiplying packet $p_{t,i}$ by $x$ at worker $w_i$, and the average $\mathbb{E}[\beta_{t,i}]$ is taken over all $\tau_i$ packets. The reason is that PRAC is a dynamic algorithm that sends packets to each worker $w_i$ with an interval of $\mathbb{E}[\beta_{t,i}]$ between each two consecutive packets, thereby fully utilizing the workers' resources [66]. The reason behind counting only one round-trip time (RTT) in $T_i$ is that in PRAC, packets are transmitted to the workers while the previously transmitted packets are being computed at the workers. Therefore, the overall delay includes only one $RTT_i$, required for sending the first packet $p_{1,i}$ to worker $w_i$ and receiving the last computed packet $p_{\tau_i,i} x$ at the master. To approximate the total delay, we assume that the transmission delay of one packet is negligible compared to the computing delay of all $\tau_i$ packets, which is a valid assumption in practice for IoT devices at the edge.

On the other hand, in PRAC, the master stops sending packets to workers as soon as it collectively receives $b + \epsilon$ computed packets from the $n - z$ slowest workers (note that $b + \epsilon$ is the number of computed packets required for successful decoding, where $\epsilon$ is the overhead due to Fountain coding), i.e., $\sum_{i=z+1}^{n} \tau_i = b + \epsilon$. Note that the $z$ fastest workers are assigned to computing the keys, as described in the previous sections. Because PRAC uses the workers' resources efficiently, all $n - z$ workers finish computing their $\tau_i$ packets approximately at the same time, i.e., $T_{PRAC} \approx T_i \approx \tau_i \mathbb{E}[\beta_{t,i}]$, $i = z+1, \ldots, n$. By replacing $\tau_i$ with $T_{PRAC} / \mathbb{E}[\beta_{t,i}]$ in $\sum_{i=z+1}^{n} \tau_i = b + \epsilon$, we can show that
$$T_{PRAC} \approx \frac{b + \epsilon}{\sum_{i=z+1}^{n} 1/\mathbb{E}[\beta_{t,i}]}.$$
Note that the approximated value approaches the exact value as $b$ increases; the reason is that the workers' efficiency increases with increasing $b$.
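As a quick numerical illustration of this approximation (not part of the proof), the sketch below evaluates $T_{PRAC} \approx (b+\epsilon)/\sum_{i=z+1}^{n} 1/\mathbb{E}[\beta_{t,i}]$ for assumed per-worker mean computation times; all parameter values are made-up placeholders.

```python
# Hypothetical per-worker mean per-packet computation times E[beta_{t,i}],
# ordered from the fastest worker to the slowest (illustrative values).
mean_beta = [0.01, 0.012, 0.015, 0.02, 0.03, 0.05]  # seconds
n = len(mean_beta)
z = 2          # privacy parameter; the z fastest workers compute
               # only key (random-matrix) packets
b = 1000       # number of coded data packets needed for decoding
eps = 30       # assumed Fountain-code decoding overhead

# Aggregate service rate of the n - z workers computing data packets.
rate = sum(1.0 / beta for beta in mean_beta[z:])

T_prac = (b + eps) / rate
print(f"T_PRAC ~ {T_prac:.3f} s")

# Consistency check: worker i completes tau_i = T_PRAC / E[beta_{t,i}]
# packets, and these shares must sum to b + eps.
taus = [T_prac / beta for beta in mean_beta[z:]]
assert abs(sum(taus) - (b + eps)) < 1e-9
```

The check at the end mirrors the substitution step of the proof: all $n-z$ data workers finish at the same time $T_{PRAC}$, and their packet counts add up to $b + \epsilon$.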
APPENDIX E
PROOF OF THEOREM

We can write $\mathbb{E}[T_{SC}]$ as a function of the computing time $\beta_{t,i}$ of worker $w_i$, $i = 1, \ldots, n$, as
\begin{align}
\mathbb{E}[T_{SC}] &= \min_{d \in \{k, \ldots, n\}} \left\{ \frac{k - z}{d - z} \mathbb{E}[T_{(d)}] \right\} \tag{15} \\
&= \min_{d \in \{k, \ldots, n\}} \left\{ \frac{b}{d - z} \mathbb{E}[\beta_{t,d}] \right\}, \tag{16}
\end{align}
where $w_d$ is the $d$th fastest worker. Next, we find a lower bound on $\mathbb{E}[T_{SC}] - \mathbb{E}[T_{PRAC}]$ as follows, where $d$ hereafter denotes the minimizer of (16):
\begin{align}
\mathbb{E}[T_{SC}] - \mathbb{E}[T_{PRAC}] &= \frac{b}{d - z} \mathbb{E}[\beta_{t,d}] - \frac{b + \epsilon}{\sum_{i=z+1}^{n} \frac{1}{\mathbb{E}[\beta_{t,i}]}} \tag{17} \\
&= \frac{b}{d - z} \mathbb{E}[\beta_{t,d}] - \frac{b + \epsilon}{\sum_{i=z+1}^{d} \frac{1}{\mathbb{E}[\beta_{t,i}]} + \sum_{i=d+1}^{n} \frac{1}{\mathbb{E}[\beta_{t,i}]}} \tag{18} \\
&\geq \frac{b}{d - z} \mathbb{E}[\beta_{t,d}] - \frac{b + \epsilon}{\frac{d - z}{\mathbb{E}[\beta_{t,d}]} + \frac{n - d}{\mathbb{E}[\beta_{t,n}]}} \tag{19} \\
&= \frac{\frac{b(n - d)}{\mathbb{E}[\beta_{t,n}]} - \frac{\epsilon(d - z)}{\mathbb{E}[\beta_{t,d}]}}{\frac{d - z}{\mathbb{E}[\beta_{t,d}]} \left( \frac{d - z}{\mathbb{E}[\beta_{t,d}]} + \frac{n - d}{\mathbb{E}[\beta_{t,n}]} \right)} \tag{20} \\
&= \frac{bx - \epsilon y}{y(x + y)}, \tag{21}
\end{align}
where $x = \frac{n - d}{\mathbb{E}[\beta_{t,n}]}$ and $y = \frac{d - z}{\mathbb{E}[\beta_{t,d}]}$, and inequality (19) follows from the fact that $z \leq k \leq d \leq n$ together with the ordering of the workers from fastest to slowest, which gives $\mathbb{E}[\beta_{t,i}] \leq \mathbb{E}[\beta_{t,d}]$ for $i \leq d$ and $\mathbb{E}[\beta_{t,i}] \leq \mathbb{E}[\beta_{t,n}]$ for all $i$.
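To see the bound in action, the sketch below evaluates (16), the PRAC delay from Appendix D, and the lower bound (21) for the same illustrative timings as above; the value of $k$ and the resulting minimizer $d^*$ are assumptions of this example, not values from the paper.

```python
# Reusing the illustrative setup above: mean_beta sorted fastest-first.
mean_beta = [0.01, 0.012, 0.015, 0.02, 0.03, 0.05]
n = len(mean_beta)
z, b, eps = 2, 1000, 30
k = 4  # assumed number of workers the fixed-rate secure code waits for

# Equation (16): E[T_SC] = min over d in {k,...,n} of b/(d-z) * E[beta_{t,d}];
# mean_beta[d - 1] is the d-th fastest worker (d is 1-indexed).
d_star = min(range(k, n + 1), key=lambda d: b / (d - z) * mean_beta[d - 1])
t_sc = b / (d_star - z) * mean_beta[d_star - 1]

# PRAC delay from Appendix D.
t_prac = (b + eps) / sum(1.0 / beta for beta in mean_beta[z:])

# Lower bound (21) on E[T_SC] - E[T_PRAC], with x and y as defined above.
x = (n - d_star) / mean_beta[n - 1]
y = (d_star - z) / mean_beta[d_star - 1]
bound = (b * x - eps * y) / (y * (x + y))

print(f"E[T_SC] ~ {t_sc:.3f} s (d* = {d_star}), T_PRAC ~ {t_prac:.3f} s")
print(f"gap = {t_sc - t_prac:.3f} s >= bound {bound:.3f} s")
```

With these placeholder timings the script reports a gap of about 3.94 s against a bound of about 2.64 s, consistent with the chain (17)–(21): the gap is positive whenever $bx > \epsilon y$, i.e., whenever the slow workers ignored by the fixed-rate scheme contribute enough service rate to outweigh the Fountain-coding overhead.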