Improving IoT Analytics through Selective Edge Execution
Apostolos Galanopoulos∗, Argyrios G. Tasiopoulos†, George Iosifidis∗, Theodoros Salonidis‡, Douglas J. Leith∗
∗School of Computer Science and Statistics, Trinity College Dublin
†Department of Electronic and Electrical Engineering, University College London
‡IBM T. J. Watson Research Center, New York
Abstract—A large number of emerging IoT applications rely on machine learning routines for analyzing data. Executing such tasks at the user devices improves response time and economizes network resources. However, due to power and computing limitations, the devices often cannot support such resource-intensive routines and fail to accurately execute the analytics. In this work, we propose to improve the performance of analytics by leveraging edge infrastructure. We devise an algorithm that enables the IoT devices to execute their routines locally, and outsource them to cloudlet servers only if they predict they will gain a significant performance improvement. It uses an approximate dual subgradient method, making minimal assumptions about the statistical properties of the system's parameters. Our analysis demonstrates that our proposed algorithm can intelligently leverage the cloudlet, adapting to the service requirements.
Index Terms—Edge Computing, Network Optimization, Resource Allocation, Data Analytics
I. INTRODUCTION
The recent demand for machine learning (ML) applications, such as image recognition, natural language translation, and health monitoring, has been unprecedented [1]. These services collect data streams generated by small devices, and analyze them locally or at distant cloud servers. There is growing consensus that such applications will be ubiquitous in Internet of Things (IoT) systems [2]. The challenge, however, with such services is that they are often resource-intensive. On the one hand, the cloud offers powerful ML models and abundant compute resources but requires data transfers which consume network bandwidth and might induce significant delays [3]. On the other hand, executing these services at the devices economizes bandwidth but degrades their performance due to the devices' limited resources, e.g., memory or energy.

A promising approach to tackle this problem is to allow the devices to outsource individual ML tasks to edge infrastructure such as cloudlets [4]. This can increase their execution accuracy since the cloudlet's ML components are typically more complex, and hence offer improved results. Nevertheless, the success of such solutions presumes intelligent outsourcing algorithms. The cloudlets, unlike the cloud, have limited computing capacity and cannot support all requests. At the same time, task execution requires the transfer of large data volumes (e.g., video streams). This calls for prudent transmission decisions in order to avoid wasting device energy and bandwidth. Furthermore, unlike prior computation offloading solutions [5], it is crucial to only outsource the tasks that can significantly benefit from cloudlet execution.
Our goal is to design an online framework that addresses the above issues and makes intelligent outsourcing decisions. We consider a system where a cloudlet improves the execution of image classification tasks running on devices such as wireless IoT cameras. We assume that each device has a "low-precision" classifier while the cloudlet can execute the task with higher precision. The devices classify the received objects upon arrival, and decide whether or not to transmit them to the cloudlet to get a better classification result. Making this decision requires an assessment of the potential performance gains, which are measured in terms of accuracy improvements. To this end, we propose the usage of a predictor at each device that leverages the local classification results.

We consider the practical case where the resources' availability is unknown and time-varying, but their instantaneous values are observable. We design a distributed adaptive algorithm that decides the task outsourcing policy towards maximizing the long-term performance of analytics. To achieve this, we formulate the system's operation as an optimization problem, which is decomposed via Lagrange relaxation into a set of device-specific problems. This enables its distributed solution through an approximate (due to the unknown parameters) dual ascent method that can be applied in real time. The method is inspired by primal averaging schemes for static problems, e.g., see [6], and achieves a bounded and tunable optimality gap using a novel approximate iteration technique. Our contributions can be summarized as follows:

• Edge Analytics. We study the novel problem of intelligently improving data analytics tasks using edge infrastructure, which is increasingly important for the IoT.

• Decision Framework. We propose an online task outsourcing algorithm that achieves near-optimal performance under very general conditions (unknown, non-i.i.d. statistics). This is a novel analytical result of independent value.
• Implementation & Evaluation. The solution is evaluated in a wireless testbed using an ML application, several classifiers and datasets. We find that our algorithm increases the accuracy and reduces the energy consumption compared to carefully selected benchmark policies.
Organization. Sec. II introduces the model and the problem. Sec. III presents the algorithm and Sec. IV the system implementation, experiments and trace-driven simulations. We discuss related work in Sec. V and conclude in Sec. VI. Although the paper is completely self-sufficient, the interested reader will find more results from the implementation of our system, as well as a more detailed version of the proof of our main analytical contribution, in [7].

II. MODEL AND PROBLEM FORMULATION
Classifiers. There is a set $\mathcal{C}$ of $C$ disjoint object classes and a set $\mathcal{N}$ of $N$ edge devices. We assume a time-slotted operation where each device $n$ receives at slot $t$ a group of objects (or tasks) $\mathcal{S}_{nt}$ to be classified, e.g., frames captured by its camera. We define $\mathcal{S}_n \supseteq \mathcal{S}_{nt}, \forall t$, as the set of objects that can arrive at $n$, and $\mathcal{S} = \cup_n \mathcal{S}_n$. Each device $n$ is equipped with a local classifier $J_n : \mathcal{S}_n \to \big(C_n, d_n(s_{nt})\big)$, which outputs the inferred class of an object $s_{nt}$ and a normalized confidence value $d_n(s_{nt}) \in [0,1]$ for that inference. The cloudlet has a classifier $J : \mathcal{S} \to \big(C, d(s_{nt})\big)$ that can classify any object, and offers higher accuracy than all devices, i.e., $d(s_{nt}) \geq d_n(s_{nt}), \forall n \in \mathcal{N}$.

Let $\phi_{nt} \in [0,1]$ denote the accuracy improvement when the cloudlet classifier is used:
$$\phi_{nt}(s_{nt}) = d(s_{nt}) - d_n(s_{nt}), \quad \forall n \in \mathcal{N}, \ s_{nt} \in \mathcal{S}_{nt}. \quad (1)$$

Every device is also equipped with a predictor $Q_n$ that is trained with the outcomes of the local and cloudlet classifiers. This predictor can estimate the accuracy improvement offered by the cloudlet for each object $s_{nt} \in \mathcal{S}_{nt}$:
$$Q_n : \big(J_n(s_{nt})\big) \to \big(\hat{\phi}_{nt}, \sigma_{nt}\big), \quad (2)$$
and, in general, this assessment might be inexact, $\hat{\phi}_{nt}(s_{nt}) \neq \phi_{nt}(s_{nt})$, where $\sigma_{nt} \in [0,1]$ is the respective confidence value.

Wireless System. The devices access the cloudlet through high-capacity cellular or Wi-Fi links. Each device $n$ has an average power budget of $B_n$ Watts. Power is a key limitation here because the devices might have a small energy budget due to protocol-induced transmission constraints, or due to user aversion to energy spending. The cloudlet has an average processing capacity of $H$ cycles/sec which is shared by the devices; when the total load exceeds $H$, the task delay increases and eventually renders the system non-responsive. We consider the realistic scenario where the parameters of devices and the cloudlet change over time in an unknown fashion.
Namely, they are created by random processes $\{B_{nt}\}_{t=1}^{\infty}$ and $\{H_t\}_{t=1}^{\infty}$, and our decision framework has access only to their instantaneous values in each slot. Unlike previous optimization frameworks [9] that assume i.i.d. or Markov-modulated processes, here we only require that these perturbations are bounded in each slot, i.e., $H_t \leq H_{max}$, $B_{nt} \leq B_{max}, \forall t$, and that their averages converge to some finite values which we do not need to know, i.e., $\lim_{t\to\infty} \sum_{\tau=1}^{t} B_{n\tau}/t = B_n, \forall n$, and similarly for $\{H_t\}_{t=1}^{\infty}$. We also define $\mathbf{B}_t = (B_{nt}, n \in \mathcal{N})$.

Two remarks are in order. First, the classifier might output only the class with the highest confidence, or a vector with the confidence for each class; our analysis holds for both cases. Second, the predictor can be a model-based or model-free solution, e.g., a regressor or a neural network; our analysis and framework work for any of these choices. In the implementation we used a mixed-effects regressor, see [8].
Fig. 1: Schematic of the basic notation and procedure followed by the system's devices.

When an object (say, an image) is transmitted in slot $t$ from device $n$ to the cloudlet, it consumes part of the device's power budget $B_n$. We assume that this cost, denoted $o_{nt}$, follows a random process $\{o_{nt}\}_{t=1}^{\infty}$ that is uniformly upper-bounded and has well-defined mean values; it can reflect, e.g., the impact of time-varying channel conditions. Also, each transmitted object requires a number of processing cycles in the cloudlet which might also vary with time, e.g., due to the different type of the objects, and we assume it follows the random process $\{h_{nt}\}_{t=1}^{\infty}$, with $\lim_{t\to\infty} \sum_{\tau=1}^{t} h_{n\tau}/t = h_n$. We define $\mathbf{o}_t = (o_{nt} \leq o_{max}, n \in \mathcal{N})$ and $\mathbf{h}_t = (h_{nt} \leq h_{max}, n \in \mathcal{N})$. Our model is very general, as the (i) requests, (ii) power and computing cost per request, and (iii) resource availability can be arbitrarily time-varying, and with unknown statistics.

Problem Formulation. The IoT devices wish to involve the cloudlet only when they confidently expect high classification precision gains. Otherwise, they will consume the cloudlet's capacity and their own power without significant performance benefits. Therefore, we make the outsourcing decision for each object $s_{nt}$ based on the weighted improvement gain:
$$w_{nt}(s_{nt}) = \hat{\phi}_{nt} - \rho_n \sigma_{nt}, \quad \forall n, t, \quad (3)$$
where $\rho_n \geq 0$ is a risk-aversion parameter set by the system designer or each user. For example, assuming a normal distribution for $\phi_{nt}$, we could set $\rho_n = 1$ and use a threshold rule of one standard deviation. We use hereafter these modified parameters $w_{nt}, \forall n$, and partition the interval of their values $[-w, w]$ ($w$ being the maximum) into subintervals $I_j$, $j = 1, \ldots, M$, such that $\cup_{j=1}^{M} I_j = [-w, w]$ and $I_i \cap I_j = \emptyset, \forall i \neq j$, with $w_n^j$ being the center point of $I_j$. This quantization facilitates the implementation of our algorithm in a real system, and is without loss of generality since we can use very short intervals.
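To make the weighted gain of (3) and the interval quantization concrete, the following is a minimal Python sketch; the risk parameter `rho`, the number of intervals `M`, and the range bound `w_max` are illustrative assumptions, not values from the paper:

```python
# Sketch of the weighted improvement gain (Eq. 3) and its quantization
# into M equal subintervals of [-w_max, w_max]. All parameter values
# here are illustrative assumptions.

def weighted_gain(phi_hat, sigma, rho=1.0):
    """w_nt = phi_hat - rho * sigma: expected accuracy improvement,
    discounted by the predictor's uncertainty."""
    return phi_hat - rho * sigma

def interval_index(w, w_max=1.0, M=8):
    """Map a gain w in [-w_max, w_max] to one of M equal subintervals I_j;
    return (j, center point of I_j)."""
    width = 2 * w_max / M
    j = min(int((w + w_max) // width), M - 1)  # clamp the right endpoint
    center = -w_max + (j + 0.5) * width
    return j, center

w = weighted_gain(phi_hat=0.4, sigma=0.1)  # discounted gain 0.3
j, c = interval_index(w)                   # subinterval containing it
```

The clamp on the last interval ensures the maximum gain `w_max` itself falls in interval `M - 1` rather than out of range.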
Finally, let $\lambda_{nt}^j$ denote the number of objects with expected gain $w_n^j$ that device $n$ has created in slot $t$. These arrivals are generated by an unknown process $\{\lambda_{nt}^j\}_{t=1}^{\infty}$, with $\lim_{T\to\infty} (1/T)\sum_{t=1}^{T} \lambda_{nt}^j = \lambda_n^j, \forall n, j$. Our aim is to maximize the aggregate long-term analytics performance gains, for all objects and IoT devices.
This can be formulated as a mathematical program. We define variables $y_n^j \in [0,1], \forall n, j$, which indicate the long-term ratio of objects with expected gain $w_n^j$ that are sent to the cloudlet (with $y_n^j = 1$ when all objects of $n$ in $I_j$ are sent), and formulate the convex problem:
$$\mathbb{P}: \ \underset{y_n^j \in [0,1]}{\text{maximize}} \ \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j \lambda_n^j y_n^j \triangleq f(\mathbf{y}) \quad (4a)$$
$$\text{s.t.} \ \sum_{j=1}^{M} y_n^j \lambda_n^j o_n \leq B_n, \quad n \in \mathcal{N}, \quad (4b)$$
$$\sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \lambda_n^j h_n \leq H, \quad (4c)$$
where $\mathbf{y} = (y_n^j : \forall n, j)$. Eq. (4b) constrains the average power budget of each device and (4c) bounds the cloudlet utilization. Note that the power budgets are also affected by the local classifier computations, which are made for every object and thus do not affect the offloading decisions. Clearly, based on the specifics of each system we can add more constraints, e.g., for the average wireless link capacity in case bandwidth is also a bottleneck resource. Such extensions are straightforward as they do not change the properties of the problem, nor affect our analysis below.

The solution of $\mathbb{P}$ is a policy $\mathbf{y}^*$ that maximizes the aggregate (hence also average) analytics performance in the system. Such policies can be randomized, with $y_n^{j*}$ denoting the probability of sending each object of $n$ in interval $I_j$ to the cloudlet (at each slot). However, in reality, the system parameters not only change with time, but are generated by processes that might not be i.i.d. and have unknown statistics (mean values, etc.). This means that in practice we cannot find $\mathbf{y}^*$. In the next section we present an online policy that is oblivious to the statistics of $\{\boldsymbol{\lambda}_t\}$, $\{\mathbf{o}_t\}$, $\{\mathbf{h}_t\}$, $\{H_t\}$, $\{\mathbf{B}_t\}$, but indeed achieves the same performance as $\mathbf{y}^*$.

III. ONLINE OFFLOADING ALGORITHM
Our solution approach is simple and, we believe, elegant. We replace the unknown parameters $H$, $\lambda_n^j$, $B_n$, $o_n$ and $h_n, \forall n, j$, in $\mathbb{P}$ with their running averages (which we calculate as the system operates), solve the modified problem with gradient ascent in the dual space, and perform primal averaging. This gives us an online policy that applies in real time the solution $\mathbf{y}_t, \forall t$, while using only information made available by slot $t$.

A. Problem Decomposition & Algorithm Design
Let us first define the running-average function:
$$\bar{f}_t(\mathbf{y}) \triangleq \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \bar{\lambda}_{nt}^j = \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j \lambda_n^j y_n^j + \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \big(\bar{\lambda}_{nt}^j - \lambda_n^j\big) = f(\mathbf{y}) + \mathbf{y}^\top \boldsymbol{\epsilon}_t,$$
where $\bar{\lambda}_{nt}^j = \sum_{\tau=1}^{t} \lambda_{n\tau}^j / t$ is the running average of $\lambda_n^j$, and $\boldsymbol{\epsilon}_t = \big(w_n^j(\bar{\lambda}_{nt}^j - \lambda_n^j), \forall n, j\big) \in \mathbb{R}^{NM}$ is the vector of component-wise errors between $\bar{f}_t(\mathbf{y})$ and $f(\mathbf{y})$. Also, we denote by $g(\mathbf{y}) \in \mathbb{R}^{N+1}$ the constraint vector of (4b)-(4c), and define
$$\bar{g}_t(\mathbf{y}) = g(\mathbf{y}) + \delta_t(\mathbf{y}), \quad (5)$$
with $\delta_t(\mathbf{y}) = \big(\delta_{nt}(\mathbf{y}), n = 1, \ldots, N+1\big)$ and
$$\delta_{nt}(\mathbf{y}) = B_n - \bar{B}_{nt} + \sum_{j=1}^{M} y_n^j \big(\bar{o}_{nt}\bar{\lambda}_{nt}^j - o_n\lambda_n^j\big), \quad n = 1, \ldots, N,$$
$$\delta_{N+1,t}(\mathbf{y}) = H - \bar{H}_t + \sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \big(\bar{h}_{nt}\bar{\lambda}_{nt}^j - h_n\lambda_n^j\big).$$
Here, $\bar{B}_{nt} = \sum_{\tau=1}^{t} B_{n\tau}/t$ is the running average of process $\{B_{nt}\}_{t=1}^{\infty}$, and similarly we define $\bar{H}_t$, $\bar{o}_{nt}$, and $\bar{h}_{nt}$. Note that $\bar{f}_t(\mathbf{y})$ and $\bar{g}_t(\mathbf{y})$ can be calculated at each slot, while $f(\mathbf{y})$ and $g(\mathbf{y})$ are unknown. We can now define a new problem:
$$\mathbb{P}(t): \ \max_{\mathbf{y} \in [0,1]^{NM}} \ \bar{f}_t(\mathbf{y}) \quad \text{s.t.} \ \bar{g}_t(\mathbf{y}) \preceq \mathbf{0}.$$
We will use the instances $\{\mathbb{P}(t)\}_t$ to perform a dual ascent method and obtain a sequence of decisions $\{\mathbf{y}_t\}_t$ that will be applied in real time and achieve performance that converges asymptotically to the (unknown) solution of $\mathbb{P}$. We first dualize $\mathbb{P}(t)$ and introduce the Lagrangian:
$$L(\mathbf{y}, \boldsymbol{\mu}) \triangleq \bar{f}_t(\mathbf{y}) + \boldsymbol{\mu}^\top \bar{g}_t(\mathbf{y}) = \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \bar{\lambda}_{nt}^j + \sum_{n=1}^{N} \mu_n \Big(\sum_{j=1}^{M} y_n^j \bar{\lambda}_{nt}^j \bar{o}_{nt} - \bar{B}_{nt}\Big) + \xi \Big(\sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \bar{\lambda}_{nt}^j \bar{h}_{nt} - \bar{H}_t\Big),$$
where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_N, \xi)$ are the non-negative dual variables for $\bar{g}_t(\mathbf{y}) \preceq \mathbf{0}$. The dual function is:
$$V(\boldsymbol{\mu}) = \min_{\mathbf{0} \preceq \mathbf{y} \preceq \mathbf{1}} L(\mathbf{y}, \boldsymbol{\mu}), \quad (6)$$
and the dual problem amounts to maximizing $V(\boldsymbol{\mu})$. We apply a dual ascent algorithm where the iterations are in sync with the system's time slots $t$.
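This per-slot dual ascent can be sketched as a minimal single-device simulation: the primal step reduces to a per-interval threshold comparison, and the dual variables are updated by the averaged constraint slacks. All numeric values below (arrival and cost distributions, budgets, step size, interval centers) are illustrative assumptions, not the paper's parameters:

```python
import random

# Minimal single-device sketch of the per-slot dual ascent with running
# averages. All numbers are illustrative assumptions.
random.seed(0)
M, alpha, T = 4, 0.05, 2000
w = [-0.6, -0.2, 0.2, 0.6]          # interval center points w_n^j
mu, xi = 0.0, 0.0                   # dual prices: power, computing
sum_lam = [0.0] * M                 # running sums -> running averages
sum_o = sum_h = sum_B = sum_H = 0.0
offload_rate = [0.0] * M

for t in range(1, T + 1):
    # observe this slot's (random) arrivals, costs, and budgets
    lam = [random.randint(0, 3) for _ in range(M)]
    o_t, h_t = random.uniform(0.8, 1.2), random.uniform(0.8, 1.2)
    B_t, H_t = random.uniform(1.8, 2.2), random.uniform(3.5, 4.5)
    for j in range(M):
        sum_lam[j] += lam[j]
    sum_o, sum_h = sum_o + o_t, sum_h + h_t
    sum_B, sum_H = sum_B + B_t, sum_H + H_t
    lam_bar = [s / t for s in sum_lam]
    o_bar, h_bar = sum_o / t, sum_h / t
    B_bar, H_bar = sum_B / t, sum_H / t
    # threshold step: offload interval j iff its gain beats the dual price
    y = [1.0 if lam[j] > 0 and mu * o_bar + xi * h_bar < w[j] else 0.0
         for j in range(M)]
    # projected dual updates driven by the averaged constraint slacks
    mu = max(0.0, mu + alpha * (sum(o_bar * lam_bar[j] * y[j]
                                    for j in range(M)) - B_bar))
    xi = max(0.0, xi + alpha * (sum(h_bar * lam_bar[j] * y[j]
                                    for j in range(M)) - H_bar))
    for j in range(M):
        offload_rate[j] += y[j] / T
```

Since the dual prices stay non-negative, intervals with negative expected gain are never offloaded, while the high-gain intervals are offloaded only as long as the price (the scarcity of power and computing) stays below their gain.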
Observe that the inner minimization in $V(\boldsymbol{\mu})$ does not depend on $\bar{B}_{nt}$ or $\bar{H}_t$, is separable with respect to the primal variables, and its solution is independent of $\bar{\lambda}_{nt}^j$. Hence, in each iteration $t$ we can minimize $L$ by:
$$(y_n^j)^* \in \arg\min_{y_n^j \in [0,1]} \ y_n^j\big(-w_n^j + \mu_{nt}\bar{o}_{nt} + \xi_t\bar{h}_{nt}\big), \quad \forall n, j. \quad (7)$$
This yields the following easy-to-implement threshold rule:
$$y_{nt}^j = \begin{cases} 1, & \text{if } \lambda_{nt}^j > 0 \text{ and } \mu_{nt}\bar{o}_{nt} + \xi_t\bar{h}_{nt} < w_n^j, \\ 0, & \text{otherwise,} \end{cases} \quad (8)$$
which is a deterministic decision that offloads (or not) all requests of each device (at each $t$). Then we improve the current value of $V_t(\boldsymbol{\mu})$ by updating the dual variables:
$$\mu_{n,t+1} = \Big[\mu_{nt} + \alpha\Big(\sum_{j=1}^{M} \bar{o}_{nt}\bar{\lambda}_{nt}^j y_{nt}^j - \bar{B}_{nt}\Big)\Big]^+, \quad \forall n, \quad (9)$$
$$\xi_{t+1} = \Big[\xi_t + \alpha\Big(\sum_{n=1}^{N}\sum_{j=1}^{M} \bar{h}_{nt}\bar{\lambda}_{nt}^j y_{nt}^j - \bar{H}_t\Big)\Big]^+, \quad (10)$$
where $\alpha > 0$ is the update step size, and return to (7). For our system implementation, this relaxation means we install queues for the data transmission (at the devices) and image processing (at the cloudlet).

Algorithm 1: OnAlgo
1: Initialization: $t = 0$, $\xi_0 = 0$, $\boldsymbol{\mu}_0 = \mathbf{0}$, $\mathbf{y}_0 = \mathbf{0}$
2: while True do
3:   for each device $n \in \mathcal{N}$ do
4:     Receive objects $\mathcal{S}_{nt} = \{s_{nt}\}$;
5:     $(\hat{\phi}_{nt}, \sigma_{nt}) \leftarrow Q_n\big(J_n(s_{nt})\big), \forall s_{nt} \in \mathcal{S}_{nt}$;
6:     Calculate $w_{nt}$ through (3);
7:     Observe $o_{nt}$, $h_{nt}$, $B_{nt}$ and calculate $\bar{o}_{nt}$, $\bar{h}_{nt}$, $\bar{B}_{nt}$;
8:     for $j = 1, \ldots, M$ do
9:       Observe $\lambda_{nt}^j$ and calculate the average $\bar{\lambda}_{nt}^j$ and $w_n^j$;
10:      Decide $y_{nt}^j$ by using (8);
11:    end for
12:    Update $\mu_{n,t+1}$ using (9);
13:    Send the averages $\bar{\lambda}_{nt}^j, \forall j$, to the cloudlet;
14:  end for
15:  Cloudlet:
16:  Compute the tasks and receive $\bar{\lambda}_{nt}^j, \forall n$;
17:  Observe $H_t$ and calculate $\bar{H}_t$;
18:  Update $\xi_{t+1}$ using (10), and send it to the devices;
19:  $t \leftarrow t + 1$;
20: end while

The detailed steps that implement our online policy are as follows (with reference to OnAlgo, Algorithm 1). Each device $n$ receives a group of objects $\mathcal{S}_{nt}$ in slot $t$ and uses its classifier to predict their classes, and the predictor to estimate the expected offloading gains (Steps 4-6). The devices update their statistics (Step 7) and compare the expected benefits with the outsourcing costs (Step 10). Finally, they update their local dual variable for the power constraint violation (Step 12). The cloudlet classifies the received objects (Step 16), updates its parameter estimates (Step 17) and its congestion variable (Step 18), which is sent to the devices.

B. Performance Analysis
The gist of our approach is that, as time evolves, the sequence of problems $\{\mathbb{P}(t)\}_t$ approaches our initial problem $\mathbb{P}$. This is true under the following mild assumption.

Assumption 1. The perturbations of the system parameters are independent of each other, uniformly bounded, and their averages converge, e.g., $\lim_{t\to\infty} \bar{B}_{nt} = B_n$.

Under this assumption it is easy to see that:
$$\lim_{t\to\infty} \delta_t(\mathbf{y}) = \mathbf{0}, \quad \lim_{t\to\infty} \mathbf{y}^\top\boldsymbol{\epsilon}_t = 0, \quad \forall \mathbf{y}.$$
Furthermore, due to the boundedness of the parameters and $y_n^j \in [0,1], \forall n, j$, we have:
$$\|g(\mathbf{y})\| \leq \sigma_g, \quad \|\delta_t(\mathbf{y})\| \leq \sigma_{\delta_t}, \quad \forall t, \quad (11)$$
and using Minkowski's inequality, we get the bound:
$$\|\bar{g}_t(\mathbf{y})\| = \|g(\mathbf{y}) + \delta_t(\mathbf{y})\| \leq \sigma_g + \sigma_{\delta_t}. \quad (12)$$
It is also easy to see that $\lim_{t\to\infty} \sigma_{\delta_t} = 0$. The following theorem is our main analytical result.

Theorem 1.
Under Assumption 1, OnAlgo ensures the following optimality and feasibility gaps:
$$(i) \ \lim_{t\to\infty} f(\bar{\mathbf{y}}_t) \leq f^* + \frac{a\sigma_g^2}{2}, \qquad (ii) \ \lim_{t\to\infty} g(\bar{\mathbf{y}}_t) \preceq \mathbf{0},$$
where $\bar{\mathbf{y}}_t = \frac{1}{t}\sum_{i=1}^{t} \mathbf{y}_i$.

Proof. We drop the bold typeface notation here, and use subscript $i = 1, \ldots, t$ to denote the $i$-th slot. We first bound the distance of $\mu_{t+1}$ from a vector $\theta \in \mathbb{R}^{N+1}$:
$$\|\mu_{t+1} - \theta\|^2 = \big\|[\mu_t + a(g(y_t) + \delta_t(y_t))]^+ - \theta\big\|^2 \leq \|\mu_t - \theta\|^2 + a^2\|g(y_t)\|^2 + a^2\|\delta_t(y_t)\|^2 + 2a^2\delta_t(y_t)^\top g(y_t) + 2a(\mu_t - \theta)^\top\big(g(y_t) + \delta_t(y_t)\big). \quad (13)$$

(i) Optimality Gap. From the dual problem we can write:
$$V(\mu^*) \geq \frac{1}{t}\sum_{i=1}^{t} V(\mu_i) \geq \frac{1}{t}\sum_{i=1}^{t} L(y_i, \mu_i) = \frac{1}{t}\sum_{i=1}^{t}\Big(f(y_i) + y_i^\top\epsilon_i + \mu_i^\top\big(g(y_i) + \delta_i(y_i)\big)\Big) \geq f(\bar{y}_t) + \frac{1}{t}\sum_{i=1}^{t} y_i^\top\epsilon_i + \frac{1}{t}\sum_{i=1}^{t} \mu_i^\top\big(g(y_i) + \delta_i(y_i)\big), \quad (14)$$
where the last inequality follows from Jensen's inequality. Now, let $\theta = 0$ in (13). Using (11) and the Cauchy-Schwarz inequality, and by summing over all $t$, we obtain:
$$\|\mu_{t+1}\|^2 \leq \|\mu_1\|^2 + a^2 t\sigma_g^2 + a^2\sum_{i=1}^{t}\sigma_{\delta_i}^2 + 2a^2\sigma_g\sum_{i=1}^{t}\sigma_{\delta_i} + 2a\sum_{i=1}^{t}\mu_i^\top\big(g(y_i) + \delta_i(y_i)\big).$$
Dropping the non-negative term $\|\mu_{t+1}\|^2$, dividing by $2at$, setting $\mu_1 = 0$, and rearranging terms yields:
$$-\frac{1}{t}\sum_{i=1}^{t}\mu_i^\top\big(g(y_i) + \delta_i(y_i)\big) \leq \frac{a\sigma_g^2}{2} + \frac{a}{2t}\sum_{i=1}^{t}\sigma_{\delta_i}^2 + \frac{a\sigma_g}{t}\sum_{i=1}^{t}\sigma_{\delta_i}.$$
Using the fact that $V(\mu^*) = f^*$, and combining the above with (14), we obtain:
$$f(\bar{y}_t) - f^* \leq -\frac{1}{t}\sum_{i=1}^{t} y_i^\top\epsilon_i + \frac{a\sigma_g^2}{2} + \frac{a}{2t}\sum_{i=1}^{t}\sigma_{\delta_i}^2 + \frac{a\sigma_g}{t}\sum_{i=1}^{t}\sigma_{\delta_i}.$$
All sums have diminishing terms and are divided by $t$, hence converge to $0$. Thus we obtain the first part of the theorem.

(ii) Constraint Violation. If we apply the dual variable update rule recursively, we obtain:
$$\mu_{t+1} = \big[\mu_t + a\big(g(y_t) + \delta_t(y_t)\big)\big]^+ \succeq \mu_1 + a\sum_{i=1}^{t}\big(g(y_i) + \delta_i(y_i)\big).$$
Setting $\mu_1 = 0$, dividing by $at$, and using Jensen's inequality for $g(\cdot)$, we get:
$$g(\bar{y}_t) + \frac{1}{t}\sum_{i=1}^{t}\delta_i(y_i) \preceq \frac{\mu_{t+1}}{at}. \quad (15)$$
The second term of the LHS converges to zero as $t \to \infty$. Our claim holds if the same is true for the RHS. Indeed, this is the case assuming the existence of a Slater vector and the boundedness of the set of dual variables (see [6], [7]).

The theorem shows that OnAlgo asymptotically achieves zero feasibility gap (no constraint violation), and a fixed optimality gap that can be made arbitrarily small by tuning the step size.

IV. IMPLEMENTATION AND EVALUATION
A. Experimentation Setup and Initial Measurements

1) Testbed and Measurements:
We used 4 Raspberry Pis (RPs) as end-nodes, placed at different distances from a laptop (cloudlet). We used a Monsoon monitor for the energy measurements, and Python libraries and TensorFlow for the classifiers. We first measured the average power consumption when the RPs transmit data to the cloudlet at different rates, and then fitted a linear regression model that estimates the consumed power as a function of the rate $r$. This model is used by OnAlgo to estimate the energy cost for each transmitted image, given the data rate in each slot (which might differ across the RPs). Also, we measured the average computing costs ($h_n$, in cycles/task) of the classification tasks, to be used in simulations. For more details on the setup, see [7].
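A linear power model of this kind can be sketched as an ordinary least-squares line fit over (rate, power) pairs; the measurement values below are made up for illustration and are not our testbed measurements:

```python
# Sketch of fitting a linear power model P(r) = b0 + b1 * r from
# (rate, power) measurements. The data points are hypothetical.
rates = [1.0, 2.0, 3.0, 4.0]    # transmission rate (Mbps, hypothetical)
powers = [1.2, 1.9, 2.6, 3.3]   # measured power (mW, hypothetical)

n = len(rates)
mean_r = sum(rates) / n
mean_p = sum(powers) / n
# closed-form least-squares slope and intercept for one feature
b1 = sum((r - mean_r) * (p - mean_p) for r, p in zip(rates, powers)) / \
     sum((r - mean_r) ** 2 for r in rates)
b0 = mean_p - b1 * mean_r

def tx_power(rate):
    """Estimated power draw at a given transmission rate."""
    return b0 + b1 * rate
```

Given the per-slot data rate, `tx_power` then yields the estimated energy cost of transmitting an image, which is the quantity OnAlgo needs for its power-budget constraint.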
2) Data Sets and Classifiers:
We use two well-known datasets: (i) MNIST [11], which consists of 28×28-pixel handwritten digits, and includes 60K training and 10K test examples; (ii) CIFAR-10 [12], with 50K training and 10K test examples of 32×32 color images in 10 classes. We used two classifiers, the normalized-distance weighted k-nearest neighbors (KNN) [13], and the more sophisticated Convolutional Neural Network (CNN) implemented with TensorFlow [14]. They output a vector with the probabilities that the object belongs to each class. These classifiers have different performance and resource needs, and hence allow us to build diverse experiments. The predictors are trained with labeled images and the outputs of the local ($f_n$) and cloudlet ($f$) classifiers. These are the independent variables in our regression model that estimates $\phi_{nt}$ (the dependent variable). Recall that the latter is calculated using (1), where we additionally set $w_{nt} = d(s_{nt})$ if device $n$ has given a wrong classification and $w_{nt} = -d(s_{nt})$ if the cloudlet is mistaken.
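A predictor of this kind can be sketched as a one-feature least-squares regressor that maps the local confidence to the expected improvement. The training pairs below are synthetic, and this simple fit stands in for, but is not, the mixed-effects regressor of [8]:

```python
# Sketch of a confidence-to-improvement predictor: fit
# phi ≈ c0 + c1 * d_local by least squares. Training pairs are synthetic;
# low local confidence tends to leave more room for improvement.
train = [(0.95, 0.02), (0.90, 0.06), (0.70, 0.22),
         (0.55, 0.38), (0.40, 0.50)]  # (d_n, observed phi), synthetic

n = len(train)
mx = sum(d for d, _ in train) / n
my = sum(p for _, p in train) / n
c1 = sum((d - mx) * (p - my) for d, p in train) / \
     sum((d - mx) ** 2 for d, _ in train)
c0 = my - c1 * mx

def predict_improvement(d_local):
    """Estimated phi_hat for a local confidence value, clipped to [0, 1]."""
    return min(1.0, max(0.0, c0 + c1 * d_local))
```

With such synthetic data the fitted slope is negative, so the predictor assigns larger expected gains to objects the local classifier is unsure about, which is exactly the signal the outsourcing rule (3) consumes.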
3) Benchmarks:
We compare OnAlgo with two algorithms. The Accuracy-Threshold Offloading (ATO) algorithm, where a task is offloaded when the confidence of the local classifier is below a threshold, without considering the resource consumption. And the Resource-Consumption Offloading (RCO) algorithm, where a task is offloaded when there is enough energy, without considering the expected classification improvement.
4) Limitations of Mobile Devices:
We used our testbed to verify that these small resource-footprint devices require the assistance of a cloudlet. Our findings are in line with previous studies, e.g., [15]. The performance of a CNN model increases with the number of layers. We find that even a CNN with few layers trained for CIFAR reaches a size in the GB range and hence cannot be stored in the RPs (see Fig. 2a). We used vanilla versions of the classifiers to facilitate observation of the results; the memory footprint of NNs can be made smaller [10], but this might affect their performance, and our analysis is orthogonal to such interventions. Similar conclusions hold for the KNN classifier, which needs to locally store all training samples. Clearly, despite the successful efforts to reduce the size of ML models, e.g., using compression [10], the increasingly complex analytics and the small form factor of devices will continue to raise the local-versus-cloudlet execution trade-off.
5) Classifier Assessment:
In Fig. 2b we see that the accuracy (ratio of successful over total predictions) of the KNN classifier improves with the size $K_n$ of the labeled data. Figure 2c presents the accuracy gains for the CNN as more hidden layers are added. The gains are higher (up to 20%) for the digits that are difficult to recognize. Fig. 2d shows the CNN performance on CIFAR, which is lower as this is a more complex dataset (colored images, etc.). Overall, we see that the classifier performance depends on the algorithm (KNN, CNN), the settings (datasets, layers), and the objects.

B. Performance Evaluation

1) Resource Availability Impact:
Fig. 3 shows the average accuracy and the fraction of requests offloaded to the cloudlet with OnAlgo when we vary the devices' power budget. As $B_n$ increases there are more opportunities to use the cloudlet (4-layer CNN) and obtain more accurate classifications than with the local classifier (1-layer CNN). Furthermore, Fig. 2(c-d) shows that MNIST is easier to classify and the gains of using a better classifier are smaller than with CIFAR. Hence, as $B_n$ increases in Fig. 3, the ratio of offloaded tasks increases at a faster pace with CIFAR than with MNIST.
2) Comparison with Benchmarks:
We compare OnAlgo to ATO and RCO. No-offloading (NO) serves as a baseline for these algorithms in Fig. 4. To ensure a realistic comparison, we set the rule for all algorithms that the cloudlet will not serve any task if the computing capacity constraint is violated. For RCO, the availability of energy is determined by computing the running average consumption at each device during the experiment. We employ two testbed scenarios, and a simulation with a larger number of devices.
Scenario 1:
Low accuracy improvement; high resources. We set a small power budget $B_n$ (a fraction of a mW) and $H = 2$ GHz, allowing the devices to offload many tasks and the cloudlet to serve most of them, and used MNIST (which offers small improvement). We present the average accuracy and power consumption in Fig. 4a, where we see that OnAlgo outperforms both ATO and RCO in accuracy. Regarding power consumption, ATO achieves the best result since it obtains high enough confidence from its local classifier (it rarely offloads). RCO, however, offloads almost every task, as it has enough resources and does not refrain even when the improvement is low. The reason it achieves lower accuracy than OnAlgo is that it does not offload intelligently, and gets denied when the computing constraint is violated.

Scenario 2:
High accuracy improvement; low resources. We set a small power budget $B_n$ (a fraction of a mW) and $H = 200$ MHz, not allowing many offloadings and cloudlet classifications. We have explicitly set a small power budget so as to highlight the impact of power constraints on the system performance; higher power budgets will still be a bottleneck for higher task request rates or images of larger size. We used the CIFAR dataset, which has a large performance difference between the local and cloudlet classifiers. We see from Fig. 4b that OnAlgo achieves 28%-32% higher accuracy than both competing algorithms. RCO is constrained to very few offloadings due to the limited power budget, while ATO is resource-oblivious and offloads tasks regardless of the cloudlet's capacity. This results in many denied offloadings that reduce ATO's accuracy and unnecessarily increase the power consumption. OnAlgo consumes 60% less power than ATO as it offloads its low-confidence tasks selectively.

Fig. 2: CNN memory usage vs. number of layers, and accuracy on MNIST and CIFAR-10 for the KNN and CNN classifiers. (Panels: (a) memory usage of CNN; (b) KNN on MNIST; (c) CNN on MNIST; (d) CNN on CIFAR.)

Fig. 3: Average accuracy and outsourcing of OnAlgo for different power budgets, on MNIST (left) and CIFAR (right).

Fig. 4: Performance comparison of the offloading algorithms ((a) Scenario 1; (b) Scenario 2).
Scenario 3:
Large number of users. Finally, we simulated the algorithms for a large number of users while using the experimentally measured parameters. We observe in Fig. 5a that the accuracy gradually drops (for all algorithms) since now a smaller percentage of the tasks can be served by the cloudlet. OnAlgo consistently outperforms both ATO and RCO since it adapts to the available resources. This is more evident in Fig. 5b, which shows the fast-increasing energy cost of the two benchmark algorithms, as they either offload tasks that do not improve the performance, or offload tasks while the cloudlet is already congested (these tasks are dropped and energy is wasted). The power consumption of OnAlgo is substantially lower than that of RCO.

Fig. 5: Simulation results for increasing number of users on the CIFAR dataset, with a small per-device power budget $B_n$ and $H = 2$ GHz ((a) accuracy comparison; (b) power cost comparison).

Fig. 6: The convergence properties of OnAlgo when $M = 6$, $N = 5$ ((a) optimality gap; (b) constraint violation). Note that the constraints are eventually satisfied, with some of them in a strict fashion (hence the norm is not zero).
3) Convergence of OnAlgo:
Fig. 6 presents the convergence of OnAlgo for different step sizes $\alpha$. Based on the system parameters, the bound given by Theorem 1 is approximately 0.01, 0.2 and 1 for the three $\alpha$ values of Fig. 6. These bounds are satisfied by the solution of OnAlgo in fewer than 300 iterations, as observed in Fig. 6a. The convergence is faster for larger $\alpha$, which however comes at the cost of lower convergence accuracy. The constraint violation bound is also respected, as shown in Fig. 6b, with the constraints being violated more often for small $\alpha$ in the beginning, but improving as $t$ increases.

V. RELATED WORK
Edge & Distributed Computing. Most solutions partition compute-intense mobile applications and offload them to the cloud [16]; a solution that is unfit for low-latency applications. Cloudlets, on the other hand, achieve lower delay [4] but have limited serving capacity; hence there is a need for the intelligent offloading strategy that we propose here. Previous works consider simple performance criteria, such as reducing computation loads [17] or power consumption [18], and focus on the architecture design. Also, MobiStreams [19] and Swing [20] focus on collaborative data stream computations. The above systems either do not optimize the offloading policy, or use heuristics that do not cater for task accuracy.
Mobile and IoT Analytics. The importance of analytics has motivated the design of wireless systems that can execute such tasks. For instance, [21], [22] tailor deep neural networks for execution on mobile devices, while [23] and [24] minimize the execution time for known system parameters and task loads. Finally, [25]–[27] leverage the edge architecture to effectively execute analytics for IoT devices. The plethora of such system proposals underlines the necessity of our online decision framework that provides optimal execution of analytics.
Optimization of Analytics. Prior works in computation offloading focus on different metrics, such as the number of served requests [5], [28], and hence are not applicable here. In our previous work [29], we proposed a static collaborative optimization framework, which does not employ predictions nor account for computation constraints. Other works, e.g., [22], either rely on heuristics or assume static systems and known requests. Clearly, these assumptions are invalid for many practical cases where system parameters not only vary with time, but often do not follow i.i.d. processes. This renders the application of max-weight type policies [9] inefficient. Our approach is fundamentally different: it leads to an online robust algorithm, and is inspired by dual averaging and primal recovery algorithms for static problems, see [6].
Improvement of ML Models. Despite the efforts to improve the execution of analytics at small devices, e.g., via residual learning or compression [10], the trade-off between local low-accuracy and cloudlet high-accuracy execution remains important due to the increasing number and complexity of these tasks. This observation has spurred efforts for designing fast multi-tier (cloud-to-edge) deep neural networks [15] and for dynamic model selection [30], among others. These works are orthogonal to our approach and can be directly incorporated in our framework.

VI. CONCLUSIONS
We propose the idea of improving the execution of data analytics at IoT devices by outsourcing tasks to more robust instances running at cloudlets. The key feature of our proposal is a dynamic and distributed algorithm that makes the outsourcing decisions based on the expected performance improvement and the available resources at the devices and the cloudlet. The proposed algorithm achieves near-optimal performance in a deterministic fashion, under minimal assumptions about the system behavior. This makes it ideal for the problem at hand, where the stochastic effects (e.g., expected accuracy gains) have unknown mean values and possibly non-i.i.d. behavior.

ACKNOWLEDGMENTS
This publication has emanated from research supported in part by SFI research grants 17/CDA/4760 and 16/IA/4610, and is co-funded under the European Regional Development Fund under Grant Number 13/RC/2077.

REFERENCES

[1] E. Siow, T. Tiropanis, and W. Hall, "Analytics for the internet of things: A survey," ACM Comput. Surv., vol. 51, no. 4, pp. 74:1–74:36, 2018.
[2] C. Jiang et al., "Machine learning paradigms for next-generation wireless networks," IEEE Wireless Comm., vol. 24, no. 2, pp. 98–105, 2017.
[3] Cisco White Paper, "Cisco global cloud index: Forecast and methodology, document id: 1513879861264127," 2018.
[4] M. Satyanarayanan et al., "The case for vm-based cloudlets in mobile computing," IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[5] Y. Mao et al., "A survey on mobile edge computing: The communication perspective," IEEE Comm. Surv. Tut., vol. 19, no. 4, pp. 2322–2358, 2017.
[6] A. Nedić and A. Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM J. on Optimization, vol. 19, no. 4, pp. 1757–1780, 2009.
[7] A. Galanopoulos et al., "Improving iot analytics through selective edge execution: Appendix," 2019, https://1drv.ms/b/s!AoI5lEO8XUP1iQIjf1w0YeaUCa83?e=9IW474.
[8] A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
[9] L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource allocation and cross-layer control in wireless networks," Found. Trends Netw., vol. 1, no. 1, pp. 1–144, 2006.
[10] V. Chandrasekhar et al., "Compression of deep neural networks for image instance retrieval," in Proc. of DCC, 2017.
[11] Y. Lecun et al., "Gradient-based learning applied to document recognition," Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[12] A. Krizhevsky, "Learning multiple layers of features from tiny images," Tech. Rep., 2009.
[13] S. A. Dudani, "The distance-weighted k-nearest-neighbor rule," IEEE Trans. on Sys., Man, and Cybern., vol. 6, no. 4, pp. 325–327, 1976.
[14] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in Proc. of USENIX OSDI, 2016.
[15] S. Teerapittayanon et al., "Distributed deep neural networks over the cloud, the edge and end devices," in Proc. of IEEE ICDCS, 2017.
[16] B.-G. Chun et al., "Clonecloud: Elastic execution between mobile device and cloud," in Proc. of EuroSys, 2011.
[17] A. Dou et al., "Misco: A mapreduce framework for mobile systems," in Proc. of PETRA, 2010.
[18] X. Lyu et al., "Selective offloading in mobile edge computing for the green internet of things," IEEE Network, vol. 32, no. 1, pp. 54–60, 2018.
[19] H. Wang and L. Peh, "Mobistreams: A reliable distributed stream processing system for mobile devices," in Proc. of IEEE IPDPS, 2014.
[20] S. Fan, T. Salonidis, and B. Lee, "Swing: Swarm computing for mobile sensing," in Proc. of IEEE ICDCS, 2018.
[21] X. Ran et al., "Delivering deep learning to mobile devices via offloading," in Proc. of VR/AR Network Workshop, 2017.
[22] X. Ran et al., "Deepdecision: A mobile deep learning framework for edge video analytics," in Proc. of IEEE INFOCOM, 2018.
[23] Y. Li et al., "Mobiqor: Pushing the envelope of mobile edge computing via quality-of-result optimization," in Proc. of IEEE ICDCS, 2017.
[24] W. Zhang et al., "Hetero-edge: Orchestration of real-time vision applications on heterogeneous edge clouds," in Proc. of IEEE INFOCOM, 2019.
[25] G. Li et al., "Data analytics for fog computing by distributed online learning with asynchronous update," in Proc. of IEEE ICC, 2019.
[26] S. K. Sharma and X. Wang, "Live data analytics with collaborative edge and cloud processing in wireless iot networks," IEEE Access, vol. 5, pp. 4621–4635, 2017.
[27] J. He et al., "Multitier fog computing with large-scale iot data analytics for smart cities," IEEE Internet of Things Journal, vol. 5, no. 2, pp. 677–686, 2018.
[28] X. Chen et al., "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Trans. on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.
[29] A. Galanopoulos, G. Iosifidis, and T. Salonidis, "Optimizing data analytics in energy constrained iot networks," in Proc. of WiOpt, 2018.
[30] L. Liu and J. Deng, "Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution," arXiv:1701.00299, 2017.