Improving IoT Analytics through Selective Edge Execution
Apostolos Galanopoulos∗, Argyrios G. Tasiopoulos†, George Iosifidis∗, Theodoros Salonidis‡, Douglas J. Leith∗
∗School of Computer Science and Statistics, Trinity College Dublin
†Department of Electronic and Electrical Engineering, University College London
‡IBM T. J. Watson Research Center, New York
Abstract—A large number of emerging IoT applications rely on machine learning routines for analyzing data. Executing such tasks at the user devices improves response time and economizes network resources. However, due to power and computing limitations, the devices often cannot support such resource-intensive routines and fail to accurately execute the analytics. In this work, we propose to improve the performance of analytics by leveraging edge infrastructure. We devise an algorithm that enables the IoT devices to execute their routines locally, and outsource them to cloudlet servers only if they predict they will gain a significant performance improvement. It uses an approximate dual subgradient method, making minimal assumptions about the statistical properties of the system's parameters. Our analysis demonstrates that our proposed algorithm can intelligently leverage the cloudlet, adapting to the service requirements.
Index Terms—Edge Computing, Network Optimization, Resource Allocation, Data Analytics
I. INTRODUCTION
The recent demand for machine learning (ML) applications, such as image recognition, natural language translation, and health monitoring, has been unprecedented [1]. These services collect data streams generated by small devices, and analyze them locally or at distant cloud servers. There is growing consensus that such applications will be ubiquitous in Internet of Things (IoT) systems [2]. The challenge, however, with such services is that they are often resource-intensive. On the one hand, the cloud offers powerful ML models and abundant compute resources but requires data transfers which consume network bandwidth and might induce significant delays [3]. On the other hand, executing these services at the devices economizes bandwidth but degrades their performance due to the devices' limited resources, e.g., memory or energy.

A promising approach to tackle this problem is to allow the devices to outsource individual ML tasks to edge infrastructure such as cloudlets [4]. This can increase their execution accuracy since the cloudlet's ML components are typically more complex, and hence offer improved results. Nevertheless, the success of such solutions presumes intelligent outsourcing algorithms. The cloudlets, unlike the cloud, have limited computing capacity and cannot support all requests. At the same time, task execution requires the transfer of large data volumes (e.g., video streams). This calls for prudent transmission decisions in order to avoid wasting device energy and bandwidth. Furthermore, unlike prior computation offloading solutions [5], it is crucial to only outsource the tasks that can significantly benefit from cloudlet execution.
Our goal is to design an online framework that addresses the above issues and makes intelligent outsourcing decisions. We consider a system where a cloudlet improves the execution of image classification tasks running on devices such as wireless IoT cameras. We assume that each device has a "low-precision" classifier while the cloudlet can execute the task with higher precision. The devices classify the received objects upon arrival, and decide whether or not to transmit them to the cloudlet to get a better classification result. Making this decision requires an assessment of the potential performance gains, which are measured in terms of accuracy improvements. To this end, we propose the usage of a predictor at each device that leverages the local classification results.

We consider the practical case where the resources' availability is unknown and time-varying, but their instantaneous values are observable. We design a distributed adaptive algorithm that decides the task outsourcing policy towards maximizing the long-term performance of analytics. To achieve this, we formulate the system's operation as an optimization problem, which is decomposed via Lagrange relaxation into a set of device-specific problems. This enables its distributed solution through an approximate (due to the unknown parameters) dual ascent method that can be applied in real time. The method is inspired by primal averaging schemes for static problems, e.g., see [6], and achieves a bounded and tunable optimality gap using a novel approximate iteration technique. Our contributions can be summarized as follows:

• Edge Analytics. We study the novel problem of intelligently improving data analytics tasks using edge infrastructure, which is increasingly important for the IoT.

• Decision Framework. We propose an online task outsourcing algorithm that achieves near-optimal performance under very general conditions (unknown, non-i.i.d. statistics). This is a novel analytical result of independent value.
• Implementation & Evaluation. The solution is evaluated in a wireless testbed using an ML application, several classifiers and datasets. We find that our algorithm increases the accuracy and reduces the energy consumption compared to carefully selected benchmark policies.
Organization. Sec. II introduces the model and the problem. Sec. III presents the algorithm and Sec. IV the system implementation, experiments and trace-driven simulations. We discuss related work in Sec. V and conclude in Sec. VI. Although the paper is completely self-sufficient, the interested reader will find more results from the implementation of our system, as well as a more detailed version of the proof of our main analytical contribution, in [7].

II. MODEL AND PROBLEM FORMULATION
Classifiers. There is a set $\mathcal{C}$ of $C$ disjoint object classes and a set $\mathcal{N}$ of $N$ edge devices. We assume a time-slotted operation where each device $n$ receives at slot $t$ a group of objects (or tasks) $\mathcal{S}_{nt}$ to be classified, e.g., frames captured by its camera. We define $\mathcal{S}_n \supseteq \mathcal{S}_{nt}, \forall t$, as the set of objects that can arrive at $n$, and $\mathcal{S} = \cup_n \mathcal{S}_n$. Each device $n$ is equipped with a local classifier $J_n : \mathcal{S}_n \to \big(C_n, d_n(s_{nt})\big)$, which outputs the inferred class of an object $s_{nt}$ and a normalized confidence value $d_n(s_{nt}) \in [0,1]$ for that inference. The cloudlet has a classifier $J : \mathcal{S} \to \big(C, d(s_{nt})\big)$ that can classify any object, and offers higher accuracy than all devices, i.e., $d(s_{nt}) \geq d_n(s_{nt}), \forall n \in \mathcal{N}$.

Let $\phi_{nt} \in [0,1]$ denote the accuracy improvement when the cloudlet classifier is used:
$$\phi_{nt}(s_{nt}) = d(s_{nt}) - d_n(s_{nt}), \quad \forall n \in \mathcal{N}, \ s_{nt} \in \mathcal{S}_{nt}. \quad (1)$$

Every device is also equipped with a predictor $Q_n$ that is trained with the outcomes of the local and cloudlet classifiers. This predictor can estimate the accuracy improvement offered by the cloudlet for each object $s_{nt} \in \mathcal{S}_{nt}$:
$$Q_n : \big(J_n(s_{nt})\big) \to \big(\hat{\phi}_{nt}, \sigma_{nt}\big), \quad (2)$$
and, in general, this assessment might be inexact, $\hat{\phi}_{nt}(s_{nt}) \neq \phi_{nt}(s_{nt})$, where $\sigma_{nt} \in [0,1]$ is the respective confidence value.

Wireless System. The devices access the cloudlet through high-capacity cellular or Wi-Fi links. Each device $n$ has an average power budget of $B_n$ Watts. Power is a key limitation here because the devices might have a small energy budget due to protocol-induced transmission constraints, or due to user aversion to energy spending. The cloudlet has an average processing capacity of $H$ cycles/sec which is shared by the devices; when the total load exceeds $H$, the task delay increases and eventually renders the system non-responsive. We consider the realistic scenario where the parameters of devices and the cloudlet change over time in an unknown fashion.
Namely, they are created by random processes $\{B_{nt}\}_{t=1}^{\infty}$ and $\{H_t\}_{t=1}^{\infty}$, and our decision framework has access only to their instantaneous values in each slot. Unlike previous optimization frameworks [9] that assume i.i.d. or Markov-modulated processes, here we only require that these perturbations are bounded in each slot, i.e., $H_t \leq H_{max}$, $B_{nt} \leq B_{max}, \forall t$, and that their averages converge to some finite values which we do not need to know, i.e., $\lim_{t\to\infty} \sum_{\tau=1}^{t} B_{n\tau}/t = B_n, \forall n$, and similarly for $\{H_t\}_{t=1}^{\infty}$. We also define $\mathbf{B}_t = (B_{nt}, n \in \mathcal{N})$.

Two remarks are in order. First, the classifier might output only the class with the highest confidence, or a vector with the confidence for each class; our analysis holds for both cases. Second, the predictor can be a model-based or model-free solution, e.g., a regressor or a neural network; our analysis and framework work for any of these choices. In the implementation we used a mixed-effects regressor, see [8].
Fig. 1: Schematic of the basic notation and procedure followed by the system's devices.

When an object (say, an image) is transmitted in slot $t$ from device $n$ to the cloudlet, it consumes part of the device's power budget $B_n$. We assume that this cost, denoted $o_{nt}$, follows a random process $\{o_{nt}\}_{t=1}^{\infty}$ that is uniformly upper-bounded and has well-defined mean values; it can reflect, e.g., the impact of time-varying channel conditions. Also, each transmitted object requires a number of processing cycles in the cloudlet which might also vary with time, e.g., due to the different type of the objects, and we assume it follows the random process $\{h_{nt}\}_{t=1}^{\infty}$, with $\lim_{t\to\infty} \sum_{\tau=1}^{t} h_{n\tau}/t = h_n$. We define $\mathbf{o}_t = (o_{nt} \leq o_{max}, n \in \mathcal{N})$ and $\mathbf{h}_t = (h_{nt} \leq h_{max}, n \in \mathcal{N})$. Our model is very general, as the (i) requests, (ii) power and computing cost per request, and (iii) resource availability can be arbitrarily time-varying, and with unknown statistics.

Problem Formulation. The IoT devices wish to involve the cloudlet only when they confidently expect high classification precision gains. Otherwise, they will consume the cloudlet's capacity and their own power without significant performance benefits. Therefore, we make the outsourcing decision for each object $s_{nt}$ based on the weighted improvement gain:
$$w_{nt}(s_{nt}) = \hat{\phi}_{nt} - \rho_n \sigma_{nt}, \quad \forall n, t, \quad (3)$$
where $\rho_n \geq 0$ is a risk-aversion parameter set by the system designer or each user. For example, assuming a normal distribution for $\phi_{nt}$, we could set $\rho_n = 1$ and use a threshold rule of one standard deviation. We use hereafter these modified parameters $w_{nt}, \forall n$, and partition the interval of their values $[-w, w]$ ($w$ being the maximum) into subintervals $I_j$, $j = 1, \ldots, M$, such that $\cup_{j=1}^{M} I_j = [-w, w]$ and $I_i \cap I_j = \emptyset, \forall i \neq j$, with $w_n^j$ being the center point of $I_j$. This quantization facilitates the implementation of our algorithm in a real system, and is without loss of generality since we can use very short intervals.
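To make the weighted gain of (3) and the interval quantization concrete, the following is a minimal Python sketch; the risk parameter `rho`, the number of intervals `M`, and the range bound `w_max` are illustrative assumptions, not values from the paper:

```python
# Sketch of the weighted improvement gain (Eq. 3) and its quantization
# into M equal subintervals of [-w_max, w_max]. All parameter values
# here are illustrative assumptions.

def weighted_gain(phi_hat, sigma, rho=1.0):
    """w_nt = phi_hat - rho * sigma: expected accuracy improvement,
    discounted by the predictor's uncertainty."""
    return phi_hat - rho * sigma

def interval_index(w, w_max=1.0, M=8):
    """Map a gain w in [-w_max, w_max] to one of M equal subintervals I_j;
    return (j, center point of I_j)."""
    width = 2 * w_max / M
    j = min(int((w + w_max) // width), M - 1)  # clamp the right endpoint
    center = -w_max + (j + 0.5) * width
    return j, center

w = weighted_gain(phi_hat=0.4, sigma=0.1)  # discounted gain 0.3
j, c = interval_index(w)                   # subinterval containing it
```

The clamp on the last interval ensures the maximum gain `w_max` itself falls in interval `M - 1` rather than out of range.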
Finally, let $\lambda_{nt}^j$ denote the number of objects with expected gain $w_n^j$ that device $n$ has created in slot $t$. These arrivals are generated by an unknown process $\{\lambda_{nt}^j\}_{t=1}^{\infty}$, with $\lim_{T\to\infty} (1/T)\sum_{t=1}^{T} \lambda_{nt}^j = \lambda_n^j, \forall n, j$. Our aim is to maximize the aggregate long-term analytics performance gains, for all objects and IoT devices.
This can be formulated as a mathematical program. We define variables $y_n^j \in [0,1], \forall n, j$, which indicate the long-term ratio of objects with expected gain $w_n^j$ that are sent to the cloudlet (with $y_n^j = 1$ when all objects of $n$ in $I_j$ are sent), and formulate the convex problem:
$$\mathbb{P}: \ \underset{y_n^j \in [0,1]}{\text{maximize}} \ \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j \lambda_n^j y_n^j \triangleq f(\mathbf{y}) \quad (4a)$$
$$\text{s.t.} \ \sum_{j=1}^{M} y_n^j \lambda_n^j o_n \leq B_n, \quad n \in \mathcal{N}, \quad (4b)$$
$$\sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \lambda_n^j h_n \leq H, \quad (4c)$$
where $\mathbf{y} = (y_n^j : \forall n, j)$. Eq. (4b) constrains the average power budget of each device and (4c) bounds the cloudlet utilization. Note that the power budgets are also affected by the local classifier computations, which are made for every object and thus do not affect the offloading decisions. Clearly, based on the specifics of each system we can add more constraints, e.g., for the average wireless link capacity in case bandwidth is also a bottleneck resource. Such extensions are straightforward as they do not change the properties of the problem, nor affect our analysis below.

The solution of $\mathbb{P}$ is a policy $\mathbf{y}^*$ that maximizes the aggregate (hence also average) analytics performance in the system. Such policies can be randomized, with $y_n^{j*}$ denoting the probability of sending each object of $n$ in interval $I_j$ to the cloudlet (at each slot). However, in reality, the system parameters not only change with time, but are generated by processes that might not be i.i.d. and have unknown statistics (mean values, etc.). This means that in practice we cannot find $\mathbf{y}^*$. In the next section we present an online policy that is oblivious to the statistics of $\{\boldsymbol{\lambda}_t\}$, $\{\mathbf{o}_t\}$, $\{\mathbf{h}_t\}$, $\{H_t\}$, $\{\mathbf{B}_t\}$, but indeed achieves the same performance as $\mathbf{y}^*$.

III. ONLINE OFFLOADING ALGORITHM
Our solution approach is simple and, we believe, elegant. We replace the unknown parameters $H$, $\lambda_n^j$, $B_n$, $o_n$ and $h_n, \forall n, j$, in $\mathbb{P}$ with their running averages (which we calculate as the system operates), solve the modified problem with gradient ascent in the dual space, and perform primal averaging. This gives us an online policy that applies in real time the solution $\mathbf{y}_t, \forall t$, while using only information made available by slot $t$.

A. Problem Decomposition & Algorithm Design
Let us first define the running-average function:
$$\bar{f}_t(\mathbf{y}) \triangleq \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \bar{\lambda}_{nt}^j = \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j \lambda_n^j y_n^j + \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \big(\bar{\lambda}_{nt}^j - \lambda_n^j\big) = f(\mathbf{y}) + \mathbf{y}^\top \boldsymbol{\epsilon}_t,$$
where $\bar{\lambda}_{nt}^j = \sum_{\tau=1}^{t} \lambda_{n\tau}^j / t$ is the running average of $\lambda_n^j$, and $\boldsymbol{\epsilon}_t = \big(w_n^j(\bar{\lambda}_{nt}^j - \lambda_n^j), \forall n, j\big) \in \mathbb{R}^{NM}$ is the vector of component-wise errors between $\bar{f}_t(\mathbf{y})$ and $f(\mathbf{y})$. Also, we denote by $g(\mathbf{y}) \in \mathbb{R}^{N+1}$ the constraint vector of (4b)-(4c), and define
$$\bar{g}_t(\mathbf{y}) = g(\mathbf{y}) + \delta_t(\mathbf{y}), \quad (5)$$
with $\delta_t(\mathbf{y}) = \big(\delta_{nt}(\mathbf{y}), n = 1, \ldots, N+1\big)$ and
$$\delta_{nt}(\mathbf{y}) = B_n - \bar{B}_{nt} + \sum_{j=1}^{M} y_n^j \big(\bar{o}_{nt}\bar{\lambda}_{nt}^j - o_n\lambda_n^j\big), \quad n = 1, \ldots, N,$$
$$\delta_{N+1,t}(\mathbf{y}) = H - \bar{H}_t + \sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \big(\bar{h}_{nt}\bar{\lambda}_{nt}^j - h_n\lambda_n^j\big).$$
Here, $\bar{B}_{nt} = \sum_{\tau=1}^{t} B_{n\tau}/t$ is the running average of process $\{B_{nt}\}_{t=1}^{\infty}$, and similarly we define $\bar{H}_t$, $\bar{o}_{nt}$, and $\bar{h}_{nt}$. Note that $\bar{f}_t(\mathbf{y})$ and $\bar{g}_t(\mathbf{y})$ can be calculated at each slot, while $f(\mathbf{y})$ and $g(\mathbf{y})$ are unknown. We can now define a new problem:
$$\mathbb{P}(t): \ \max_{\mathbf{y} \in [0,1]^{NM}} \ \bar{f}_t(\mathbf{y}) \quad \text{s.t.} \ \bar{g}_t(\mathbf{y}) \preceq \mathbf{0}.$$
We will use the instances $\{\mathbb{P}(t)\}_t$ to perform a dual ascent method and obtain a sequence of decisions $\{\mathbf{y}_t\}_t$ that will be applied in real time and achieve performance that converges asymptotically to the (unknown) solution of $\mathbb{P}$. We first dualize $\mathbb{P}(t)$ and introduce the Lagrangian:
$$L(\mathbf{y}, \boldsymbol{\mu}) \triangleq \bar{f}_t(\mathbf{y}) + \boldsymbol{\mu}^\top \bar{g}_t(\mathbf{y}) = \sum_{j=1}^{M}\sum_{n=1}^{N} w_n^j y_n^j \bar{\lambda}_{nt}^j + \sum_{n=1}^{N} \mu_n \Big(\sum_{j=1}^{M} y_n^j \bar{\lambda}_{nt}^j \bar{o}_{nt} - \bar{B}_{nt}\Big) + \xi \Big(\sum_{j=1}^{M}\sum_{n=1}^{N} y_n^j \bar{\lambda}_{nt}^j \bar{h}_{nt} - \bar{H}_t\Big),$$
where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_N, \xi)$ are the non-negative dual variables for $\bar{g}_t(\mathbf{y}) \preceq \mathbf{0}$. The dual function is:
$$V(\boldsymbol{\mu}) = \min_{\mathbf{0} \preceq \mathbf{y} \preceq \mathbf{1}} L(\mathbf{y}, \boldsymbol{\mu}), \quad (6)$$
and the dual problem amounts to maximizing $V(\boldsymbol{\mu})$. We apply a dual ascent algorithm where the iterations are in sync with the system's time slots $t$.
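This per-slot dual ascent can be sketched as a minimal single-device simulation: the primal step reduces to a per-interval threshold comparison, and the dual variables are updated by the averaged constraint slacks. All numeric values below (arrival and cost distributions, budgets, step size, interval centers) are illustrative assumptions, not the paper's parameters:

```python
import random

# Minimal single-device sketch of the per-slot dual ascent with running
# averages. All numbers are illustrative assumptions.
random.seed(0)
M, alpha, T = 4, 0.05, 2000
w = [-0.6, -0.2, 0.2, 0.6]          # interval center points w_n^j
mu, xi = 0.0, 0.0                   # dual prices: power, computing
sum_lam = [0.0] * M                 # running sums -> running averages
sum_o = sum_h = sum_B = sum_H = 0.0
offload_rate = [0.0] * M

for t in range(1, T + 1):
    # observe this slot's (random) arrivals, costs, and budgets
    lam = [random.randint(0, 3) for _ in range(M)]
    o_t, h_t = random.uniform(0.8, 1.2), random.uniform(0.8, 1.2)
    B_t, H_t = random.uniform(1.8, 2.2), random.uniform(3.5, 4.5)
    for j in range(M):
        sum_lam[j] += lam[j]
    sum_o, sum_h = sum_o + o_t, sum_h + h_t
    sum_B, sum_H = sum_B + B_t, sum_H + H_t
    lam_bar = [s / t for s in sum_lam]
    o_bar, h_bar = sum_o / t, sum_h / t
    B_bar, H_bar = sum_B / t, sum_H / t
    # threshold step: offload interval j iff its gain beats the dual price
    y = [1.0 if lam[j] > 0 and mu * o_bar + xi * h_bar < w[j] else 0.0
         for j in range(M)]
    # projected dual updates driven by the averaged constraint slacks
    mu = max(0.0, mu + alpha * (sum(o_bar * lam_bar[j] * y[j]
                                    for j in range(M)) - B_bar))
    xi = max(0.0, xi + alpha * (sum(h_bar * lam_bar[j] * y[j]
                                    for j in range(M)) - H_bar))
    for j in range(M):
        offload_rate[j] += y[j] / T
```

Since the dual prices stay non-negative, intervals with negative expected gain are never offloaded, while the high-gain intervals are offloaded only as long as the price (the scarcity of power and computing) stays below their gain.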
Observe that the inner minimization in $V(\boldsymbol{\mu})$ does not depend on $\bar{B}_{nt}$ or $\bar{H}_t$, is separable with respect to the primal variables, and its solution is independent of $\bar{\lambda}_{nt}^j$. Hence, in each iteration $t$ we can minimize $L$ by:
$$(y_n^j)^* \in \arg\min_{y_n^j \in [0,1]} \ y_n^j\big(-w_n^j + \mu_{nt}\bar{o}_{nt} + \xi_t\bar{h}_{nt}\big), \quad \forall n, j. \quad (7)$$
This yields the following easy-to-implement threshold rule:
$$y_{nt}^j = \begin{cases} 1, & \text{if } \lambda_{nt}^j > 0 \text{ and } \mu_{nt}\bar{o}_{nt} + \xi_t\bar{h}_{nt} < w_n^j, \\ 0, & \text{otherwise,} \end{cases} \quad (8)$$
which is a deterministic decision that offloads (or not) all requests of each device (at each $t$). Then we improve the current value of $V_t(\boldsymbol{\mu})$ by updating the dual variables:
$$\mu_{n,t+1} = \Big[\mu_{nt} + \alpha\Big(\sum_{j=1}^{M} \bar{o}_{nt}\bar{\lambda}_{nt}^j y_{nt}^j - \bar{B}_{nt}\Big)\Big]^+, \quad \forall n, \quad (9)$$
$$\xi_{t+1} = \Big[\xi_t + \alpha\Big(\sum_{n=1}^{N}\sum_{j=1}^{M} \bar{h}_{nt}\bar{\lambda}_{nt}^j y_{nt}^j - \bar{H}_t\Big)\Big]^+, \quad (10)$$
where $\alpha > 0$ is the update step size, and return to (7). For our system implementation, this relaxation means we install queues for the data transmission (at the devices) and image processing (at the cloudlet).

Algorithm 1: OnAlgo
1: Initialization: $t = 0$, $\xi_0 = 0$, $\boldsymbol{\mu}_0 = \mathbf{0}$, $\mathbf{y}_0 = \mathbf{0}$
2: while True do
3:   for each device $n \in \mathcal{N}$ do
4:     Receive objects $\mathcal{S}_{nt} = \{s_{nt}\}$;
5:     $(\hat{\phi}_{nt}, \sigma_{nt}) \leftarrow Q_n\big(J_n(s_{nt})\big), \forall s_{nt} \in \mathcal{S}_{nt}$;
6:     Calculate $w_{nt}$ through (3);
7:     Observe $o_{nt}$, $h_{nt}$, $B_{nt}$ and calculate $\bar{o}_{nt}$, $\bar{h}_{nt}$, $\bar{B}_{nt}$;
8:     for $j = 1, \ldots, M$ do
9:       Observe $\lambda_{nt}^j$ and calculate the average $\bar{\lambda}_{nt}^j$ and $w_n^j$;
10:      Decide $y_{nt}^j$ by using (8);
11:    end for
12:    Update $\mu_{n,t+1}$ using (9);
13:    Send the averages $\bar{\lambda}_{nt}^j, \forall j$, to the cloudlet;
14:  end for
15:  Cloudlet:
16:  Compute the tasks and receive $\bar{\lambda}_{nt}^j, \forall n$;
17:  Observe $H_t$ and calculate $\bar{H}_t$;
18:  Update $\xi_{t+1}$ using (10), and send it to the devices;
19:  $t \leftarrow t + 1$;
20: end while

The detailed steps that implement our online policy are as follows (with reference to OnAlgo, Algorithm 1). Each device $n$ receives a group of objects $\mathcal{S}_{nt}$ in slot $t$ and uses its classifier to predict their classes, and the predictor to estimate the expected offloading gains (Steps 4-6). The devices update their statistics (Step 7) and compare the expected benefits with the outsourcing costs (Step 10). Finally, they update their local dual variable for the power constraint violation (Step 12). The cloudlet classifies the received objects (Step 16), updates its parameter estimates (Step 17) and its congestion variable (Step 18), which is sent to the devices.

B. Performance Analysis
The gist of our approach is that, as time evolves, the sequence of problems $\{\mathbb{P}(t)\}_t$ approaches our initial problem $\mathbb{P}$. This is true under the following mild assumption.

Assumption 1. The perturbations of the system parameters are independent of each other, uniformly bounded, and their averages converge, e.g., $\lim_{t\to\infty} \bar{B}_{nt} = B_n$.

Under this assumption it is easy to see that:
$$\lim_{t\to\infty} \delta_t(\mathbf{y}) = \mathbf{0}, \quad \lim_{t\to\infty} \mathbf{y}^\top\boldsymbol{\epsilon}_t = 0, \quad \forall \mathbf{y}.$$
Furthermore, due to the boundedness of the parameters and $y_n^j \in [0,1], \forall n, j$, we have:
$$\|g(\mathbf{y})\| \leq \sigma_g, \quad \|\delta_t(\mathbf{y})\| \leq \sigma_{\delta_t}, \quad \forall t, \quad (11)$$
and using Minkowski's inequality, we get the bound:
$$\|\bar{g}_t(\mathbf{y})\| = \|g(\mathbf{y}) + \delta_t(\mathbf{y})\| \leq \sigma_g + \sigma_{\delta_t}. \quad (12)$$
It is also easy to see that $\lim_{t\to\infty} \sigma_{\delta_t} = 0$. The following theorem is our main analytical result.

Theorem 1.
Under Assumption 1, OnAlgo ensures the following optimality and feasibility gaps:
$$(i) \ \lim_{t\to\infty} f(\bar{\mathbf{y}}_t) \leq f^* + \frac{a\sigma_g^2}{2}, \qquad (ii) \ \lim_{t\to\infty} g(\bar{\mathbf{y}}_t) \preceq \mathbf{0},$$
where $\bar{\mathbf{y}}_t = \frac{1}{t}\sum_{i=1}^{t} \mathbf{y}_i$.

Proof. We drop the bold typeface notation here, and use subscript $i = 1, \ldots, t$ to denote the $i$-th slot. We first bound the distance of $\mu_{t+1}$ from a vector $\theta \in \mathbb{R}^{N+1}$:
$$\|\mu_{t+1} - \theta\|^2 = \big\|[\mu_t + a(g(y_t) + \delta_t(y_t))]^+ - \theta\big\|^2 \leq \|\mu_t - \theta\|^2 + a^2\|g(y_t)\|^2 + a^2\|\delta_t(y_t)\|^2 + 2a^2\delta_t(y_t)^\top g(y_t) + 2a(\mu_t - \theta)^\top\big(g(y_t) + \delta_t(y_t)\big). \quad (13)$$

(i) Optimality Gap. From the dual problem we can write:
$$V(\mu^*) \geq \frac{1}{t}\sum_{i=1}^{t} V(\mu_i) \geq \frac{1}{t}\sum_{i=1}^{t} L(y_i, \mu_i) = \frac{1}{t}\sum_{i=1}^{t}\Big(f(y_i) + y_i^\top\epsilon_i + \mu_i^\top\big(g(y_i) + \delta_i(y_i)\big)\Big) \geq f(\bar{y}_t) + \frac{1}{t}\sum_{i=1}^{t} y_i^\top\epsilon_i + \frac{1}{t}\sum_{i=1}^{t} \mu_i^\top\big(g(y_i) + \delta_i(y_i)\big), \quad (14)$$
where the last inequality follows from Jensen's inequality. Now, let $\theta = 0$ in (13). Using (11) and the Cauchy-Schwarz inequality, and by summing over all $t$, we obtain:
$$\|\mu_{t+1}\|^2 \leq \|\mu_1\|^2 + a^2 t\sigma_g^2 + a^2\sum_{i=1}^{t}\sigma_{\delta_i}^2 + 2a^2\sigma_g\sum_{i=1}^{t}\sigma_{\delta_i} + 2a\sum_{i=1}^{t}\mu_i^\top\big(g(y_i) + \delta_i(y_i)\big).$$
Dropping the non-negative term $\|\mu_{t+1}\|^2$, dividing by $2at$, setting $\mu_1 = 0$, and rearranging terms yields:
$$-\frac{1}{t}\sum_{i=1}^{t}\mu_i^\top\big(g(y_i) + \delta_i(y_i)\big) \leq \frac{a\sigma_g^2}{2} + \frac{a}{2t}\sum_{i=1}^{t}\sigma_{\delta_i}^2 + \frac{a\sigma_g}{t}\sum_{i=1}^{t}\sigma_{\delta_i}.$$
Using the fact that $V(\mu^*) = f^*$, and combining the above with (14), we obtain:
$$f(\bar{y}_t) - f^* \leq -\frac{1}{t}\sum_{i=1}^{t} y_i^\top\epsilon_i + \frac{a\sigma_g^2}{2} + \frac{a}{2t}\sum_{i=1}^{t}\sigma_{\delta_i}^2 + \frac{a\sigma_g}{t}\sum_{i=1}^{t}\sigma_{\delta_i}.$$
All sums have diminishing terms and are divided by $t$, hence converge to $0$. Thus we obtain the first part of the theorem.

(ii) Constraint Violation. If we apply the dual variable update rule recursively, we obtain:
$$\mu_{t+1} = \big[\mu_t + a\big(g(y_t) + \delta_t(y_t)\big)\big]^+ \succeq \mu_1 + a\sum_{i=1}^{t}\big(g(y_i) + \delta_i(y_i)\big).$$
Setting $\mu_1 = 0$, dividing by $at$, and using Jensen's inequality for $g(\cdot)$, we get:
$$g(\bar{y}_t) + \frac{1}{t}\sum_{i=1}^{t}\delta_i(y_i) \preceq \frac{\mu_{t+1}}{at}. \quad (15)$$
The second term of the LHS converges to zero as $t \to \infty$. Our claim holds if the same is true for the RHS. Indeed, this is the case assuming the existence of a Slater vector and the boundedness of the set of dual variables (see [6], [7]).

The theorem shows that OnAlgo asymptotically achieves zero feasibility gap (no constraint violation), and a fixed optimality gap that can be made arbitrarily small by tuning the step size.

IV. IMPLEMENTATION AND EVALUATION
A. Experimentation Setup and Initial Measurements

1) Testbed and Measurements:
We used 4 Raspberry Pis (RPs) as end-nodes, placed at different distances from a laptop (cloudlet). We used a Monsoon monitor for the energy measurements, and Python libraries and TensorFlow for the classifiers. We first measured the average power consumption when the RPs transmit data to the cloudlet at different rates, and then fitted a linear regression model that estimates the consumed power as a function of the rate $r$. This model is used by OnAlgo to estimate the energy cost for each transmitted image, given the data rate in each slot (which might differ across the RPs). Also, we measured the average computing costs ($h_n$, in cycles/task) of the classification tasks, to be used in simulations. For more details on the setup, see [7].
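A linear power model of this kind can be sketched as an ordinary least-squares line fit over (rate, power) pairs; the measurement values below are made up for illustration and are not our testbed measurements:

```python
# Sketch of fitting a linear power model P(r) = b0 + b1 * r from
# (rate, power) measurements. The data points are hypothetical.
rates = [1.0, 2.0, 3.0, 4.0]    # transmission rate (Mbps, hypothetical)
powers = [1.2, 1.9, 2.6, 3.3]   # measured power (mW, hypothetical)

n = len(rates)
mean_r = sum(rates) / n
mean_p = sum(powers) / n
# closed-form least-squares slope and intercept for one feature
b1 = sum((r - mean_r) * (p - mean_p) for r, p in zip(rates, powers)) / \
     sum((r - mean_r) ** 2 for r in rates)
b0 = mean_p - b1 * mean_r

def tx_power(rate):
    """Estimated power draw at a given transmission rate."""
    return b0 + b1 * rate
```

Given the per-slot data rate, `tx_power` then yields the estimated energy cost of transmitting an image, which is the quantity OnAlgo needs for its power-budget constraint.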
2) Data Sets and Classifiers:
We use two well-known datasets: (i) MNIST [11], which consists of 28×28-pixel handwritten digits, and includes 60K training and 10K test examples; (ii) CIFAR-10 [12], with 50K training and 10K test examples of 32×32 color images in 10 classes. We used two classifiers, the normalized-distance weighted k-nearest neighbors (KNN) [13], and the more sophisticated Convolutional Neural Network (CNN) implemented with TensorFlow [14]. They output a vector with the probabilities that the object belongs to each class. These classifiers have different performance and resource needs, and hence allow us to build diverse experiments. The predictors are trained with labeled images and the outputs of the local ($f_n$) and cloudlet ($f$) classifiers. These are the independent variables in our regression model that estimates $\phi_{nt}$ (the dependent variable). Recall that the latter is calculated using (1), where we additionally set $w_{nt} = d(s_{nt})$ if device $n$ has given a wrong classification and $w_{nt} = -d(s_{nt})$ if the cloudlet is mistaken.
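A predictor of this kind can be sketched as a one-feature least-squares regressor that maps the local confidence to the expected improvement. The training pairs below are synthetic, and this simple fit stands in for, but is not, the mixed-effects regressor of [8]:

```python
# Sketch of a confidence-to-improvement predictor: fit
# phi ≈ c0 + c1 * d_local by least squares. Training pairs are synthetic;
# low local confidence tends to leave more room for improvement.
train = [(0.95, 0.02), (0.90, 0.06), (0.70, 0.22),
         (0.55, 0.38), (0.40, 0.50)]  # (d_n, observed phi), synthetic

n = len(train)
mx = sum(d for d, _ in train) / n
my = sum(p for _, p in train) / n
c1 = sum((d - mx) * (p - my) for d, p in train) / \
     sum((d - mx) ** 2 for d, _ in train)
c0 = my - c1 * mx

def predict_improvement(d_local):
    """Estimated phi_hat for a local confidence value, clipped to [0, 1]."""
    return min(1.0, max(0.0, c0 + c1 * d_local))
```

With such synthetic data the fitted slope is negative, so the predictor assigns larger expected gains to objects the local classifier is unsure about, which is exactly the signal the outsourcing rule (3) consumes.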
3) Benchmarks:
We compare OnAlgo with two algorithms. The Accuracy-Threshold Offloading (ATO) algorithm, where a task is offloaded when the confidence of the local classifier is below a threshold, without considering the resource consumption. And the Resource-Consumption Offloading (RCO) algorithm, where a task is offloaded when there is enough energy, without considering the expected classification improvement.
4) Limitations of Mobile Devices:
We used our testbed to verify that these small resource-footprint devices require the assistance of a cloudlet. Our findings are in line with previous studies, e.g., [15]. The performance of a CNN model increases with the number of layers. We find that even a CNN with few layers trained for CIFAR reaches a size in the GB range and hence cannot be stored in the RPs (see Fig. 2a). We used vanilla versions of the classifiers to facilitate observation of the results; the memory footprint of NNs can be made smaller [10], but this might affect their performance, and our analysis is orthogonal to such interventions. Similar conclusions hold for the KNN classifier, which needs to locally store all training samples. Clearly, despite the successful efforts to reduce the size of ML models, e.g., using compression [10], the increasingly complex analytics and the small form factor of devices will continue to raise the local-versus-cloudlet execution trade-off.
5) Classifier Assessment:
In Fig. 2b we see that the accuracy (ratio of successful over total predictions) of the KNN classifier improves with the size $K_n$ of the labeled data. Figure 2c presents the accuracy gains for the CNN as more hidden layers are added. The gains are higher (up to 20%) for the digits that are difficult to recognize. Fig. 2d shows the CNN performance on CIFAR, which is lower as this is a more complex dataset (colored images, etc.). Overall, we see that the classifier performance depends on the algorithm (KNN, CNN), the settings (datasets, layers), and the objects.

B. Performance Evaluation

1) Resource Availability Impact:
Fig. 3 shows the average accuracy and the fraction of requests offloaded to the cloudlet with OnAlgo when we vary the devices' power budget. As $B_n$ increases there are more opportunities to use the cloudlet (4-layer CNN) and obtain more accurate classifications than with the local classifier (1-layer CNN). Furthermore, Fig. 2(c-d) shows that MNIST is easier to classify and the gains of using a better classifier are smaller than with CIFAR. Hence, as $B_n$ increases in Fig. 3, the ratio of offloaded tasks increases at a faster pace with CIFAR than with MNIST.
2) Comparison with Benchmarks:
We compare OnAlgo to ATO and RCO. No-offloading (NO) serves as a baseline for these algorithms in Fig. 4. To ensure a realistic comparison, we set the rule for all algorithms that the cloudlet will not serve any task if the computing capacity constraint is violated. For RCO, the availability of energy is determined by computing the running average consumption at each device during the experiment. We employ two testbed scenarios, and a simulation with a larger number of devices.
Scenario 1:
Low accuracy improvement; high resources. We set a small power budget $B_n$ (a fraction of a mW) and $H = 2$ GHz, allowing the devices to offload many tasks and the cloudlet to serve most of them, and used MNIST (which offers small improvement). We present the average accuracy and power consumption in Fig. 4a, where we see that OnAlgo outperforms both ATO and RCO in accuracy. Regarding power consumption, ATO achieves the best result since it obtains high enough confidence from its local classifier (it rarely offloads). RCO, however, offloads almost every task, as it has enough resources and does not refrain even when the improvement is low. The reason it achieves lower accuracy than OnAlgo is that it does not offload intelligently, and gets denied when the computing constraint is violated.

Scenario 2:
High accuracy improvement; low resources. We set a small power budget $B_n$ (a fraction of a mW) and $H = 200$ MHz, not allowing many offloadings and cloudlet classifications. We have explicitly set a small power budget so as to highlight the impact of power constraints on the system performance; higher power budgets will still be a bottleneck for higher task request rates or images of larger size. We used the CIFAR dataset, which has a large performance difference between the local and cloudlet classifiers. We see from Fig. 4b that OnAlgo achieves 28%-32% higher accuracy than both competing algorithms. RCO is constrained to very few offloadings due to the limited power budget, while ATO is resource-oblivious and offloads tasks regardless of the cloudlet's capacity. This results in many denied offloadings that reduce ATO's accuracy and unnecessarily increase the power consumption. OnAlgo consumes 60% less power than ATO as it offloads its low-confidence tasks selectively.

Fig. 2: CNN memory usage vs. number of layers, and accuracy on MNIST and CIFAR-10 for the KNN and CNN classifiers. (Panels: (a) memory usage of CNN; (b) KNN on MNIST; (c) CNN on MNIST; (d) CNN on CIFAR.)

Fig. 3: Average accuracy and outsourcing of OnAlgo for different power budgets, on MNIST (left) and CIFAR (right).

Fig. 4: Performance comparison of the offloading algorithms ((a) Scenario 1; (b) Scenario 2).
Scenario 3:
Large number of users. Finally, we simulated the algorithms for a large number of users while using the experimentally measured parameters. We observe in Fig. 5a that the accuracy gradually drops (for all algorithms) since now a smaller percentage of the tasks can be served by the cloudlet. OnAlgo consistently outperforms both ATO and RCO since it adapts to the available resources. This is more evident in Fig. 5b, which shows the fast-increasing energy cost of the two benchmark algorithms, as they either offload tasks that do not improve the performance, or offload tasks while the cloudlet is already congested (these tasks are dropped and energy is wasted). The power consumption of OnAlgo is substantially lower than that of RCO.

Fig. 5: Simulation results for increasing number of users on the CIFAR dataset, with a small per-device power budget $B_n$ and $H = 2$ GHz ((a) accuracy comparison; (b) power cost comparison).

Fig. 6: The convergence properties of OnAlgo when $M = 6$, $N = 5$ ((a) optimality gap; (b) constraint violation). Note that the constraints are eventually satisfied, with some of them in a strict fashion (hence the norm is not zero).
3) Convergence of OnAlgo:
Fig. 6 presents the convergence of OnAlgo for different step sizes $\alpha$. Based on the system parameters, the bound given by Theorem 1 is approximately 0.01, 0.2 and 1 for the three $\alpha$ values of Fig. 6. These bounds are satisfied by the solution of OnAlgo in fewer than 300 iterations, as observed in Fig. 6a. The convergence is faster for larger $\alpha$, which however comes at the cost of lower convergence accuracy. The constraint violation bound is also respected, as shown in Fig. 6b, with the constraints being violated more often for small $\alpha$ in the beginning, but improving as $t$ increases.

V. RELATED WORK
Edge & Distributed Computing. Most solutions partition compute-intense mobile applications and offload them to the cloud [16]; a solution that is unfit for low-latency applications. Cloudlets, on the other hand, achieve lower delay [4] but have limited serving capacity; hence there is a need for the intelligent offloading strategy that we propose here. Previous works consider simple performance criteria, such as reducing computation loads [17] or power consumption [18], and focus on the architecture design. Also, MobiStreams [19] and Swing [20] focus on collaborative data stream computations. The above systems either do not optimize the offloading policy, or use heuristics that do not cater for task accuracy.
Mobile and IoT Analytics. The importance of analytics has motivated the design of wireless systems that can execute such tasks. For instance, [21], [22] tailor deep neural networks for execution on mobile devices, while [23] and [24] minimize the execution time for known system parameters and task loads. Finally, [25]–[27] leverage the edge architecture to effectively execute analytics for IoT devices. The plethora of such system proposals underlines the necessity of our online decision framework that provides optimal execution of analytics.
Optimization of Analytics. Prior works in computation offloading focus on different metrics, such as the number of served requests [5], [28], and hence are not applicable here. In our previous work [29], we proposed a static collaborative optimization framework, which does not employ predictions nor account for computation constraints. Other works, e.g., [22], either rely on heuristics or assume static systems and known requests. Clearly, these assumptions are invalid for many practical cases where system parameters not only vary with time, but often do not follow i.i.d. processes. This renders the application of max-weight type policies [9] inefficient. Our approach is fundamentally different: it leads to an online robust algorithm, and is inspired by dual averaging and primal recovery algorithms for static problems, see [6].
Improvement of ML Models. Despite the efforts to improve the execution of analytics at small devices, e.g., via residual learning or compression [10], the trade-off between local low-accuracy and cloudlet high-accuracy execution remains important due to the increasing number and complexity of these tasks. This observation has spurred efforts for designing fast multi-tier (cloud-to-edge) deep neural networks [15] and for dynamic model selection [30], among others. These works are orthogonal to our approach and can be directly incorporated in our framework.

VI. CONCLUSIONS
We propose the idea of improving the execution of data analytics at IoT devices by outsourcing tasks to more robust instances running at cloudlets. The key feature of our proposal is a dynamic and distributed algorithm that makes the outsourcing decisions based on the expected performance improvement and the available resources at the devices and the cloudlet. The proposed algorithm achieves near-optimal performance in a deterministic fashion, under minimal assumptions about the system behavior. This makes it ideal for the problem at hand, where the stochastic effects (e.g., expected accuracy gains) have unknown mean values and possibly non-i.i.d. behavior.

ACKNOWLEDGMENTS
This publication has emanated from research supported in part by SFI research grants 17/CDA/4760 and 16/IA/4610, and is co-funded under the European Regional Development Fund under Grant Number 13/RC/2077.

REFERENCES

[1] E. Siow, T. Tiropanis, and W. Hall, "Analytics for the internet of things: A survey," ACM Comput. Surv., vol. 51, no. 4, pp. 74:1–74:36, 2018.
[2] C. Jiang et al., "Machine learning paradigms for next-generation wireless networks," IEEE Wireless Comm., vol. 24, no. 2, pp. 98–105, 2017.
[3] Cisco White Paper, "Cisco global cloud index: Forecast and methodology, document id: 1513879861264127," 2018.
[4] M. Satyanarayanan et al., "The case for vm-based cloudlets in mobile computing," IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[5] Y. Mao et al., "A survey on mobile edge computing: The communication perspective," IEEE Comm. Surv. Tut., vol. 19, no. 4, pp. 2322–2358, 2017.
[6] A. Nedić and A. Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM J. on Optimization, vol. 19, no. 4, pp. 1757–1780, 2009.
[7] A. Galanopoulos et al., "Improving iot analytics through selective edge execution: Appendix," 2019, https://1drv.ms/b/s!AoI5lEO8XUP1iQIjf1w0YeaUCa83?e=9IW474.
[8] A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
[9] L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource allocation and cross-layer control in wireless networks," Found. Trends Netw., vol. 1, no. 1, pp. 1–144, 2006.
[10] V. Chandrasekhar et al., "Compression of deep neural networks for image instance retrieval," in Proc. of DCC, 2017.
[11] Y. Lecun et al., "Gradient-based learning applied to document recognition," Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[12] A. Krizhevsky, "Learning multiple layers of features from tiny images," Tech. Rep., 2009.
[13] S. A. Dudani, "The distance-weighted k-nearest-neighbor rule," IEEE Trans. on Sys., Man, and Cybern., vol. 6, no. 4, pp. 325–327, 1976.
[14] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in Proc. of USENIX OSDI, 2016.
[15] S. Teerapittayanon et al., "Distributed deep neural networks over the cloud, the edge and end devices," in Proc. of IEEE ICDCS, 2017.
[16] B.-G. Chun et al., "Clonecloud: Elastic execution between mobile device and cloud," in Proc. of EuroSys, 2011.
[17] A. Dou et al., "Misco: A mapreduce framework for mobile systems," in Proc. of PETRA, 2010.
[18] X. Lyu et al., "Selective offloading in mobile edge computing for the green internet of things," IEEE Network, vol. 32, no. 1, pp. 54–60, 2018.
[19] H. Wang and L. Peh, "Mobistreams: A reliable distributed stream processing system for mobile devices," in Proc. of IEEE IPDPS, 2014.
[20] S. Fan, T. Salonidis, and B. Lee, "Swing: Swarm computing for mobile sensing," in Proc. of IEEE ICDCS, 2018.
[21] X. Ran et al., "Delivering deep learning to mobile devices via offloading," in Proc. of VR/AR Network Workshop, 2017.
[22] X. Ran et al., "Deepdecision: A mobile deep learning framework for edge video analytics," in Proc. of IEEE INFOCOM, 2018.
[23] Y. Li et al., "Mobiqor: Pushing the envelope of mobile edge computing via quality-of-result optimization," in Proc. of IEEE ICDCS, 2017.
[24] W. Zhang et al., "Hetero-edge: Orchestration of real-time vision applications on heterogeneous edge clouds," in Proc. of IEEE INFOCOM, 2019.
[25] G. Li et al., "Data analytics for fog computing by distributed online learning with asynchronous update," in Proc. of IEEE ICC, 2019.
[26] S. K. Sharma and X. Wang, "Live data analytics with collaborative edge and cloud processing in wireless iot networks," IEEE Access, vol. 5, pp. 4621–4635, 2017.
[27] J. He et al., "Multitier fog computing with large-scale iot data analytics for smart cities," IEEE Internet of Things Journal, vol. 5, no. 2, pp. 677–686, 2018.
[28] X. Chen et al., "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Trans. on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.
[29] A. Galanopoulos, G. Iosifidis, and T. Salonidis, "Optimizing data analytics in energy constrained iot networks," in Proc. of WiOpt, 2018.
[30] L. Liu and J. Deng, "Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution," arXiv:1701.00299, 2017.