Semantic Compression for Edge-Assisted Systems
Igor Burago, Marco Levorato, and Sameer Singh
Department of Computer Science, University of California, Irvine
Email: {iburago, levorato, sameer}@uci.edu
Abstract—A novel semantic approach to data selection and compression is presented for the dynamic adaptation of IoT data processing and transmission within “wireless islands”, where a set of sensing devices (sensors) are interconnected through one-hop wireless links to a computational resource via a local access point. The core of the proposed technique is a cooperative framework where local classifiers at the mobile nodes are dynamically crafted and updated based on the current state of the observed system, the global processing objective, and the characteristics of the sensors and data streams. The edge processor plays a key role by establishing a link between content and operations within the distributed system. The local classifiers are designed to filter the data streams and provide only the needed information to the global classifier at the edge processor, thus minimizing bandwidth usage. However, the better the accuracy of these local classifiers, the larger the energy necessary to run them at the individual sensors. A formulation of the optimization problem for the dynamic construction of the classifiers under bandwidth and energy constraints is proposed and demonstrated on a synthetic example.
I. INTRODUCTION
The Internet of Things (IoT) paradigm [1] envisions a scenario where machines remotely interact to provide services and perform monitoring and control tasks. To this aim, the IoT realizes a network of data sources, mobile devices, and processing centers interconnected through wireless and wireline links, where local and global algorithms cooperate in a distributed fashion.

Sophisticated large-scale application scenarios such as Smart City systems [2] and intelligent (or autonomous) vehicular networks [3], [4] push the limits of IoT systems in sensing, communication, and processing capabilities. To address the need for tight control loops, timely coordination, and computation-intense processing, Fog and Edge Computing architectures [5], [6] place computation resources at the edge of the wireless access infrastructure. In these architectures, mobile devices can offload computational tasks to edge data processors through one-hop low-latency links. The co-location of sensing and processing within a star topology allows reliable local coordination of remote devices informed by global resources, such as databases and data centers in the cloud. However, the limited and time-varying bandwidth available in wireless environments makes the design of edge-based architectures challenging. This especially applies in those scenarios where IoT data streams coexist with other services on the same channel and network resources.

In this paper, we propose a framework for the dynamic adaptation of IoT data processing and transmission within
Figure 1. Edge-assisted local network scenario: A set of sensing devices acquire observations on the physical environment to support a global computational task performed at the edge processor.

“wireless islands”, where a set of sensing devices (sensors) are interconnected with one-hop wireless links to a computational resource through a local access point (e.g., a cellular base station or a Wi-Fi access point). We specifically address an application scenario where the sensors and the edge processor cooperatively perform a real-time data acquisition and processing task, such as classification or detection based on environmental observations (see Fig. 1). The challenge, then, is to accomplish such a task within the bandwidth, computational power, and energy constraints imposed by the limited resources available at the device and network levels.

The core of the framework is a novel “semantic” approach to data selection and compression, where local classifiers at the mobile nodes are dynamically crafted and updated based on the current state of the observed system and its processing objective, together forming a continuously evolving context. The edge processor plays a key role by establishing a link between content and operations within the distributed system. The local classifiers are designed to filter the data streams and provide only the needed information to the global classifier at the edge processor, thus minimizing bandwidth usage. However, the better the accuracy of these local classifiers, the larger the energy necessary to run them at the individual sensors. Our framework builds on recent results [7], [8], where classifier simplifications are applied to the problem of explaining the outcome of black-box machine learning algorithms.

An interesting connection can be made to traditional multimedia compression techniques, where the components imperceivable by humans are removed. Thus, distortion of the original signal is accepted in those regions that are not needed by the final application.
This research extends this principle to data consumed by machines for general computational purposes. Additionally, we expand the traditional focus on bandwidth compression by itself with the notion of energy awareness.

The rest of the paper is organized as follows. Section II introduces the general scenario and describes the problem addressed herein. In Section III, we present the semantic compression framework, and illustrate its key components on an exemplary problem in Section IV. Section V concludes the paper.

II. PROBLEM FORMULATION
Recent advances in machine learning have resulted in sophisticated models, which provide incredibly capable detectors of interest to IoT applications, particularly for image and video processing. Instead of working only for niche or synthetic settings, these classifiers are able to handle real-world input from a large variety of environments. As a consequence, the resulting classifiers often tend to be too complex in structure, and can only reside on devices capable of handling computationally intense tasks. However, mobile sensors collecting the data for processing have only limited observational power, computational capabilities, and energy availability. Hence, due to constraints in these resources, they often cannot support such complex classifiers. Fog and Edge architectures offer a solution to this issue by introducing computational resources within the local wireless island. However, bandwidth constraints, often imposed by other competing services, limit the data that can be transferred from the sensors to the computational resources. In these circumstances, pre-filtering the data at the sensors becomes necessary to avoid delay, data loss, or undesirable disruption of other wireless services.

A sketch of the architecture at the center of our studies is shown in Fig. 1, where a set of sensors acquire observations in some dynamic environment. The sensors are wirelessly interconnected through a local access point (e.g., a base station) to an edge processor. The edge processor is assigned a computational task (possibly changing in time), such as the identification of human activities in public parks or traffic dangers in autonomous vehicles’ networks. This task corresponds to one or more classifiers taking the data streams from the sensing devices as their inputs.
The goal of the global classifiers is to achieve an average accuracy α, measured in terms of classification errors. For ease of explanation, we introduce the notion of a temporal period, where time is discretized and indexed with t. The K sensors are connected to the edge processor through wireless links of capacity b_{k,t}, k = 1, ..., K, in the period t. A constraint b_t on the overall capacity available to the sensors, where ∑_{k=1}^{K} b_{k,t} ≤ b_t, can be introduced to capture channel sharing. The signal acquired by a sensor k in the time period t is X_{k,t}. Each sensor has an energy storage for processing and transmission, where the amount of energy available at sensor k in period t is equal to e_{k,t}. The energy storage can be refilled through charging or energy harvesting, modeled as a random arrival process. The goal of the system is to guarantee the wanted accuracy at the edge processor using the available bandwidth and energy. Fig. 2 illustrates the components of the system for an individual sensor k.

The sensors implement local classifiers which serve the purpose of filtering out unusable data, defined as the data that are not needed for maintaining the target accuracy at the edge processor. While the amount of data transferred from the sensors to the edge is bounded by the time-varying capacity of the channel, the efficacy of locally removing unnecessary data is bounded by the processing power and energy availability at the sensors. On the one hand, the transmission of unfiltered data may violate the bandwidth constraint, thus causing data loss and disruption of existing wireless services. On the other hand, running a complex local classifier may require excessive computational effort and energy expense at the mobile devices. We formulate an optimization problem capturing the tension between these two extremes for the purposes of dynamic adaptation of filters deployed at the sensors.
Based on the input from the sensors, the edge processor periodically produces a new filter with controlled complexity for each sensor, based on bandwidth and energy usage constraints following from high-level operational objectives. Herein, we focus on building customized classifiers possessing the following characteristics:

• Locality. The sensor-specific classifiers will be trained to achieve a certain accuracy level for the kinds of inputs the sensor is likely to receive. For instance, the local classifiers will be built to provide low-error predictions for indoor images if the sensor is placed inside.

• Bandwidth-Awareness. The local classifiers are designed to be used as bandwidth-preserving filters, thus optimizing for the false-negative rate to meet the bandwidth constraints imposed by the link to the global edge processor.

• Complexity and Energy-Awareness. The design of the local classifiers will satisfy complexity and energy requirements of the sensor as determined by a stochastic energy-arrival process.

Given the complex, accurate classifier at the edge, our objective is to build a sensor-specific classifier tailored to the distribution of samples in the current period, and satisfying the bounded complexity and bandwidth usage. More formally, we are provided with a pre-trained binary classifier, e.g., one
Figure 2. Illustration of the problem: Dynamic energy- and bandwidth-aware adaptation of local data filtering serving the purpose of global estimation. The figure illustrates the components of acquisition, communication, processing, and control for one sensor.

detecting whether a person is visible by the sensor, denoted by f : X → {0, 1}, where X is the space of possible inputs. We treat this classifier as a black-box function in order to support as wide a variety of machine learning algorithms as possible. For a sensor k during period t, the goal is to identify a local classifier g_{k,t} : X → {0, 1}, g_{k,t} ∈ G, that meets the specifications of the sensor, where G is the family of machine learning classifiers we want the sensor to use (for instance, linear classifiers). In particular, we are provided with the following requirements corresponding to the aforementioned characteristics:

• Locality D_{k,t}: The expected distribution of the sensor inputs for period t is denoted by D_{k,t}. We want g_{k,t} to be as accurate as f as possible on inputs from this distribution.

• Bandwidth b_{k,t}: The average amount of data allowed to be transmitted by g_{k,t} for the period t should be less than b_{k,t}. (It is also possible to consider a generalization where only the total capacity b_t for all sensors is provided.)

• Energy e_{k,t}: The average energy used by g_{k,t} for the period t should be less than e_{k,t}.

In this work we assume that the customized classifier g_{k,t} will be built on the edge, not the sensor, and thus the computational efficiency of estimating g_{k,t} is not restricted.

III. SEMANTIC COMPRESSION
In this section, we outline our proposed approach to constructing a classifier g_{k,t} that meets the sensor’s requirements on energy, bandwidth, and locality for the period t, while still being faithful to the complex, global classifier f.
Energy Efficiency. The primary obstacle to using f at the sensor level is its computational complexity. For instance, each prediction by a neural network can often take hundreds to thousands of floating-point computations, resulting in heavy power consumption. Instead, we are concerned with learning an energy-efficient classifier g_{k,t} ∈ G, for G being limited to a simpler model family, such as SVMs, decision trees, linear classifiers, etc. We define the energy consumed by g_{k,t} for an input as E_{g_{k,t}} : X → R_{≥0}; the average energy used by the sensor k for period t will be E_{x∼D_{k,t}}[E_{g_{k,t}}(x)]. We also define a penalty on the classifier for violating an energy constraint e_{k,t} as R_E, such that R_E(E_{g_{k,t}}(x), e_{k,t}) = 0 if g_{k,t} meets the energy requirement e_{k,t}, and R_E(E_{g_{k,t}}(x), e_{k,t}) > 0 otherwise. Since directly estimating the energy consumption E_{g_{k,t}} of a classifier g_{k,t} is challenging, we use the number of computational operations as a proxy, and thus R_E penalizes g_{k,t} the more operations it requires for a prediction.
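Since R_E uses the operation count as a proxy for per-prediction energy, a minimal sketch of this proxy for a linear classifier looks as follows (function names and the linear form of the overage penalty are our illustrative assumptions, not part of the paper):

```python
# Operation-count proxy for the energy E_g(x) of a linear classifier
# g(x) = 1[w.x + c > 0] in d dimensions: d multiplies, d adds, 1 compare.
def linear_op_count(d: int) -> int:
    return 2 * d + 1

# R_E: zero when the proxy is within the per-prediction budget,
# positive overage otherwise (a linear penalty is an assumption here).
def energy_penalty(ops: int, budget_ops: int) -> float:
    return float(max(0, ops - budget_ops))

print(energy_penalty(linear_op_count(10), budget_ops=50))   # within budget: 0.0
print(energy_penalty(linear_op_count(100), budget_ops=50))  # over budget: 151.0
```

Any other monotone penalty (quadratic, hinge-like) would fit the definition of R_E equally well; only R_E = 0 inside the budget and R_E > 0 outside is required.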
Locality. Obviously, an energy-efficient classifier g_{k,t}, by using a simpler structure, cannot have the same general representation capabilities as the global classifier f for the complete range of inputs. However, in any given time period, most sensors do not receive the full variety of inputs that the global classifier is designed to support, and thus it is possible to have g_{k,t} focus its representation on the inputs expected at the sensor. In order to identify such a g_{k,t}, we use the expected distribution of inputs, D_{k,t}, to compute how similar g_{k,t} is to f. In particular, given a loss function L(f(x), g_{k,t}(x)) between g_{k,t}’s and f’s predictions on an instance x, e.g., the squared loss L_{sq}(a, b) = (a − b)^2 or the logistic loss L_{ll}(a, b) = −a log b − (1 − a) log(1 − b), we evaluate the similarity between g_{k,t} and f as E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))]. Fig. 3 illustrates the intuition, where a complex, power-consuming global classifier f (solid gray curve) can be approximated quite well locally by a simple, and thus energy-efficient, classifier g_{k,t} (dashed bold line).
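The similarity E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))] can be estimated by plain Monte Carlo sampling from the locality. A minimal sketch, with toy stand-ins for f, g, and D (all illustrative, not the paper's actual models):

```python
import numpy as np

rng = np.random.default_rng(1)

def fidelity(f, g, sample_locality, loss, n=10_000):
    """Monte Carlo estimate of E_{x~D}[L(f(x), g(x))] over the locality D."""
    x = sample_locality(n)
    return float(np.mean(loss(f(x), g(x))))

def sq_loss(a, b):
    return (a - b) ** 2  # L_sq(a, b) = (a - b)^2

# Toy stand-ins (illustrative only): f is a radial rule, g a linear one.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
g = lambda x: (x.sum(axis=1) < 1.4).astype(int)
D = lambda n: rng.normal(0.5, 0.2, size=(n, 2))  # locality D_{k,t}

print(fidelity(f, g, D, sq_loss))  # mean disagreement on the locality
```

For binary labels the squared loss reduces to the 0-1 disagreement rate, which makes the estimate directly interpretable as local accuracy loss relative to f.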
Bandwidth Awareness. Every automated detector is accompanied by a certain level of expected error, often measured as the rate of false positives and false negatives. Due to the energy constraints on the desired classifier g_{k,t}, it may not be able to maintain the same low error levels as the global classifier f, even on the local distribution of inputs. In such situations, we can treat g_{k,t} as the sensor-level filtering of the inputs, with f running at the edge level to achieve the same low error levels. Thus there is a trade-off between how much of the bandwidth is used to transmit false positives versus missing out on a relevant input in order to conserve the bandwidth. We define the amount of data g_{k,t} will use for an input x as B_{g_{k,t}} : X → R_{≥0}; the average data transmitted by the sensor for period t will be E_{x∼D_{k,t}}[B_{g_{k,t}}(x)]. We further define the penalty on the classifier g_{k,t} for violating the bandwidth b_{k,t} as R_B, such that R_B(B_{g_{k,t}}(x), b_{k,t}) = 0 if g_{k,t} uses less than b_{k,t} bandwidth, and R_B(B_{g_{k,t}}(x), b_{k,t}) > 0 otherwise. Fig. 3 shows an example where a classifier that is not aware of its use as a filter (the leftmost example) may transmit less but have a high error rate, while a bandwidth-aware classifier (in the middle) will obtain a lower false-negative rate.
Semantic Compression. From the sensor specifications, namely the local distribution D_{k,t}, the energy consumption constraint e_{k,t} and penalty function R_E, the bandwidth constraint b_{k,t} and penalty function R_B, and the global classifier f, we can frame the search for the sensor-specific classifier g_{k,t} as the following optimization problem to be solved periodically over time:

  g*_{k,t} = arg min_{g_{k,t} ∈ G} E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))],      (1)
  s.t.  E_{x∼D_{k,t}}[R_E(E_{g_{k,t}}(x), e_{k,t})] ≤ ε_{k,t},             (2)
        E_{x∼D_{k,t}}[R_B(B_{g_{k,t}}(x), b_{k,t})] ≤ β_{k,t}.             (3)

Here ε_{k,t} and β_{k,t} have the meaning of tolerances on the expected penalties R_E and R_B for random observations following a given locality distribution D_{k,t}.

The distribution D_{k,t} serves a proxy role, conveying to the edge processor a local description of expected observations at the sensor, without wasting the bandwidth for transmitting the observations themselves. The edge processor, in turn, replies to the sensor with a classifier g*_{k,t}, locally tuned to D_{k,t} according to the problem in Eqs. (1)–(3). For each particular sensor and time period, the distribution D_{k,t} is fixed, so the efficacy of this semantic compression scheme is determined by whether the family of local classifiers G is flexible enough for the distribution of positive and negative samples in D_{k,t}. However, at a larger scope, the locality D_{k,t} may vary and is subject to negotiation between the sensor and the edge processor.

With the shape of the locality D_{k,t} controllable, the quality of the corresponding classifiers g_{k,t} may be additionally improved through locality tuning. This brings the option to view the optimization in Eqs. (1)–(3) as a subproblem of a higher-level control task, maintaining a desired aptitude of the classifiers g_{k,t} on a sequence of observations generated by the sensor.
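A rough way to approach Eqs. (1)–(3) in the linear case is to draw a sample from D_{k,t}, label it with the black-box f, and fit a weighted linear surrogate: restricting G to linear models addresses the energy constraint implicitly, and a false-negative weight stands in for the bandwidth/miss trade-off. This is a minimal sketch under those assumptions; all names (`fit_local_classifier`, `fn_weight`) are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_local_classifier(f, sample_locality, n=2000, fn_weight=1.0,
                         lr=0.5, steps=500):
    """Fit a linear surrogate g for the black-box f on the locality D_{k,t}."""
    x = sample_locality(n)          # x_i ~ D_{k,t}
    y = f(x)                        # labels produced by the global classifier
    w, c = np.zeros(x.shape[1]), 0.0
    # Up-weighting positives discourages misses (false negatives).
    sw = np.where(y == 1, fn_weight, 1.0)
    for _ in range(steps):          # plain batch gradient descent (logistic loss)
        p = 1.0 / (1.0 + np.exp(-(x @ w + c)))
        grad = sw * (p - y)
        w -= lr * (x.T @ grad) / n
        c -= lr * grad.mean()
    return lambda z: ((z @ w + c) > 0).astype(int)

# Toy stand-ins: a radial black-box f and a Gaussian locality D.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
D = lambda n: rng.normal(0.5, 0.3, size=(n, 2))

g = fit_local_classifier(f, D, fn_weight=3.0)
x_test = D(5000)
print(np.mean(g(x_test) == f(x_test)))  # local agreement with f
```

The explicit constraint checks of Eqs. (2)–(3) would wrap this fit: if the candidate g violates its tolerance, the edge would adjust fn_weight or the model family and refit.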
In this way, the problem of finding optimal local g_{k,t} may be extended to the broader adaptive-control problem of maintaining a desired accuracy of filtration by adjusting the locality-capturing procedure delivering the distributions D_{k,t}, such that

  E_{x∼S_{k,t}}[Q_{g_{k,t}}(x, D_{k,t})] ≤ q_{k,t}.      (4)

The penalty function Q_{g_{k,t}} stands for the losses we bear from any inadequacies of the local classifier g*_{k,t} for the particular choice of locality D_{k,t}, which we would like to keep bounded by a tolerance q_{k,t}. Here, the quality is monitored for inputs from some control distribution S_{k,t} chosen by the edge processor using the empirical data arriving from the sensor and the a priori strategic objectives for the ultimate outcomes of the sensor–edge system as a whole. In practice, S_{k,t} may coincide with the global observatory distribution X, the locality distribution D_{k,t}, or can be derived from the sequence of empirical observations obtained by the sensor k. In Eq. (4), the locality D_{k,t} is made an argument of the penalty Q_{g_{k,t}} to highlight its potential role as the control “variable”. One simple example giving an idea of how localities D_{k,t} may be parametrized and controlled will be given in the following section.
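The monitoring condition of Eq. (4) can be sketched as a simple check on the edge; here the penalty Q is taken, purely for illustration, to be the 0-1 disagreement between g and f, and the classifiers and control distribution are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)

def locality_ok(f, g, sample_control, q_tol, n=5000):
    """Check Eq. (4): E_{x~S}[Q_g(x, D)] <= q, with Q taken (illustratively)
    as the 0-1 disagreement between g and the reference classifier f."""
    x = sample_control(n)
    penalty = float(np.mean(f(x) != g(x)))
    return penalty <= q_tol

# Toy stand-ins: the control distribution S and classifiers are illustrative.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
g = lambda x: (x.sum(axis=1) < 1.4).astype(int)
S = lambda n: rng.normal(0.5, 0.2, size=(n, 2))

# When the check fails, the edge would re-center or shrink D_{k,t}
# (e.g., reduce the sphere radius) and re-solve Eqs. (1)-(3).
print(locality_ok(f, g, S, q_tol=0.1))
```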
Less tra ffi c More tra ffi c Xz ^ DK ,t trip , WE x , > is ~÷ Figure 3.
Localized semantic classifier compression:
The gray area depictsthe subspace of positive detections of a global black-box classifier f in thespace of all inputs X . The dashed bold line represents a simplified linearclassifier g k,t chosen to fit f only in the locality D k,t of recent inputs froma sensor k (yellow circle), and so does not need to bear the full complexityof f . Our approach draws instances from D k,t , classifies them with f , anduses the resulting sample for optimizing g k,t . Due to energy and bandwidthconstraints, different boundaries g k,t may be obtained as illustrated under theplot: Aggressive ones save traffic by capturing less but risk frequent misses(left example); conservative ones avoid misses by capturing more but generatemore traffic (right example). IV. S
IMULATION R ESULTS
In order to illustrate the feasibility of the proposed approach, let us consider a motivating example of a binary classification problem, in the context of a single sensor–edge pair (for this reason we omit the index k below, for the sake of brevity).

As customary, input observations subject to classification come as feature vectors in a multidimensional vector space X. The two classes correspond to the sets of observations that are to be registered by the sensor–edge system (positives), versus the rest (negatives). In this case, the probability distributions of both classes are set to be Gaussian mixtures (and so is, therefore, the joint distribution X). Both mixtures consist of the same number of symmetric normally-distributed components centered equidistantly on a number of lines parallel to the main diagonal of the unit hypercube.

For simplicity, we assume that both f and g ∈ G belong to the same class of Support Vector Machine (SVM) classifiers working in the space X. To satisfy the requirement of g having a lower complexity than f, the class G is limited to SVMs with linear kernels, while the reference global classifier f is trained with a kernel of Gaussian radial basis functions (and can be replaced with an even more computationally intense classifier). Each locality distribution D_t guiding the selection of training samples for the on-sensor classifiers g_t is set to be a uniform distribution in a sphere described by its center and radius r_t. By the nature of the distribution X, the local and global accuracy of the classifier f is expected to not differ significantly, while the accuracy of the localized classifiers g shall be sensitive to the localities D_t and their sizes r_t.

In these circumstances, the applicability of the problem statements given in Section II to this detection task requires a study of two aspects of the system: (i) the accuracy of the localized classifiers g_t for different spheres D_t, as a function of the radii r_t and the update frequency 1/γ;
(ii) the realization of actual distributions of consecutive observations x_t in the data for a desired update frequency, and the procedure for adaptively choosing the radii r_t reacting to the accuracy–complexity tradeoff.

To this end, both in this specific example and in general, we need to be in possession of two samples. First, a labeled training dataset of pairs (z_i, y_i) is necessary, where the points z_i ∈ X are drawn from the joint distribution of observations X, and y_i ∈ {0, 1} signify the corresponding labels. We can assume the availability of this sample Z without any loss of generality, as the very problem setting given in Section II starts with a classifier f that has to be trained on some sample, which we can reuse here for Z. In the unsupervised case, for the purposes of the following discussion, the labels y_i can be defined by the outcomes f(z_i) of the global classifier f.

Second, it is necessary to have a sample of one or more trajectories S = (x_1, ..., x_T), x_t ∈ X, representative of the sequential process generating observations on the sensor. In practice, this sample can be obtained from previous, nonadaptive runs of the sensor–edge system in question, where all sensor observations eventually reach and get accumulated at the edge processor. In this example problem, we assume that the trajectory distribution S follows the general distribution X (which would likely be the case in general as well, unless the nature of the observation process dictates otherwise). Adhering to this assumption, we generate a sample S as a Markov chain starting from a randomly chosen point x_1 ∼ X and continuing by applying the Metropolis–Hastings sampling algorithm to the distribution X.

The two aforementioned aspects of the system, then, can be studied through the following duplex sampling procedure (schematically depicted in Fig. 4):

1) For each update frequency 1/γ (or, equivalently, the length of the update period γ in the number of observations), draw a sample of subsequences S_t(γ) = (x_{t−γ+1}, ..., x_t) of γ consecutive observations along the trajectory S.

2) For each subsequence S_t(γ):
   a) Find the minimal sphere D_t containing all (or a given percentage of) the points x_{t−γ+1}, ..., x_t.
   b) Sample points Z_t(γ) = {(z_i, y_i) ∈ Z | z_i ∼ D_t} from the general training sample Z uniformly inside of the sphere D_t.
   c) Using the points in Z_t(γ) as a training sample, fit a classifier g_t ∈ G to a desired quality.
   d) Apply the classifier g_t to the points in the subsequence S_t(γ), comparing the verdicts of g_t to the corresponding verdicts of the reference classifier f for those same points in S_t(γ).
   e) Store the radius r_t(γ) of the sphere D_t and the resulting accuracy α_t(γ) of the localized classifier g_t on the points in S_t(γ).

Figure 4. Trajectory sampling procedure: Schematic representation of the stages (a)–(d), highlighting the key variables involved.

With the accumulated statistics of radii r_t(γ) and accuracies α_t(γ), it is then possible for us to compute the empirical averages of both of these features over the trajectory’s subsequences as functions of the update period γ.

Figs. 5 and 6 demonstrate these functional relations in the case of our motivating example for a multidimensional Gaussian sample. The former figure depicts the average radius of the spheres containing the points in the subsequences S_t(γ) for different values of γ. As we can see, the average radius quickly grows as the update period increases. The latter figure highlights the opposite trend: the accuracy of the locally-fit classifiers g_t almost monotonically decreases with an increasing period of updates. For comparison, the accuracy of the global classifier f, when it is implemented as an RBF-kernel SVM, fluctuates insignificantly, independently of the update frequency γ.

Here both f and g_t were trained to treat false positives and false negatives equally; in cases where it is intolerable to miss detections due to the localized approximation, the same trends will be present for respectively adjusted g_t. The choice of update frequency can be guided by the penalty taken by the accuracy α_t(γ) when the classifier g_t trained for a locality D_t is kept for use in the subsequent localities D_{t+1}, D_{t+2}, ... without an update. For our example this relation is summarized in Fig. 7, showing the change in the mean accuracy of a local classifier g_t as a function of the delay between its training and its usage. The x-axis measures the delay relative to the update period length γ.
The y-axis measures the ratio between the mean accuracy for the trajectory subsequence corresponding to the moment a local classifier g_t is used and the mean accuracy for the trajectory subsequence corresponding to the moment the locality D_t was captured.

Figure 5. Update locality: Average radius of a sphere containing the points in trajectory subsequences, as a function of the update period for the example problem. The range between the 0.25- and 0.75-quantiles is highlighted in gray.

Figure 6. Local accuracy: Average accuracy as a function of the update period for the example problem. The range between the 0.25- and 0.75-quantiles is highlighted in gray.

All three of these relations confirm the feasibility of the assumptions underlying the problem formulation: while simpler local classifiers g_t have poor accuracy globally, their quality catches up, for frequent locality updates, to a satisfactory level comparable to that of the global classifier f. The ultimate quality of the resulting system will, of course, depend significantly on the mutual compatibility of the data distribution X (governing the complexity of the global classifier f), the family of local classifiers G, the form of the locality distributions D_t, and the constraints on the desired accuracy. For instance, when sensor sampling trajectories do not exhibit enough compactness, as measured by the form of D_t and g_t, it might be problematic or even impossible to achieve very high levels of accuracy with the localized substitution classifiers g_t. In each particular case, the limits of the achievable results should be studied separately, e.g., using the above trajectory-sampling procedure.
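As an illustration, the trajectory-sampling procedure of this section can be sketched end-to-end on a toy two-dimensional problem. This is a simplified stand-in, not the paper's exact setup: the observation distribution is a single Gaussian rather than a mixture, the minimal sphere is approximated by a mean-centered bounding sphere, and the local family G is a nearest-centroid rule instead of a linear SVM:

```python
import numpy as np

rng = np.random.default_rng(3)

def metropolis_trajectory(log_density, x0, T, step=0.1):
    """Sample a trajectory S = (x_1, ..., x_T) by Metropolis-Hastings."""
    xs, x = [x0], x0
    for _ in range(T - 1):
        prop = x + rng.normal(0.0, step, size=x.shape)
        if np.log(rng.random()) < log_density(prop) - log_density(x):
            x = prop
        xs.append(x)
    return np.array(xs)

def enclosing_sphere(points):
    """Step 2a (approximate): mean-centered sphere containing all points."""
    center = points.mean(axis=0)
    return center, np.linalg.norm(points - center, axis=1).max()

# Toy setting: standard-normal observation process, radial black-box f.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
S = metropolis_trajectory(lambda x: -0.5 * x @ x, np.zeros(2), T=2000)

gamma = 200                                      # update period
radii, accs = [], []
for t in range(gamma, len(S), gamma):
    window = S[t - gamma:t]                      # step 1: subsequence S_t(gamma)
    center, r = enclosing_sphere(window)         # step 2a
    # Steps 2b-2c: draw training points around the sphere, label with f,
    # and fit a nearest-centroid rule as a stand-in for the local g_t.
    Z = center + rng.normal(0.0, r / 2, size=(500, 2))
    yZ = f(Z)
    if yZ.min() == yZ.max():                     # locality is single-class
        g = lambda x, lbl=yZ[0]: np.full(len(x), lbl)
    else:
        mu1, mu0 = Z[yZ == 1].mean(axis=0), Z[yZ == 0].mean(axis=0)
        g = lambda x, a=mu1, b=mu0: (
            np.linalg.norm(x - a, axis=1) < np.linalg.norm(x - b, axis=1)
        ).astype(int)
    # Steps 2d-2e: score g_t against f on the window; store (r_t, alpha_t).
    radii.append(r)
    accs.append(float(np.mean(g(window) == f(window))))

print(np.mean(radii), np.mean(accs))  # empirical averages over subsequences
```

Averaging the stored (r_t(γ), α_t(γ)) pairs over many values of γ is what produces curves of the kind shown in Figs. 5 and 6.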
Figure 7. Local classifier aging: Relative change in the average accuracy of a local classifier g_t for different update periods γ, as a function of the update delay normalized by the length of the update period.

For problems where, as in our example here, the locality of the space X can be exploited well for a given D_t and g_t, the possibility opens for an efficient adaptation of the locality D_t(τ) as a function of some control parameters τ. For instance, here the update period γ can serve the role of the parameter τ, with the control objective consisting in keeping it smaller than some value of γ guaranteeing a desired accuracy (according to Fig. 6).

V. CONCLUSIONS
Sophisticated IoT systems often involve combining sensing, communication, and processing capabilities. Recent architectures for such IoT systems often perform expensive computation at the edge level, in order for the mobile devices to utilize their limited energy for sensing and transmission. However, such architectures often cannot meet the tight constraints of a time-varying or limited bandwidth availability, as is common in real-world applications, due to their need to communicate all of the data from the sensor-level devices to the edge.

In this paper, we proposed an alternative architecture where the edge and the devices perform the computation cooperatively. The core of our proposed approach is to provide a “semantic” strategy for carrying out this sharing of the computation: we dynamically craft customized classifiers for each sensor that define what the sensor device will communicate to the edge processor, thus offloading the majority of the computation to these devices. This proposed design of sensor-specific classifiers takes into account various properties of the current context, such as the sensor-specific distribution of inputs that the device is likely to observe, the energy resources and constraints on the device, and the time-varying limitations on the shared bandwidth to the edge.

We showed the feasibility of our semantic approach using simulated experiments. We demonstrated that simple, energy-efficient classifiers can be as accurate in classification as complex classifiers if we utilize the distribution of inputs that the sensing device is likely to receive when constructing them. We further showed that the approach is fairly robust to changes in this distribution of inputs over time. Although the classifiers need to be updated as the current context of the sensors and the edge changes over time, we also demonstrated that the sensor-specific classifiers still maintain accuracy even if they are not updated very frequently.
With these encouraging results, we are interested in deploying such an architecture to real-world IoT testbeds in the future.

REFERENCES

[1] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.

[2] P. Neirotti, A. D. Marco, A. Cagliano, G. Mangano, and F. Scorrano, “Current trends in smart city initiatives: Some stylised facts,” Cities, vol. 38, pp. 25–36, 2014.

[3] C. T. Barba, M. A. Mateos, P. R. Soto, A. M. Mezher, and M. A. Igartua, “Smart city for VANETs using warning messages, traffic statistics and intelligent traffic lights,” in Intelligent Vehicles Symposium (IV), 2012 IEEE. IEEE, 2012, pp. 902–907.

[4] F. J. Martinez, C.-K. Toh, J.-C. Cano, C. T. Calafate, and P. Manzoni, “Emergency services in future intelligent transportation systems based on vehicular communication networks,” IEEE Intelligent Transportation Systems Magazine, vol. 2, no. 2, pp. 6–20, 2010.

[5] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet of things,” in Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, ser. MCC ’12, 2012, pp. 13–16.

[6] M. Satyanarayanan, P. Simoens, Y. Xiao, P. Pillai, Z. Chen, K. Ha, W. Hu, and B. Amos, “Edge analytics in the internet of things,” IEEE Pervasive Computing, vol. 14, no. 2, pp. 24–31, 2015.

[7] M. T. Ribeiro, S. Singh, and C. Guestrin, “"Why should I trust you?": Explaining the predictions of any classifier,” in Knowledge Discovery and Data Mining (KDD), 2016.

[8] ——, “Model-agnostic interpretability of machine learning,” in