Semantic Compression for Edge-Assisted Systems
Igor Burago, Marco Levorato, and Sameer Singh
Department of Computer Science, University of California, Irvine
Email: {iburago, levorato, sameer}@uci.edu
Abstract—A novel semantic approach to data selection and compression is presented for the dynamic adaptation of IoT data processing and transmission within “wireless islands”, where a set of sensing devices (sensors) are interconnected through one-hop wireless links to a computational resource via a local access point. The core of the proposed technique is a cooperative framework where local classifiers at the mobile nodes are dynamically crafted and updated based on the current state of the observed system, the global processing objective, and the characteristics of the sensors and data streams. The edge processor plays a key role by establishing a link between content and operations within the distributed system. The local classifiers are designed to filter the data streams and provide only the needed information to the global classifier at the edge processor, thus minimizing bandwidth usage. However, the better the accuracy of these local classifiers, the larger the energy necessary to run them at the individual sensors. A formulation of the optimization problem for the dynamic construction of the classifiers under bandwidth and energy constraints is proposed and demonstrated on a synthetic example.
I. INTRODUCTION
The Internet of Things (IoT) paradigm [1] envisions a scenario where machines remotely interact to provide services and perform monitoring and control tasks. To this aim, the IoT realizes a network of data sources, mobile devices, and processing centers interconnected through wireless and wireline links, where local and global algorithms cooperate in a distributed fashion.

Sophisticated large-scale application scenarios such as Smart City systems [2] and intelligent (or autonomous) vehicular networks [3], [4] push the limits of IoT systems in sensing, communication, and processing capabilities. To address the need for tight control loops, timely coordination, and computation-intense processing, Fog and Edge Computing architectures [5], [6] place computation resources at the edge of the wireless access infrastructure. In these architectures, mobile devices can offload computational tasks to edge data processors through one-hop low-latency links. The co-location of sensing and processing within a star topology allows reliable local coordination of remote devices informed by global resources, such as databases and data centers in the cloud. However, the limited and time-varying bandwidth available in wireless environments makes the design of edge-based architectures challenging. This especially applies in those scenarios where IoT data streams coexist with other services on the same channel and network resources.

In this paper, we propose a framework for the dynamic adaptation of IoT data processing and transmission within
Figure 1. Edge-assisted local network scenario: A set of sensing devices acquire observations on the physical environment to support a global computational task performed at the edge processor.

“wireless islands”, where a set of sensing devices (sensors) are interconnected with one-hop wireless links to a computational resource through a local access point (e.g., a cellular base station or a Wi-Fi access point). We specifically address an application scenario where the sensors and the edge processor cooperatively perform a real-time data acquisition and processing task, such as classification or detection based on environmental observations (see Fig. 1). The challenge, then, is to accomplish such a task within the bandwidth, computational power, and energy constraints imposed by the limited resources available at the device and network levels.

The core of the framework is a novel “semantic” approach to data selection and compression, where local classifiers at the mobile nodes are dynamically crafted and updated based on the current state of the observed system and its processing objective, together forming a continuously evolving context. The edge processor plays a key role by establishing a link between content and operations within the distributed system. The local classifiers are designed to filter the data streams and provide only the needed information to the global classifier at the edge processor, thus minimizing bandwidth usage. However, the better the accuracy of these local classifiers, the larger the energy necessary to run them at the individual sensors. Our framework builds on recent results [7], [8], where classifier simplifications are applied to the problem of explaining the outcome of black-box machine learning algorithms.

An interesting connection can be made to traditional multimedia compression techniques, where the components imperceivable by humans are removed. Thus, distortion of the original signal is accepted in those regions that are not needed by the final application.
This research extends this principle to data consumed by machines for general computational purposes. Additionally, we expand the traditional focus on bandwidth compression by itself with the notion of energy awareness.

The rest of the paper is organized as follows. Section II introduces the general scenario and describes the problem addressed herein. In Section III, we present the semantic compression framework, and illustrate its key components on an exemplary problem in Section IV. Section V concludes the paper.

II. PROBLEM FORMULATION
Recent advances in machine learning have resulted in sophisticated models, which provide incredibly capable detectors of interest to IoT applications, particularly for image and video processing. Instead of working only for niche or synthetic settings, these classifiers are able to handle real-world input from a large variety of environments. As a consequence, the resulting classifiers often tend to be too complex in structure, and can only reside on devices capable of handling computationally intense tasks. However, mobile sensors collecting the data for processing have only limited observational power, computational capabilities, and energy availability. Hence, due to constraints in these resources, they often cannot support such complex classifiers. Fog and Edge architectures offer a solution to this issue by introducing computational resources within the local wireless island. However, bandwidth constraints, often imposed by other competing services, limit the data that can be transferred from the sensors to the computational resources. In these circumstances, pre-filtering the data at the sensors becomes necessary to avoid delay, data loss, or undesirable disruption of other wireless services.

A sketch of the architecture at the center of our studies is shown in Fig. 1, where a set of sensors acquire observations in some dynamic environment. The sensors are wirelessly interconnected through a local access point (e.g., a base station) to an edge processor. The edge processor is assigned a computational task (possibly changing in time), such as the identification of human activities in public parks or traffic dangers in autonomous vehicles’ networks. This task corresponds to one or more classifiers taking the data streams from the sensing devices as their inputs.
The goal of the global classifiers is to achieve an average accuracy α, measured in terms of classification errors. For ease of explanation, we introduce the notion of a temporal period, where time is discretized and indexed with t. The K sensors are connected to the edge processor through wireless links of capacity b_{k,t}, k = 1, ..., K, in the period t. A constraint b_t on the overall capacity available to the sensors, where ∑_{k=1}^{K} b_{k,t} ≤ b_t, can be introduced to capture channel sharing. The signal acquired by a sensor k in the time period t is X_{k,t}. Each sensor has an energy storage for processing and transmission, where the amount of energy available at sensor k in period t is equal to e_{k,t}. The energy storage can be refilled through charging or energy harvesting, modeled as a random arrival process. The goal of the system is to guarantee the wanted accuracy at the edge processor using the available bandwidth and energy. Fig. 2 illustrates the components of the system for an individual sensor k.

The sensors implement local classifiers which serve the purpose of filtering out unusable data, defined as the data that are not needed for maintaining the target accuracy at the edge processor. While the amount of data transferred from the sensors to the edge is bounded by the time-varying capacity of the channel, the efficacy of locally removing unnecessary data is bounded by the processing power and energy availability at the sensors. On the one hand, the transmission of unfiltered data may violate the bandwidth constraint, thus causing data loss and disruption of existing wireless services. On the other hand, running a complex local classifier may require excessive computational effort and energy expense at the mobile devices. We formulate an optimization problem capturing the tension between these two extremes for the purposes of dynamic adaptation of filters deployed at the sensors.
Based on the input from the sensors, the edge processor periodically produces a new filter with controlled complexity for each sensor, based on bandwidth and energy usage constraints following from high-level operational objectives. Herein, we focus on building customized classifiers possessing the following characteristics:

• Locality. The sensor-specific classifiers will be trained to achieve a certain accuracy level for the kinds of inputs the sensor is likely to receive. For instance, the local classifiers will be built to provide low-error predictions for indoor images if the sensor is placed inside.

• Bandwidth-Awareness. The local classifiers are designed to be used as bandwidth-preserving filters, thus optimizing for the false-negative rate to meet the bandwidth constraints imposed by the link to the global edge processor.

• Complexity and Energy-Awareness. The design of the local classifiers will satisfy complexity and energy requirements of the sensor as determined by a stochastic energy-arrival process.

Given the complex, accurate classifier at the edge, our objective is to build a sensor-specific classifier tailored to the distribution of samples in the current period, and satisfying the bounded complexity and bandwidth usage. More formally, we are provided with a pre-trained binary classifier, e.g., one
Figure 2. Illustration of the problem: Dynamic energy- and bandwidth-aware adaptation of local data filtering serving the purpose of global estimation. The figure illustrates the components of acquisition, communication, processing, and control for one sensor.

detecting whether a person is visible by the sensor, denoted by f : X → {0, 1}, where X is the space of possible inputs. We treat this classifier as a black-box function in order to support as wide a variety of machine learning algorithms as possible. For a sensor k during period t, the goal is to identify a local classifier g_{k,t} : X → {0, 1}, g_{k,t} ∈ G, that meets the specifications of the sensor, where G is the family of machine learning classifiers we want the sensor to use (for instance, linear classifiers). In particular, we are provided with the following requirements corresponding to the aforementioned characteristics:

• Locality D_{k,t}: The expected distribution of the sensor inputs for period t is denoted by D_{k,t}. We want g_{k,t} to be as accurate as f as possible on inputs from this distribution.

• Bandwidth b_{k,t}: The average amount of data allowed to be transmitted by g_{k,t} for the period t should be less than b_{k,t}. (It is also possible to consider a generalization where only the total capacity b_t for all sensors is provided.)

• Energy e_{k,t}: The average energy used by g_{k,t} for the period t should be less than e_{k,t}.

In this work we assume that the customized classifier g_{k,t} will be built on the edge, not the sensor, and thus the computational efficiency of estimating g_{k,t} is not restricted.

III. SEMANTIC COMPRESSION
In this section, we outline our proposed approach to constructing a classifier g_{k,t} that meets the sensor’s requirements on energy, bandwidth, and locality for the period t, while still being faithful to the complex, global classifier f.
Energy Efficiency. The primary obstacle to using f at the sensor level is its computational complexity. For instance, each prediction by a neural network can often take hundreds to thousands of floating-point computations, resulting in heavy power consumption. Instead, we are concerned with learning an energy-efficient classifier g_{k,t} ∈ G, for G being limited to a simpler model family, such as SVMs, decision trees, linear classifiers, etc. We define the energy consumed by g_{k,t} for an input as E_{g_{k,t}} : X → R_{≥0}; the average energy used by the sensor k for period t will be E_{x∼D_{k,t}}[E_{g_{k,t}}(x)]. We also define a penalty on the classifier for violating an energy constraint e_{k,t} as R_E, such that R_E(E_{g_{k,t}}(x), e_{k,t}) = 0 if g_{k,t} meets the energy requirement e_{k,t}, and R_E(E_{g_{k,t}}(x), e_{k,t}) > 0 otherwise. Since directly estimating the energy consumption E_{g_{k,t}} of a classifier g_{k,t} is challenging, we use the number of computational operations as a proxy, and thus R_E penalizes g_{k,t} the more operations it requires for a prediction.
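Since R_E uses the operation count as a proxy for per-prediction energy, a minimal sketch of this proxy for a linear classifier looks as follows (function names and the linear form of the overage penalty are our illustrative assumptions, not part of the paper):

```python
# Operation-count proxy for the energy E_g(x) of a linear classifier
# g(x) = 1[w.x + c > 0] in d dimensions: d multiplies, d adds, 1 compare.
def linear_op_count(d: int) -> int:
    return 2 * d + 1

# R_E: zero when the proxy is within the per-prediction budget,
# positive overage otherwise (a linear penalty is an assumption here).
def energy_penalty(ops: int, budget_ops: int) -> float:
    return float(max(0, ops - budget_ops))

print(energy_penalty(linear_op_count(10), budget_ops=50))   # within budget: 0.0
print(energy_penalty(linear_op_count(100), budget_ops=50))  # over budget: 151.0
```

Any other monotone penalty (quadratic, hinge-like) would fit the definition of R_E equally well; only R_E = 0 inside the budget and R_E > 0 outside is required.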
Locality. Obviously, an energy-efficient classifier g_{k,t}, by using a simpler structure, cannot have the same general representation capabilities as the global classifier f for the complete range of inputs. However, in any given time period, most sensors do not receive the full variety of inputs that the global classifier is designed to support, and thus it is possible to have g_{k,t} focus its representation on the inputs expected at the sensor. In order to identify such a g_{k,t}, we use the expected distribution of inputs, D_{k,t}, to compute how similar g_{k,t} is to f. In particular, given a loss function L(f(x), g_{k,t}(x)) between g_{k,t}’s and f’s predictions on an instance x, e.g., the squared loss L_{sq}(a, b) = (a − b)^2 or the logistic loss L_{ll}(a, b) = −a log b − (1 − a) log(1 − b), we evaluate the similarity between g_{k,t} and f as E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))]. Fig. 3 illustrates the intuition, where a complex, power-consuming global classifier f (solid gray curve) can be approximated quite well locally by a simple, and thus energy-efficient, classifier g_{k,t} (dashed bold line).
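The similarity E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))] can be estimated by plain Monte Carlo sampling from the locality. A minimal sketch, with toy stand-ins for f, g, and D (all illustrative, not the paper's actual models):

```python
import numpy as np

rng = np.random.default_rng(1)

def fidelity(f, g, sample_locality, loss, n=10_000):
    """Monte Carlo estimate of E_{x~D}[L(f(x), g(x))] over the locality D."""
    x = sample_locality(n)
    return float(np.mean(loss(f(x), g(x))))

def sq_loss(a, b):
    return (a - b) ** 2  # L_sq(a, b) = (a - b)^2

# Toy stand-ins (illustrative only): f is a radial rule, g a linear one.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
g = lambda x: (x.sum(axis=1) < 1.4).astype(int)
D = lambda n: rng.normal(0.5, 0.2, size=(n, 2))  # locality D_{k,t}

print(fidelity(f, g, D, sq_loss))  # mean disagreement on the locality
```

For binary labels the squared loss reduces to the 0-1 disagreement rate, which makes the estimate directly interpretable as local accuracy loss relative to f.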
Bandwidth Awareness. Every automated detector is accompanied by a certain level of expected error, often measured as the rate of false positives and false negatives. Due to the energy constraints on the desired classifier g_{k,t}, it may not be able to maintain the same low error levels as the global classifier f, even on the local distribution of inputs. In such situations, we can treat g_{k,t} as the sensor-level filtering of the inputs, with f running at the edge level to achieve the same low error levels. Thus there is a trade-off between how much of the bandwidth is used to transmit false positives versus missing out on a relevant input in order to conserve the bandwidth. We define the amount of data g_{k,t} will use for an input x as B_{g_{k,t}} : X → R_{≥0}; the average data transmitted by the sensor for period t will be E_{x∼D_{k,t}}[B_{g_{k,t}}(x)]. We further define the penalty on the classifier g_{k,t} for violating the bandwidth b_{k,t} as R_B, such that R_B(B_{g_{k,t}}(x), b_{k,t}) = 0 if g_{k,t} uses less than b_{k,t} bandwidth, and R_B(B_{g_{k,t}}(x), b_{k,t}) > 0 otherwise. Fig. 3 shows an example where a classifier that is not aware of its use as a filter (the leftmost example) may transmit less but have a high error rate, while a bandwidth-aware classifier (in the middle) will obtain a lower false-negative rate.
Semantic Compression. From the sensor specifications, namely the local distribution D_{k,t}, the energy consumption constraint e_{k,t} and penalty function R_E, the bandwidth constraint b_{k,t} and penalty function R_B, and the global classifier f, we can frame the search for the sensor-specific classifier g_{k,t} as the following optimization problem to be solved periodically over time:

  g*_{k,t} = arg min_{g_{k,t} ∈ G} E_{x∼D_{k,t}}[L(f(x), g_{k,t}(x))],      (1)
  s.t.  E_{x∼D_{k,t}}[R_E(E_{g_{k,t}}(x), e_{k,t})] ≤ ε_{k,t},             (2)
        E_{x∼D_{k,t}}[R_B(B_{g_{k,t}}(x), b_{k,t})] ≤ β_{k,t}.             (3)

Here ε_{k,t} and β_{k,t} have the meaning of tolerances on the expected penalties R_E and R_B for random observations following a given locality distribution D_{k,t}.

The distribution D_{k,t} serves a proxy role, conveying to the edge processor a local description of expected observations at the sensor, without wasting the bandwidth for transmitting the observations themselves. The edge processor, in turn, replies to the sensor with a classifier g*_{k,t}, locally tuned to D_{k,t} according to the problem in Eqs. (1)–(3). For each particular sensor and time period, the distribution D_{k,t} is fixed, so the efficacy of this semantic compression scheme is determined by whether the family of local classifiers G is flexible enough for the distribution of positive and negative samples in D_{k,t}. However, at a larger scope, the locality D_{k,t} may vary and is subject to negotiation between the sensor and the edge processor.

With the shape of the locality D_{k,t} controllable, the quality of the corresponding classifiers g_{k,t} may be additionally improved through locality tuning. This brings the option to view the optimization in Eqs. (1)–(3) as a subproblem of a higher-level control task, maintaining a desired aptitude of the classifiers g_{k,t} on a sequence of observations generated by the sensor.
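A rough way to approach Eqs. (1)–(3) in the linear case is to draw a sample from D_{k,t}, label it with the black-box f, and fit a weighted linear surrogate: restricting G to linear models addresses the energy constraint implicitly, and a false-negative weight stands in for the bandwidth/miss trade-off. This is a minimal sketch under those assumptions; all names (`fit_local_classifier`, `fn_weight`) are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_local_classifier(f, sample_locality, n=2000, fn_weight=1.0,
                         lr=0.5, steps=500):
    """Fit a linear surrogate g for the black-box f on the locality D_{k,t}."""
    x = sample_locality(n)          # x_i ~ D_{k,t}
    y = f(x)                        # labels produced by the global classifier
    w, c = np.zeros(x.shape[1]), 0.0
    # Up-weighting positives discourages misses (false negatives).
    sw = np.where(y == 1, fn_weight, 1.0)
    for _ in range(steps):          # plain batch gradient descent (logistic loss)
        p = 1.0 / (1.0 + np.exp(-(x @ w + c)))
        grad = sw * (p - y)
        w -= lr * (x.T @ grad) / n
        c -= lr * grad.mean()
    return lambda z: ((z @ w + c) > 0).astype(int)

# Toy stand-ins: a radial black-box f and a Gaussian locality D.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
D = lambda n: rng.normal(0.5, 0.3, size=(n, 2))

g = fit_local_classifier(f, D, fn_weight=3.0)
x_test = D(5000)
print(np.mean(g(x_test) == f(x_test)))  # local agreement with f
```

The explicit constraint checks of Eqs. (2)–(3) would wrap this fit: if the candidate g violates its tolerance, the edge would adjust fn_weight or the model family and refit.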
In this way, the problem of finding optimal local g_{k,t} may be extended to the broader adaptive-control problem of maintaining a desired accuracy of filtration by adjusting the locality-capturing procedure delivering the distributions D_{k,t}, such that

  E_{x∼S_{k,t}}[Q_{g_{k,t}}(x, D_{k,t})] ≤ q_{k,t}.      (4)

The penalty function Q_{g_{k,t}} stands for the losses we bear from any inadequacies of the local classifier g*_{k,t} for the particular choice of locality D_{k,t}, which we would like to keep bounded by a tolerance q_{k,t}. Here, the quality is monitored for inputs from some control distribution S_{k,t} chosen by the edge processor using the empirical data arriving from the sensor and the a priori strategic objectives for the ultimate outcomes of the sensor–edge system as a whole. In practice, S_{k,t} may coincide with the global observatory distribution X, the locality distribution D_{k,t}, or can be derived from the sequence of empirical observations obtained by the sensor k. In Eq. (4), the locality D_{k,t} is made an argument of the penalty Q_{g_{k,t}} to highlight its potential role as the control “variable”. One simple example giving an idea of how localities D_{k,t} may be parametrized and controlled will be given in the following section.
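The monitoring condition of Eq. (4) can be sketched as a simple check on the edge; here the penalty Q is taken, purely for illustration, to be the 0-1 disagreement between g and f, and the classifiers and control distribution are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)

def locality_ok(f, g, sample_control, q_tol, n=5000):
    """Check Eq. (4): E_{x~S}[Q_g(x, D)] <= q, with Q taken (illustratively)
    as the 0-1 disagreement between g and the reference classifier f."""
    x = sample_control(n)
    penalty = float(np.mean(f(x) != g(x)))
    return penalty <= q_tol

# Toy stand-ins: the control distribution S and classifiers are illustrative.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
g = lambda x: (x.sum(axis=1) < 1.4).astype(int)
S = lambda n: rng.normal(0.5, 0.2, size=(n, 2))

# When the check fails, the edge would re-center or shrink D_{k,t}
# (e.g., reduce the sphere radius) and re-solve Eqs. (1)-(3).
print(locality_ok(f, g, S, q_tol=0.1))
```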
Less tra ffi c More tra ffi c Xz ^ DK ,t trip , WE x , > is ~÷ Figure 3.
Localized semantic classifier compression:
The gray area depictsthe subspace of positive detections of a global black-box classifier f in thespace of all inputs X . The dashed bold line represents a simplified linearclassifier g k,t chosen to fit f only in the locality D k,t of recent inputs froma sensor k (yellow circle), and so does not need to bear the full complexityof f . Our approach draws instances from D k,t , classifies them with f , anduses the resulting sample for optimizing g k,t . Due to energy and bandwidthconstraints, different boundaries g k,t may be obtained as illustrated under theplot: Aggressive ones save traffic by capturing less but risk frequent misses(left example); conservative ones avoid misses by capturing more but generatemore traffic (right example). IV. S
IMULATION R ESULTS
In order to illustrate the feasibility of the proposed approach, let us consider a motivating example of a binary classification problem, in the context of a single sensor–edge pair (for this reason we omit the index k below, for the sake of brevity).

As customary, input observations subject to classification come as feature vectors in a multidimensional vector space X. The two classes correspond to the sets of observations that are to be registered by the sensor–edge system (positives), versus the rest (negatives). In this case, the probability distributions of both classes are set to be Gaussian mixtures (and so is, therefore, the joint distribution X). Both mixtures consist of the same number of symmetric normally-distributed components centered equidistantly on a number of lines parallel to the main diagonal of the unit hypercube.

For simplicity, we assume that both f and g ∈ G belong to the same class of Support Vector Machine (SVM) classifiers working in the space X. To satisfy the requirement of g having a lower complexity than f, the class G is limited to SVMs with linear kernels, while the reference global classifier f is trained with a kernel of Gaussian radial basis functions (and can be replaced with an even more computationally intense classifier). Each locality distribution D_t guiding the selection of training samples for the on-sensor classifiers g_t is set to be a uniform distribution in a sphere described by its center and radius r_t. By the nature of the distribution X, the local and global accuracy of the classifier f is expected to not differ significantly, while the accuracy of the localized classifiers g shall be sensitive to the localities D_t and their sizes r_t.

In these circumstances, the applicability of the problem statements given in Section II to this detection task requires a study of two aspects of the system: (i) the accuracy of the localized classifiers g_t for different spheres D_t, as a function of the radii r_t and the update frequency 1/γ;
(ii) the realization of actual distributions of consecutive observations x_t in the data for a desired update frequency, and the procedure for adaptively choosing the radii r_t reacting to the accuracy–complexity tradeoff.

To this end, both in this specific example and in general, we need to be in possession of two samples. First, a labeled training dataset of pairs (z_i, y_i) is necessary, where the points z_i ∈ X are drawn from the joint distribution of observations X, and y_i ∈ {0, 1} signify the corresponding labels. We can assume the availability of this sample Z without any loss of generality, as the very problem setting given in Section II starts with a classifier f that has to be trained on some sample, which we can reuse here for Z. In the unsupervised case, for the purposes of the following discussion, the labels y_i can be defined by the outcomes f(z_i) of the global classifier f.

Second, it is necessary to have a sample of one or more trajectories S = (x_1, ..., x_T), x_t ∈ X, representative of the sequential process generating observations on the sensor. In practice, this sample can be obtained from previous, nonadaptive runs of the sensor–edge system in question, where all sensor observations eventually reach and get accumulated at the edge processor. In this example problem, we assume that the trajectory distribution S follows the general distribution X (which would likely be the case in general as well, unless the nature of the observation process dictates otherwise). Adhering to this assumption, we generate a sample S as a Markov chain starting from a randomly chosen point x_1 ∼ X and continuing by applying the Metropolis–Hastings sampling algorithm to the distribution X.

The two aforementioned aspects of the system, then, can be studied through the following duplex sampling procedure (schematically depicted in Fig. 4):

1) For each update frequency 1/γ (or, equivalently, the length of the update period γ in the number of observations), draw a sample of subsequences S_t(γ) = (x_{t−γ+1}, ..., x_t) of γ consecutive observations along the trajectory S.

2) For each subsequence S_t(γ):
   a) Find the minimal sphere D_t containing all (or a given percentage of) the points x_{t−γ+1}, ..., x_t.
   b) Sample points Z_t(γ) = {(z_i, y_i) ∈ Z | z_i ∼ D_t} from the general training sample Z uniformly inside of the sphere D_t.
   c) Using the points in Z_t(γ) as a training sample, fit a classifier g_t ∈ G to a desired quality.
   d) Apply the classifier g_t to the points in the subsequence S_t(γ), comparing the verdicts of g_t to the corresponding verdicts of the reference classifier f for those same points in S_t(γ).
   e) Store the radius r_t(γ) of the sphere D_t and the resulting accuracy α_t(γ) of the localized classifier g_t on the points in S_t(γ).

Figure 4. Trajectory sampling procedure: Schematic representation of the stages (a)–(d), highlighting the key variables involved.

With the accumulated statistics of radii r_t(γ) and accuracies α_t(γ), it is then possible for us to compute the empirical averages of both of these features over the trajectory’s subsequences as functions of the update period γ.

Figs. 5 and 6 demonstrate these functional relations in the case of our motivating example for a multidimensional Gaussian sample. The former figure depicts the average radius of the spheres containing the points in the subsequences S_t(γ) for different values of γ. As we can see, the average radius quickly grows as the update period increases. The latter figure highlights the opposite trend: the accuracy of the locally-fit classifiers g_t almost monotonically decreases with an increasing period of updates. For comparison, the accuracy of the global classifier f, when it is implemented as an RBF-kernel SVM, fluctuates insignificantly, independently of the update frequency γ.

Here both f and g_t were trained to treat false positives and false negatives equally; in cases where it is intolerable to miss detections due to the localized approximation, the same trends will be present for respectively adjusted g_t. The choice of update frequency can be guided by the penalty taken by the accuracy α_t(γ) when the classifier g_t trained for a locality D_t is kept for use in the subsequent localities D_{t+1}, D_{t+2}, ... without an update. For our example this relation is summarized in Fig. 7, showing the change in the mean accuracy of a local classifier g_t as a function of the delay between its training and its usage. The x-axis measures the delay relative to the update period length γ.
The y-axis measures the ratio between the mean accuracy for the trajectory subsequence corresponding to the moment a local classifier g_t is used and the mean accuracy for the trajectory subsequence corresponding to the moment the locality D_t was captured.

Figure 5. Update locality: Average radius of a sphere containing the points in trajectory subsequences, as a function of the update period for the example problem. The range between the 0.25- and 0.75-quantiles is highlighted in gray.

Figure 6. Local accuracy: Average accuracy as a function of the update period for the example problem. The range between the 0.25- and 0.75-quantiles is highlighted in gray.

All three of these relations confirm the feasibility of the assumptions underlying the problem formulation: while simpler local classifiers g_t have poor accuracy globally, their quality catches up, for frequent locality updates, to a satisfactory level comparable to that of the global classifier f. The ultimate quality of the resulting system will, of course, depend significantly on the mutual compatibility of the data distribution X (governing the complexity of the global classifier f), the family of local classifiers G, the form of the locality distributions D_t, and the constraints on the desired accuracy. For instance, when sensor sampling trajectories do not exhibit enough compactness, as measured by the form of D_t and g_t, it might be problematic or even impossible to achieve very high levels of accuracy with the localized substitution classifiers g_t. In each particular case, the limits of the achievable results should be studied separately, e.g., using the above trajectory-sampling procedure.
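As an illustration, the trajectory-sampling procedure of this section can be sketched end-to-end on a toy two-dimensional problem. This is a simplified stand-in, not the paper's exact setup: the observation distribution is a single Gaussian rather than a mixture, the minimal sphere is approximated by a mean-centered bounding sphere, and the local family G is a nearest-centroid rule instead of a linear SVM:

```python
import numpy as np

rng = np.random.default_rng(3)

def metropolis_trajectory(log_density, x0, T, step=0.1):
    """Sample a trajectory S = (x_1, ..., x_T) by Metropolis-Hastings."""
    xs, x = [x0], x0
    for _ in range(T - 1):
        prop = x + rng.normal(0.0, step, size=x.shape)
        if np.log(rng.random()) < log_density(prop) - log_density(x):
            x = prop
        xs.append(x)
    return np.array(xs)

def enclosing_sphere(points):
    """Step 2a (approximate): mean-centered sphere containing all points."""
    center = points.mean(axis=0)
    return center, np.linalg.norm(points - center, axis=1).max()

# Toy setting: standard-normal observation process, radial black-box f.
f = lambda x: (np.linalg.norm(x, axis=1) < 1.0).astype(int)
S = metropolis_trajectory(lambda x: -0.5 * x @ x, np.zeros(2), T=2000)

gamma = 200                                      # update period
radii, accs = [], []
for t in range(gamma, len(S), gamma):
    window = S[t - gamma:t]                      # step 1: subsequence S_t(gamma)
    center, r = enclosing_sphere(window)         # step 2a
    # Steps 2b-2c: draw training points around the sphere, label with f,
    # and fit a nearest-centroid rule as a stand-in for the local g_t.
    Z = center + rng.normal(0.0, r / 2, size=(500, 2))
    yZ = f(Z)
    if yZ.min() == yZ.max():                     # locality is single-class
        g = lambda x, lbl=yZ[0]: np.full(len(x), lbl)
    else:
        mu1, mu0 = Z[yZ == 1].mean(axis=0), Z[yZ == 0].mean(axis=0)
        g = lambda x, a=mu1, b=mu0: (
            np.linalg.norm(x - a, axis=1) < np.linalg.norm(x - b, axis=1)
        ).astype(int)
    # Steps 2d-2e: score g_t against f on the window; store (r_t, alpha_t).
    radii.append(r)
    accs.append(float(np.mean(g(window) == f(window))))

print(np.mean(radii), np.mean(accs))  # empirical averages over subsequences
```

Averaging the stored (r_t(γ), α_t(γ)) pairs over many values of γ is what produces curves of the kind shown in Figs. 5 and 6.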
Figure 7. Local classifier aging: Relative change in the average accuracy of a local classifier g_t for different update periods γ, as a function of the update delay normalized by the length of the update period.

For problems where, as in our example here, the locality of the space X can be exploited well for a given D_t and g_t, the possibility opens for an efficient adaptation of the locality D_t(τ) as a function of some control parameters τ. For instance, here the update period γ can serve the role of the parameter τ, with the control objective consisting in keeping it smaller than some value of γ guaranteeing a desired accuracy (according to Fig. 6).

V. CONCLUSIONS
Sophisticated IoT systems often involve combining sensing, communication, and processing capabilities. Recent architectures for such IoT systems often perform expensive computation at the edge level, in order for the mobile devices to utilize their limited energy for sensing and transmission. However, such architectures often cannot meet the tight constraints of a time-varying or limited bandwidth availability, as is common in real-world applications, due to their need to communicate all of the data from the sensor-level devices to the edge.

In this paper, we proposed an alternative architecture where the edge and the devices perform the computation cooperatively. The core of our proposed approach is to provide a “semantic” strategy for carrying out this sharing of the computation: we dynamically craft customized classifiers for each sensor that define what the sensor device will communicate to the edge processor, thus offloading the majority of the computation to these devices. This proposed design of sensor-specific classifiers takes into account various properties of the current context, such as the sensor-specific distribution of inputs that the device is likely to observe, the energy resources and constraints on the device, and the time-varying limitations on the shared bandwidth to the edge.

We showed the feasibility of our semantic approach using simulated experiments. We demonstrated that simple, energy-efficient classifiers can be as accurate in classification as complex classifiers if we utilize the distribution of inputs that the sensing device is likely to receive when constructing them. We further showed that the approach is fairly robust to changes in this distribution of inputs over time. Although the classifiers need to be updated as the current context of the sensors and the edge changes over time, we also demonstrated that the sensor-specific classifiers still maintain accuracy even if they are not updated very frequently.
With these encouraging results, we are interested in deploying such an architecture to real-world IoT testbeds in the future.

REFERENCES

[1] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.

[2] P. Neirotti, A. D. Marco, A. Cagliano, G. Mangano, and F. Scorrano, “Current trends in smart city initiatives: Some stylised facts,” Cities, vol. 38, pp. 25–36, 2014.

[3] C. T. Barba, M. A. Mateos, P. R. Soto, A. M. Mezher, and M. A. Igartua, “Smart city for VANETs using warning messages, traffic statistics and intelligent traffic lights,” in Intelligent Vehicles Symposium (IV), 2012 IEEE. IEEE, 2012, pp. 902–907.

[4] F. J. Martinez, C.-K. Toh, J.-C. Cano, C. T. Calafate, and P. Manzoni, “Emergency services in future intelligent transportation systems based on vehicular communication networks,” IEEE Intelligent Transportation Systems Magazine, vol. 2, no. 2, pp. 6–20, 2010.

[5] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet of things,” in Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, ser. MCC ’12, 2012, pp. 13–16.

[6] M. Satyanarayanan, P. Simoens, Y. Xiao, P. Pillai, Z. Chen, K. Ha, W. Hu, and B. Amos, “Edge analytics in the internet of things,” IEEE Pervasive Computing, vol. 14, no. 2, pp. 24–31, 2015.

[7] M. T. Ribeiro, S. Singh, and C. Guestrin, “"Why should I trust you?": Explaining the predictions of any classifier,” in Knowledge Discovery and Data Mining (KDD), 2016.

[8] ——, “Model-agnostic interpretability of machine learning,” in