Deep Learning Anomaly Detection for Cellular IoT with Applications in Smart Logistics

Abstract

The number of connected Internet of Things (IoT) devices within cyber-physical infrastructure systems grows at an increasing rate. This poses significant device management and security challenges to current IoT networks. Among several approaches to cope with these challenges, data-based methods rooted in deep learning (DL) are receiving an increased interest. In this paper, motivated by the upcoming surge of 5G IoT connectivity in industrial environments, we propose to integrate a DL-based anomaly detection (AD) as a service into the 3GPP mobile cellular IoT architecture. The proposed architecture embeds autoencoder based anomaly detection modules both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG), thereby balancing between the system responsiveness and accuracy. We design, integrate, demonstrate and evaluate a testbed that implements the above service in a real-world deployment integrated within the 3GPP Narrow-Band IoT (NB-IoT) mobile operator network.



Milos Savic, Milan Lukic, Dragan Danilovic, Zarko Bodroski, Dragana Bajovic, Member, Ivan Mezei, Senior Member, Dejan Vukobratovic, Senior Member, Srdjan Skrbic and Dusan Jakovetic, Member

Abstract: The number of connected Internet of Things (IoT) devices grows at an increasing rate, revealing shortcomings of current IoT networks for cyber-physical infrastructure systems in coping with the ensuing device management and security issues. Data-based methods rooted in deep learning (DL) have recently been considered to cope with such problems, albeit challenged by the deployment of deep learning models at resource-constrained IoT devices. Motivated by the upcoming surge of 5G IoT connectivity in industrial environments, in this paper we propose to integrate DL-based anomaly detection (AD) as a service into the 3GPP mobile cellular IoT architecture. The proposed architecture embeds deep autoencoder based anomaly detection modules both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG), thereby balancing between system responsiveness and accuracy. We design, integrate, demonstrate and evaluate a testbed that implements the above service in a real-world deployment integrated within the 3GPP Narrow-Band IoT (NB-IoT) mobile operator network.

Index Terms: Anomaly Detection, Cellular IoT, Industrial IoT, Machine Learning, Smart Logistics

I. INTRODUCTION

The proliferation of the Internet of Things (IoT) and the deployment of massive numbers of IoT devices in cyber-physical infrastructure systems such as Smart Factories [1], [2], Smart Grids [3], Smart Logistics [4] and others have brought forward an increasing number of cyber-security [5] and property management challenges [6]. For example, Smart Factory or Smart Logistics operations include asset management, intelligent manufacturing, performance optimization and monitoring, planning, and human-machine interaction, none of which were designed with cyber-security protection or data management at Industrial IoT scale in mind [7], [8]. Handling massive IoT device data integrity and device behaviour in real-time industrial IoT operation and management requires novel approaches which, in recent research, are mainly addressed using machine-learning (ML) and deep-learning (DL) techniques [9]–[11]. The ability of ML/DL algorithms to process massive data sets while extracting useful features allows them to quickly identify anomalies and prevent breakdowns, which has a potentially broad application space in cyber-physical systems [12], [13].

Milos Savic, Zarko Bodroski, Srdjan Skrbic and Dusan Jakovetic are with the Faculty of Sciences, University of Novi Sad, Serbia, e-mail: {svc, zarko.bodroski, srdjan.skrbic, dusan.jakovetic}@dmi.uns.ac.rs. Milan Lukic, Dragana Bajovic, Ivan Mezei and Dejan Vukobratovic are with the Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia, e-mail: {milan lukic, dbajovic, imezei, dejanv}@uns.ac.rs. Dragan Danilovic is with VIP Mobile, Bul. Multina Milankovica 1z, Belgrade, Serbia, e-mail: [email protected]. This work is supported in part by the European Commission's Horizon 2020 Research and Innovation Programme, Grant No. 833828.

With the introduction of 5th generation (5G) cellular networks, IoT cyber-physical infrastructure systems are becoming increasingly reliant on cellular networks [14]. 3GPP standardization initiated work on support for Cellular IoT (CIoT) during the 4G Long-Term Evolution (4G LTE) development [15], which resulted in the first CIoT technologies, such as Narrow-Band IoT (NB-IoT), being introduced in 3GPP Release 13 [16], [17]. This work has since expanded to Ultra-Reliable Low-Latency Communications (URLLC) and massive Machine-Type Communications (mMTC) services in 5G [18]. As billions of new CIoT devices are expected to be connected worldwide in the following years, providing efficient and automated monitoring and threat detection both at the CIoT devices and within the CIoT network architecture will be critical to securely manage devices and cover this attack surface [19], [20].

In this paper, we propose to augment the 3GPP mobile cellular architecture with additional enhancements that provide support for a network-wide anomaly detection (AD) service. Our target is a generic AD CIoT service which can be tailored to applications ranging from identifying malfunctioning devices to threat detection for secure CIoT.
The proposed hierarchical AD architecture embeds anomaly detection modules (ADMs) both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG). The ADM modules are based on deep autoencoders (AE) whose complexity is matched to both the edge and the fog deployment, balancing between system responsiveness and accuracy. The distinguishing feature of our work is that the proposed AD enhancement of the CIoT architecture, including both ADM-EDGE and ADM-FOG modules, is implemented and deployed in a real-world CIoT network based on the 3GPP NB-IoT standard and demonstrated in the context of Smart Logistics. Moreover, we custom-designed a novel NB-IoT device platform for the Smart Logistics use case, where NB-IoT devices are connected to shipping containers in a factory supply chain, in order to collect data, and to deploy and test the ADM-EDGE module.

The paper is organized as follows. In Sec. II, we provide technical background, review the related work and present the contributions of this paper. The proposed solution for DL-based anomaly detection in CIoT is presented in detail in Sec. III. In Sec. IV, we describe system integration and data generation, and provide numerical results from real-world experiments. The paper is concluded in Sec. V.

Fig. 1. 3GPP CIoT architecture augmented with Anomaly Detection enhancements.

II. BACKGROUND

In this work, we augment the CIoT architecture with anomaly detection capabilities at the IoT devices (edge) and the mobile core network servers (fog). Before going into details, we first provide the technical background needed for understanding the proposed system architecture and functionality.

A. 3GPP Cellular IoT Architecture

We start by describing the current state-of-the-art CIoT architecture, focusing primarily on 3GPP NB-IoT technology [15], [16]. NB-IoT is a new CIoT technology that can be seamlessly integrated in the existing 3GPP 4G/5G architecture, coexisting in the radio access network with the current 3GPP 4G LTE and the emerging 3GPP 5G NR technology, and using the same evolved packet core (EPC) network functionalities [22]. Focusing on the current 3GPP 4G LTE architecture, relevant 3GPP CIoT architecture elements are illustrated in Fig. 1. CIoT user equipment (CIoT UE), which is the formal name for an NB-IoT device, connects to the network via a neighbouring base station or eNodeB (eNB), which is the main element of the Evolved Universal Terrestrial Radio Access Network (E-UTRAN). NB-IoT downlink/uplink resources are allocated either within the 4G LTE band (in-band deployment), at its edge (guard-band deployment), or as a separate channel (out-of-band deployment). After the eNB, both user-plane (i.e., user data packets) and control-plane (i.e., signalling messages) information is processed at the CIoT Serving Gateway Node (C-SGN), which covers the functionalities of both the control-plane Mobility Management Entity (MME) and the user-plane Serving Gateway (SGW). User-plane data further flows through the Packet Gateway (PGW) to the IoT platform, which forwards data via the Internet to the external network application servers [21].

Two options for data transfer between the CIoT UE and the IoT platform are envisioned. The first one (mandatory) uses signalling radio bearers to transmit user data, thus avoiding the establishment of data radio bearers for energy efficiency. From the eNB, data is routed either following a control-plane path via an EPC element called the Service Capability Exposure Function (SCEF) for non-IP data, or a user-plane path via the C-SGN and PGW for both IP/non-IP data. The second one (optional) establishes a data radio bearer to send IP/non-IP data via an eNB/C-SGN/PGW user-plane path to the IoT platform. Herein, we assume that UDP-encapsulated IP data from the CIoT UE traverses the path following the latter approach, which will impact the deployment choices for the proposed anomaly detection enhancements described in Sec. III.

B. Machine Learning for Anomaly Detection at the Edge

Security challenges and threats in industrial IoT networks call for innovative applications of ML/DL techniques for IoT security. More specifically, these techniques can be employed for authentication and access control, anomaly and intrusion detection, malware analysis, and detection and mitigation of distributed denial-of-service (DDoS) attacks [23], [24]. The main challenges of implementing ML/DL models at the edge are scalability issues and the resource limitations of IoT edge platforms [13]. Depending on the ML algorithm being run on the edge node, the size of the ML model can go as low as a few kilobytes. Also, the requirements with regard to memory capacity and computational power depend heavily on whether the models are trained on the edge or pre-trained models are used.

Besides the sensor readouts, which are the primary source of data for ML/DL at the edge, the IoT module itself can provide a host of useful insights about the network and wireless link conditions, a feature we also exploit in our edge device design described in Sec. III-B. The amount of useful data that can be extracted from the IoT module generally exceeds the capacity of the wireless communication channel; however, this kind of metadata can be used to feed a locally run ML algorithm for anomaly detection, or be aggregated and sent periodically to the core network fog gateway for further analysis.

In this work, to perform AD, we apply deep autoencoders (AE). An AE is a neural network that learns a latent lower-dimensional representation of the training data: it passes its inputs through latent variables in the hidden layers and reproduces them at the output layer with the smallest possible error. The error function captures differences between the values at the input and output layers. This so-called reconstruction error is used as the outlier score in the anomaly detection process. The proposed AD architecture is hierarchical, as it comprises AD models running at different levels within a CIoT system (both IoT edge devices and the core network fog gateway), where more powerful higher-level models are activated if the decisions of lower-level models have low confidence scores (see Sec. III-C for details).
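The hierarchical activation described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `HierarchicalDetector` class, the toy detectors, and the threshold value of 0.75 are assumptions made for illustration and are not taken from the paper. For simplicity the fog detector receives the same single data point as the edge detector, whereas the paper's fog model works on a window of data points.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A detector returns (is_anomaly, confidence); this type is a hypothetical
# convention chosen for the sketch.
Decision = Tuple[bool, float]
Detector = Callable[[Sequence[float]], Decision]


@dataclass
class HierarchicalDetector:
    edge: Detector            # lightweight model on the IoT device
    fog: Detector             # more powerful model in the core network
    threshold: float = 0.75   # assumed value, not taken from the paper

    def decide(self, point: Sequence[float]) -> Tuple[bool, float, str]:
        """Return (is_anomaly, confidence, level that made the final decision)."""
        is_anomaly, conf = self.edge(point)
        if conf >= self.threshold:
            return is_anomaly, conf, "edge"
        # Low-confidence edge decision: escalate to the higher-level model.
        is_anomaly, conf = self.fog(point)
        return is_anomaly, conf, "fog"


# Toy stand-in detectors, used only to exercise the control flow.
def toy_edge(point: Sequence[float]) -> Decision:
    score = max(point)
    confidence = 0.9 if abs(score - 1.0) > 0.5 else 0.6
    return (score > 1.0, confidence)


def toy_fog(point: Sequence[float]) -> Decision:
    return (sum(point) / len(point) > 0.8, 0.95)


detector = HierarchicalDetector(edge=toy_edge, fog=toy_fog)
print(detector.decide([0.1, 0.2]))  # the edge decision is confident enough
print(detector.decide([1.1, 0.9]))  # the edge is unsure, so the fog decides
```

Keeping the threshold as a field of the detector mirrors its role as a tunable parameter: raising it sends more decisions to the fog level.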

C. Related Work

Recent research efforts in the area of ML methods for anomaly detection at edge IoT devices have focused on efficient utilization of the limited computation resources at the edge. It is well known that the training process for most deep learning-based AI models is highly resource-intensive, usually requiring dedicated hardware resources (e.g., GPU, FPGA) [25]. Resource-aware edge AI model designs have been considered in a different line of research. The AutoML idea [26] and Neural Architecture Search techniques [27] have been used to devise resource-efficient edge AI models tailored to the hardware resource constraints of both the underlying edge devices and network servers. Important research advances were also made regarding the tailored design of DL architectures for resource-constrained devices: Zhang et al. proposed an extremely efficient convolutional neural network (CNN) for mobile devices and Nikouei et al. introduced a lightweight CNN that can run on edge devices [28].

A number of proposals using distributed ML/DL for security in Industrial IoT have recently been considered [29]. In DIoT, a recurrent neural network (RNN) is trained for each device type present in the IoT network to learn a normal communication profile. A federated (distributed) learning scheme is employed to learn device-type specific RNNs [30]. Wang et al. proposed a control algorithm that determines the best trade-off between local update and global parameter aggregation in data-partitioned federated learning models trained using gradient-descent algorithms [31]. Ferdowsi and Saad proposed a distributed privacy-preserving IoT intrusion detection security system based on federated generative adversarial networks. In the proposed decentralized architecture, every IoT device monitors its own data as well as neighbor IoT devices to detect internal and external attacks [32]. Meidan et al. proposed N-BaIoT – a method for detecting IoT botnet attacks based on deep autoencoders. For each device present in an IoT network, a deep autoencoder is trained on features extracted from normal traffic data [33]. Bezerra et al. proposed IoTDS – a distributed method for detecting IoT botnet attacks based on light-weight one-class classification models [34]. Rathore and Park created a decentralized attack detection framework for IoT networks based on semi-supervised learning employing extreme learning machines and fuzzy C-means algorithms [35]. Doshi et al. employed various machine learning algorithms (k-nearest neighbor, support vector machines, decision trees and neural networks) to detect DDoS attack traffic in consumer IoT devices [36]. Pajouh et al. (2018) proposed a malware detection approach for IoT based on deep RNNs [37], while [38] presents an approach to anomaly detection that implements autoencoders at each edge device, with the edge devices orchestrated via a federated learning model with the central server. In [39], the authors show that Random Forest, Multilayer Perceptron, and Discriminant Analysis models can viably save time and energy on the edge device during data transmission, while K-Nearest Neighbors, although reliable in terms of prediction accuracy, is resource-inefficient in their studies.

D. Contributions

We now summarize the main contributions of the paper. We propose an approach to embed anomaly detection capabilities in the Cellular IoT architecture, providing for combined threat detection both at the IoT devices (edge) and in the mobile core network servers (fog). The corresponding architecture design is motivated by and well-suited for Smart Logistics. The proposed edge-based ADM-EDGE and fog-based ADM-FOG modules can balance between responsiveness and accuracy by employing deep autoencoder (AE) based learning modules whose complexity is matched to both edge and fog deployment. We carry out implementation, integration, and evaluation of an end-to-end testbed according to the proposed architecture. This includes: 1) real IoT data generation and emulation of a real-world Smart Logistics scenario; 2) fabrication and configuration of the relevant edge and fog hardware and infrastructure; 3) development and implementation of a software library for edge and fog-based anomaly detection; and 4) evaluation of the developed anomaly detectors on the generated data and quantification of detection performance-response time tradeoffs. For the latter contribution, we explicitly quantify the tradeoffs that take into account the limited computational and storage budget at the edge devices, and the communication and processing costs due to processing larger amounts of data at the fog for improved AD performance.

III. DL-BASED ANOMALY DETECTION IN CIOT

In this section, we describe in detail the design and system architecture of the proposed AD support for the 3GPP NB-IoT mobile cellular network.

A. System Model and Architecture

We augment the 3GPP CIoT system architecture with support for CIoT device anomaly detection. The augmented architecture is illustrated in Fig. 1 and introduces two additional ADMs: one (ADM-EDGE) placed at the edge CIoT UE and another (ADM-FOG) placed at the fog gateway (FGW). The architecture represents a generic CIoT enhancement for anomaly detection, although in this work we specialize it to the domain of Smart Logistics. This includes managing the supply of items from various origin points delivered to warehouses in manufacturing plants (Fig. 1). Items being delivered are packed into containers, each of which has an NB-IoT device attached. For this purpose, we designed an entirely new NB-IoT UE device, and deployed suitable ADM-EDGE and ADM-FOG modules at both the NB-IoT UEs and the FGW server within the mobile core network.

Response time is the time passed from the occurrence of an anomaly to its detection.

Fig. 2. 3GPP CIoT Anomaly Detection processing flow.

ADM-EDGE:

As described below, NB-IoT devices collect various information such as acceleration and GPS coordinates. This sensory information can be used to detect anomalies such as physical tampering with items, container mishandling such as overturning, delays, routing problems, incidents with the delivery vehicles, etc. We assume each NB-IoT device possesses two types of sensors: i) sensor S1 with a low sampling rate $f_1$ [Hz] and sampling period $\Delta_1 = 1/f_1$ [s] (in our case, a GPS sensor that samples the outdoor device location), and ii) sensor S2 with a high sampling rate $f_2$ [Hz] and sampling period $\Delta_2 = 1/f_2$ [s] (in our case, an accelerometer/gyroscope that samples vibration monitoring parameters), as illustrated in Fig. 2.

Due to limited memory capacity and processing power, ADM-EDGE integrated into the NB-IoT device firmware requires a restrictive design. ADM-EDGE consists of a pre-trained autoencoder with a single hidden layer. At the input, ADM-EDGE processes a single data point that consists of a single S1 and S2 value. As illustrated in Fig. 2, we assume ADM-EDGE is triggered synchronously with the low-rate sensor S1 outputs $X_{S1}[k] = X_{S1}(t = k\Delta_1)$, indexed by $k$, where $\Delta_1$ is the sampling period of the sensor S1 output function $X_{S1}(t)$. Besides an S1 sample, ADM-EDGE is fed with the sensor S2 value $X_{S2}[k]$, which is a root mean square (RMS) aggregate value of the high-rate sensor S2 output samples calculated over the interval of duration $\Delta_1$ between the last two S1 outputs. In other words,

$$X_{S2}[k] = \sqrt{\frac{1}{M} \sum_{\ell} X_{S2}^2(t = \ell\Delta_2)},$$

where $\ell$ satisfies $(k-1)\Delta_1 < \ell\Delta_2 \le k\Delta_1$, which amounts to the last $M = \Delta_1/\Delta_2$ S2 samples preceding $t = k\Delta_1$. To summarize, a pair of S1 and aggregated S2 values $(X_{S1}[k], X_{S2}[k])$ represents a data point fed into the ADM-EDGE autoencoder every $\Delta_1$ [s]. For each decision, after the ADM-EDGE processing time, the device outputs a confidence score (see Sec. III-C).

ADM-FOG:

NB-IoT devices connect to a mobile network and transfer data via the nearest base station. Each ADM-EDGE data point is forwarded to the FGW, adjoined with the ADM-EDGE confidence score evaluated from the last available data point. The communication delay incurred by the NB-IoT network connection may vary from the order of tens of milliseconds to several tens of seconds, depending on the NB-IoT device radio conditions and network load. The FGW server runs an instance of ADM-FOG relying on higher memory capacity and processing power. Thus ADM-FOG uses a more powerful autoencoder that processes multivariate time series through several hidden layers. A larger input is considered, which is formed by concatenating the last $L$ ADM-EDGE data points (see Fig. 2). Thus at the time instant $t_k$ when the $k$-th data point is received at the FGW (note that $t_k = k\Delta_1 + \tau_k$, where $\tau_k$ is the communication delay of the $k$-th data point), the ADM-FOG is triggered with the input containing the set of the last $L$ data points $\{(X_{S1}[i], X_{S2}[i])\}_{k-L < i \le k}$.
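A minimal Python sketch of the two input constructions described above follows: the RMS aggregation that forms each ADM-EDGE data point, and the sliding window of the last L data points that forms each ADM-FOG input. The function names are hypothetical, and the sketch assumes that exactly `m` S2 samples fall into every S1 period and that the fog side waits until `length` data points have arrived; the text does not state how incomplete windows are handled.

```python
import math
from collections import deque


def rms(samples):
    """Root mean square of the S2 samples that fall inside one S1 period."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def edge_data_points(s1_values, s2_samples, m):
    """Pair each S1 sample with the RMS of the m S2 samples preceding it.

    Assumes exactly m S2 samples per S1 period (m = delta_1 / delta_2).
    """
    points = []
    for k, x_s1 in enumerate(s1_values):
        window = s2_samples[k * m:(k + 1) * m]
        points.append((x_s1, rms(window)))
    return points


def fog_inputs(points, length):
    """Yield the concatenation of the last `length` data points, once available."""
    recent = deque(maxlen=length)
    for point in points:
        recent.append(point)
        if len(recent) == length:
            # Flatten [(s1, s2), ...] into one input vector for the fog model.
            yield [value for p in recent for value in p]


s1 = [10.0, 11.0, 12.0]
s2 = [3.0, 4.0, 0.0, 0.0, 1.0, 1.0]  # m = 2 S2 samples per S1 sample
points = edge_data_points(s1, s2, m=2)
print(points)
print(list(fog_inputs(points, length=2)))
```

The `deque` with `maxlen` drops the oldest data point automatically, which keeps the per-device state at the fog bounded by the window length.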

1) Cellular connectivity:

To fulfill the requirement for ubiquitous connectivity while keeping the power consumption of the battery-powered device low, we utilize a BG96 cellular module from Quectel, which supports NB-IoT and LTE-M, the state-of-the-art 3GPP CIoT communication standards that will be further evolved in 5G standardization [42]. In addition, EGPRS is supported to ensure connectivity in areas where an LTE carrier might not be available. Finally, the integrated GNSS module provides the geolocation information which is essential to the asset tracking task in the logistics use case. The intention is to use NB-IoT as the primary means of communication due to its desirable properties, namely energy efficiency combined with extended coverage [41]. However, on occasions when it is necessary to transfer larger amounts of data (e.g., a new firmware image), LTE-M is a more efficient solution. The architecture of our edge node provides flexibility which allows us to adapt the throughput of the communication module according to the needs of the application.

Fig. 3. 3GPP NB-IoT/LTE-M edge node running ADM-EDGE model.

2) On-board sensors:

Apart from the localization data provided by the GNSS module, the on-board environmental sensors are used to measure parameters relevant to the logistics use case. The 6-axis Inertial Measurement Unit (IMU) provides information about the vibrations and the magnetic field along the X, Y and Z axes relative to the chip position. The additional set of sensors is used to measure atmospheric conditions such as air temperature, pressure and humidity.

The designed platform provides additional metadata that could be used as inputs to ADM-EDGE. For example, the cellular modem is capable of providing the standard set of radio condition metrics (SNR, RSSI, RSRP, etc.). In addition, our design includes on-board current measuring circuitry that allows the micro-controller unit (MCU) to acquire precise measurements of the power consumption of the BG96 module.

3) The MCU features and capabilities:

The main MCU inside the edge node is a low-power 32-bit ARM Cortex M0+ with 256 KB of FLASH and 32 KB of SRAM, operating at 16 MHz. The MCU resources are sufficient to efficiently control the rest of the circuitry while maintaining low power consumption, especially in sleep mode. However, the absence of an operating system, as well as the hardware constraints, limits the usage of ML tools to lightweight models that are fully customized and optimized for a given application. Finally, an external FLASH memory module enables data logging over the intervals when there is no connectivity, and is used to store the firmware images during over-the-air updates.

4) Security:

In an industrial setup, security is of critical importance. We therefore use a hardware crypto element which enables offloading the computationally expensive asymmetric cryptographic algorithms (ECC and RSA) from the resource-constrained MCU [43]. Tamper-resistant memory within the crypto chip is used to store security credentials, making the firmware on the host MCU oblivious to sensitive information such as the encryption keys and certificates.

C. Anomaly Detection using ADM-EDGE and ADM-FOG

ADM-EDGE and ADM-FOG detect anomalies using autoencoders. Let us assume that the device behaviour is described by a feature vector $X$ containing $k$ real-valued features. Those may be values observed at one particular point in time or multivariate time series. Let $D$ denote a set of data points that depicts the normal (nominal) behaviour of the device (the training dataset), let $A(D)$ be an autoencoder trained on $D$, and let $e$ denote the maximal error of $A$ on $D$. Then, a data point $y$ not contained in $D$ (a data point that is not present in the training dataset) can be considered an anomaly if the difference between $y$ and $A(y)$, computed by the same error function that was used for training the autoencoder, is higher than $e$, where $A(y)$ denotes the output of $A$ for $y$.

ADM-EDGE and ADM-FOG autoencoders identify anomalies according to the previously described rule. For each anomaly detection decision, the confidence score $C(y)$ is computed according to the following formula:

$$C(y) = \sigma(\mathrm{Err}\{y, A(y)\} - e), \qquad (1)$$

where $\mathrm{Err}\{\cdot\}$ is the error function used to train $A$ (e.g., the mean squared error) and $\sigma$ denotes the sigmoid function. The important property of the confidence score function is that non-anomalous data points have scores in the range (0, 0.5], whereas anomalous data points have scores in the interval (0.5, 1). In other words, scores close to 0 indicate non-anomalous data points, while values close to 1 signify anomalies. Thus, after the decision is made, confidence scores for non-anomalous data points are further transformed into $1 - C$, where $C$ is the value obtained by Eq. (1), so that a higher score always means higher confidence in the decision.

ADM-EDGE autoencoders have a predefined structure with a single hidden layer containing $n/$[…] nodes, where $n$ is the number of input features. They use the ReLU activation function for the hidden layer. Additionally, bias variables are not considered for internal nodes.
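The decision rule and Eq. (1) above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions and not the authors' implementation (their inference function is written in C): the function names and toy weights are hypothetical, the mean squared error is used as the error function, and the output layer is assumed to be linear with a bias term, which the text does not specify.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def reconstruct(y, w_enc, w_dec, b_out):
    """Single hidden layer with ReLU and no bias, then a linear output layer.

    The linear output layer with bias b_out is an assumption of this sketch.
    """
    hidden = relu(matvec(w_enc, y))
    return [o + b for o, b in zip(matvec(w_dec, hidden), b_out)]


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def detect(y, w_enc, w_dec, b_out, e_max):
    """Return (is_anomaly, confidence) following the rule around Eq. (1)."""
    c = sigmoid(mse(y, reconstruct(y, w_enc, w_dec, b_out)) - e_max)
    is_anomaly = c > 0.5  # equivalent to reconstruction error > e_max
    # Non-anomalous scores are mapped to 1 - C, as described in the text.
    return is_anomaly, (c if is_anomaly else 1.0 - c)


# Toy model with n = 2 inputs and one hidden node (weights are made up).
W_ENC = [[0.5, 0.5]]
W_DEC = [[1.0], [1.0]]
B_OUT = [0.0, 0.0]
E_MAX = 0.1  # stands in for the maximal training error e

print(detect([1.0, 1.0], W_ENC, W_DEC, B_OUT, E_MAX))   # reconstructed well
print(detect([3.0, -1.0], W_ENC, W_DEC, B_OUT, E_MAX))  # large error
```

Because the sigmoid is monotonic and equals 0.5 at zero, thresholding the score at 0.5 gives the same decision as comparing the reconstruction error with e directly.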
Due to the constraints of NB-IoT devices, the training of lightweight autoencoders is performed offline using a Python module utilizing the TensorFlow library. This ADM module determines the lightweight autoencoder weights by minimizing the mean squared error using the Adam optimizer [44] for a given number of epochs and batch size. Before training, data points in the input training dataset are normalized such that each feature has zero mean and unit variance. The weights of the trained model and the data normalization parameters are then exported to text files. An inference function performing anomaly detection on a pre-trained lightweight autoencoder is implemented in C without relying on any external library. This inference function is directly integrated into the firmware of our NB-IoT devices.

Decisions made by ADM-EDGE lightweight autoencoders are re-evaluated by ADM-FOG autoencoders in case of low confidence scores. The default value of the threshold is set to $C_{th} = 0.$[…], i.e., the decisions with $C < C_{th}$ are re-evaluated. We adopt here a standard, confidence-score based decision that is simple but effective; for more advanced mechanisms on how to offload decisions from the edge, see, e.g., [45]. The threshold $C_{th}$ is a tunable parameter that allows trading off confidence in the decision about an anomaly against response time. A lower threshold means that the system designer accepts lower confidence scores; in return, the average response time within a time interval for the same input data set decreases. In contrast to ADM-EDGE lightweight autoencoders, ADM-FOG autoencoders may have an arbitrary number of hidden layers. Additionally, they process multivariate time series constructed using the sliding window approach instead of single data points.

IV. SYSTEM INTEGRATION, DATA GENERATION AND NUMERICAL RESULTS

A. System Integration

To integrate the system, collect real-world data and perform testing and evaluation, the CIoT UE is connected to the FGW via a mobile operator macro-cellular NB-IoT eNB. The CIoT UE runs the ADM-EDGE software module and periodically sends data points to the FGW encapsulated into UDP packets. Within the mobile operator core network, a general-purpose server is set up and connected to the PGW. The ADM-FOG software module within the server accepts UDP packets sent by the CIoT UE. The server provides sufficient resources to run the ADM-FOG module, so in the sequel we focus on the ADM-EDGE module deployment on the CIoT UE device.

Table I reports the memory footprint of the ADM-EDGE model as an estimate of its resource utilization. ADM-EDGE occupies only a small fraction of the memory used by the standard NB-IoT device firmware needed for basic device sensing, processing and communication functionality. TensorFlow and TensorFlow Lite exported model sizes are also given for reference.

TABLE I
ADM-EDGE MEMORY RESOURCE UTILIZATION.

Model                        Size in bytes
Firmware without ADM-EDGE    55816 (21.3%) out of 262144
Firmware with ADM-EDGE       61896 (23.6%) out of 262144
ADM-EDGE only                6080 (∼ […]

B. Data Generation

To generate the dataset (elaborated in Section IV-C), we used the NB-IoT edge nodes described in Section III-B. We created a setup where an edge node was attached to a box-shaped container inside a transport vehicle moving through the city of Novi Sad. The device was initially connected to the NB-IoT network, and it had uninterrupted connectivity throughout the path. We collected the positioning data from the GNSS module (timestamp, latitude, longitude, altitude, speed and number of satellites in range), as well as the outputs of the IMU (acceleration and magnetic field along the 3 spatial axes). The time resolution of the GNSS samples was $\Delta_1 = 10$ s. The sampling period of the IMU was $\Delta_2 = 15$ ms (see Fig. 4 for an example of IMU signals), thus we calculated the RMS for the acceleration and magnetic field samples collected within a sampling interval $\Delta_1$ (as described in Sec. III-A). The collected data were stored in the database at the FGW, and were used to train the AD model discussed in the following section.

Fig. 4. Example of acceleration data from IMU.

C. Numerical Results

ADM-EDGE and ADM-FOG autoencoders were evaluated using two independent datasets. The first dataset reflects the behaviour of the edge node device under normal driving conditions without large disturbances. This dataset contains 1470 data points collected over a period of three days and it is used to train ADM-EDGE and ADM-FOG autoencoders. The trained autoencoders were tested on the second dataset. The test dataset has 318 data points collected in a single day with 10 intentionally caused anomalous events induced by shaking and overturning the container with the attached device. Since the edge node records both location-based features (GPS longitude and latitude) and IMU-based features, we can distinguish two types of anomalous events: location-based anomalies (large deviations from learned trajectories) and behaviour-based anomalies (large deviations from learned IMU signals). Our test dataset does not contain location-based anomalies.

The accuracies of ADM-EDGE and ADM-FOG autoencoders were assessed by computing the following basic measures:

• $TP$ (true positives) – the number of correctly identified anomalous events,
• $FP$ (false positives) – the number of times an autoencoder indicated a non-existing anomalous event, and
• $FN$ (false negatives) – the number of times an autoencoder failed to indicate an existing anomalous event.

We define the anomalous data points as those that correspond to the intentionally caused incident events; these data points are known to the experiment designer and system evaluator but are not known beforehand to the AD modules. The goal of AD is then to uncover the defined anomalies from the data.

From $TP$, $FP$ and $FN$ we have derived the precision ($P$) and recall ($R$) scores of our anomaly detection models: $P = TP/(TP + FP)$ and $R = TP/(TP + FN)$. Both precision and recall take values in the range [0, 1]. Precision indicates the degree of correctness of an anomaly detection model: small precision values imply that the model makes a lot of errors when reporting anomalous events. Recall reflects the model's ability to detect existing anomalous events. Small recall values indicate that the model often remains "silent" in cases when it should raise an alarm. When comparing different anomaly detection models it is useful to have a single overall score reflecting their performance. For this purpose we have used the $F_1$ measure, which is the harmonic mean of precision and recall: $F_1 = 2 \cdot P \cdot R/(P + R)$.

For the ADM-FOG model we have greater flexibility than for the ADM-EDGE model. Thus, in our experimental evaluation, we have examined a single ADM-EDGE model (see Sec. III-C), 10 ADM-FOG models with three hidden layers (sequentially containing $n/$[…], $n/$[…] and $n/$[…] nodes, where $n$ denotes the number of input features) accepting time series of lengths from $L = 1$ to $L = 10$, and 10 ADM-FOG models with five hidden layers (sequentially containing $n/$[…], $n/$[…], $n/$[…], $n/$[…] and $n/$[…] nodes) also working with time series of lengths from $L = 1$ to $L = 10$. Due to the stochastic nature of the autoencoder learning algorithm, an ensemble of 20 autoencoders was trained for each examined model. All autoencoders were trained for at most 200 epochs, with the batch size equal to 16; the learning rate of the Adam algorithm was set to 0.001, and early stopping was activated after 10 epochs without a decrease in the value of the loss function. The evaluation metrics for a particular model were estimated by averaging the results individually obtained from all autoencoders in the corresponding ensemble.
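The metric definitions and the ensemble averaging described above can be checked with a short Python sketch. The function names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch and is not stated in the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P, R and F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def ensemble_average(counts):
    """Average per-autoencoder metrics over an ensemble of (tp, fp, fn) counts."""
    scores = [precision_recall_f1(*c) for c in counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Made-up counts for three ensemble members, each tested on 10 anomalous events.
counts = [(8, 2, 2), (9, 1, 1), (7, 1, 3)]
print(precision_recall_f1(8, 2, 2))
print(ensemble_average(counts))
```

Averaging the per-model scores, as done here, generally gives a different value from computing the scores once on the averaged counts, which is worth keeping in mind when comparing average event counts with average precision and recall.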
Additionally, for each model we have examined two variants: a model trained without location-based features and a model trained on all features.

The results of the evaluation of the ADM-EDGE autoencoder in both variants (with and without location-based features) are summarized in Table II. It can be seen that the ADM-EDGE autoencoder working without location-based features has a slightly higher precision score and a slightly lower recall score compared to the ADM-EDGE autoencoder trained on all features. However, the observed differences are not significant, as is evident from the similar values of the F1 scores. This result is expected since the test dataset does not contain location-based anomalies. Therefore, the small differences in the obtained results can be explained by the stochastic nature of the autoencoder learning algorithm. The obtained values of precision and recall indicate that the ADM-EDGE autoencoders perform well. In more practical terms, on average, the ADM-EDGE anomaly detection model was able to recognize 8 out of 10 existing anomalous events, missed 2 real anomalous events and raised 1 or 2 false positive alarms (the average number of false positives is 1.25 in the NO-GPS case and 1.85 in the WITH-GPS case).

TABLE II
EVALUATION OF ADM-EDGE AUTOENCODERS.

Evaluation metric   NO-GPS      WITH-GPS
Precision           0.859       0.814
Recall              0.77        0.8
F1                  [missing]   [missing]

In the second experiment we have examined the performance of ADM-FOG autoencoders with 3 and 5 hidden layers. The obtained F1 scores are presented in Figures 5 and 6. It can be seen that ADM-FOG autoencoders exhibit significantly higher F1 scores compared to ADM-EDGE autoencoders for all time-series lengths except L = 1 (i.e., individual data points). The average improvement in the F1 score when offloading anomaly detection decisions to ADM-FOG is approximately 7%. As with the ADM-EDGE autoencoders, the location-based features do not have a significant impact on the performance of ADM-FOG autoencoders. The performance of ADM-FOG autoencoders with 3 hidden layers is similar to that of those with 5 hidden layers: the largest difference in F1 scores is 0.027 (excluding ADM-FOG models working with time series of length 1).

Fig. 5. F1 scores of ADM-FOG autoencoders with 3 hidden layers for different time-series lengths. The dashed lines indicate the F1 scores of ADM-EDGE autoencoders.

Fig. 6. F1 scores of ADM-FOG autoencoders with 5 hidden layers for different time-series lengths.
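The ADM-FOG models evaluated above work on time series of length L rather than on individual data points. One possible way to build such inputs is sketched below, under the assumption that overlapping windows of L consecutive samples are concatenated into a single feature vector; the windowing actually used by ADM-FOG may differ, and the names `make_windows` and `decision_delay` are hypothetical. The second function only captures the fact that a window of L samples cannot be scored before all L samples, taken one per sampling period, have arrived.

```python
from typing import List, Sequence


def make_windows(samples: Sequence[Sequence[float]], length: int) -> List[List[float]]:
    """Flatten every run of `length` consecutive samples into one input vector.

    Assumption of this sketch: windows overlap and advance by one sample.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    windows = []
    for start in range(len(samples) - length + 1):
        # Concatenate the feature vectors of the samples in this window.
        window = [x for sample in samples[start:start + length] for x in sample]
        windows.append(window)
    return windows


def decision_delay(length: int, sampling_period: float) -> float:
    """Time needed to collect `length` samples, one per sampling period."""
    return length * sampling_period
```

With L = 1 each window is a single data point, which matches the individual-data-point case; larger L gives the model more context at the price of a longer wait before a decision can be made.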

The results above allow us to explicitly quantify the trade-offs between anomaly detection performance and response time, depending on whether the decision on the presence of anomalies is made at the edge or in the fog. Note that the response time of ADM-EDGE corresponds approximately to one sampling period ∆. On the other hand, the response time of ADM-FOG depends on the length L of the time series processed. In the case of ADM-FOG autoencoders trained without location-based features, the largest F1 score is achieved by the autoencoder with 3 hidden layers working on time series of length L = 9. The increase in precision and recall compared to the corresponding ADM-EDGE autoencoder is 0.02 and 0.15, respectively. This means that by increasing the confidence threshold for offloading anomaly detection decisions to the ADM-FOG autoencoder, the whole system makes fewer false negative decisions at the cost of decision delays of L = 9 time slots. The ADM-FOG autoencoder with 5 hidden layers working on time series of length L = 10 has the highest F1 score among the FOG models trained on all features. The increase in precision and recall in this case is 0.1 and 0.05, respectively. Therefore, by increasing the offloading threshold, the performance of the whole system improves through fewer false positive decisions at the cost of decision delays of L = 10 time slots.

V. CONCLUSION

In this paper, we presented the design, implementation, real-world deployment and evaluation of a novel anomaly detection architecture for Cellular IoT networks. Our system, tailored to the Smart Logistics use case, demonstrated the major system-design trade-off between the responsiveness and the accuracy of anomaly detection deployed at the edge or in the fog of the Cellular IoT network.
