Deep Learning Anomaly Detection for Cellular IoT with Applications in Smart Logistics
Milos Savic, Milan Lukic, Dragan Danilovic, Zarko Bodroski, Dragana Bajovic, Member, Ivan Mezei, Senior Member, Dejan Vukobratovic, Senior Member, Srdjan Skrbic and Dusan Jakovetic, Member
Abstract: The number of connected Internet of Things (IoT) devices grows at an increasing rate, revealing shortcomings of current IoT networks for cyber-physical infrastructure systems in coping with the ensuing device management and security issues. Data-based methods rooted in deep learning (DL) have recently been considered to address such problems, albeit challenged by the deployment of deep learning models on resource-constrained IoT devices. Motivated by the upcoming surge of 5G IoT connectivity in industrial environments, in this paper we propose to integrate DL-based anomaly detection (AD) as a service into the 3GPP mobile cellular IoT architecture. The proposed architecture embeds deep autoencoder based anomaly detection modules both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG), thereby balancing system responsiveness and accuracy. We design, integrate, demonstrate and evaluate a testbed that implements the above service in a real-world deployment integrated within the 3GPP Narrow-Band IoT (NB-IoT) mobile operator network.
Index Terms: Anomaly Detection, Cellular IoT, Industrial IoT, Machine Learning, Smart Logistics
I. INTRODUCTION

The proliferation of the Internet of Things (IoT) and the deployment of a massive number of IoT devices in cyber-physical infrastructure systems such as Smart Factories [1], [2], Smart Grids [3], Smart Logistics [4] and others have brought forward an increasing number of cyber-security [5] and property management challenges [6]. For example, Smart Factory or Smart Logistics operations include asset management, intelligent manufacturing, performance optimization and monitoring, planning, and human-machine interaction, none of which were designed with cyber-security protection or data management at Industrial IoT scale in mind [7], [8]. Handling massive IoT device data integrity and device behaviour in real-time industrial IoT operation and management requires novel approaches which, in recent research, are mainly addressed using machine-learning (ML) and deep-learning (DL) techniques [9]–[11]. The ability of ML/DL algorithms to process massive data sets while extracting useful features allows them to quickly identify anomalies and prevent breakdowns, which gives them a potentially broad application space in cyber-physical systems [12], [13].
Milos Savic, Zarko Bodroski, Srdjan Skrbic and Dusan Jakovetic are with the Faculty of Sciences, University of Novi Sad, Serbia, e-mail: {svc, zarko.bodroski, srdjan.skrbic, dusan.jakovetic}@dmi.uns.ac.rs. Milan Lukic, Dragana Bajovic, Ivan Mezei and Dejan Vukobratovic are with the Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia, e-mail: {milan lukic, dbajovic, imezei, dejanv}@uns.ac.rs. Dragan Danilovic is with VIP Mobile, Bul. Milutina Milankovica 1z, Belgrade, Serbia, e-mail: [email protected]. This work is supported in part by the European Commission's Horizon 2020 Research and Innovation Programme, Grant No. 833828.

With the introduction of 5th generation (5G) cellular networks, IoT cyber-physical infrastructure systems are becoming increasingly reliant on cellular networks [14]. 3GPP standardization initiated work on support for Cellular IoT (CIoT) during the 4G Long-Term Evolution (4G LTE) development [15], which resulted in the first CIoT technologies, such as Narrow-Band IoT (NB-IoT), being introduced in 3GPP Release 13 [16], [17]. This work has since expanded to Ultra-Reliable Low-Latency Communications (URLLC) and massive Machine-Type Communications (mMTC) services in 5G [18]. As billions of new CIoT devices are expected to be connected worldwide in the following years, providing efficient and automated monitoring and threat detection both at the CIoT devices and within the CIoT network architecture will be critical to securely manage devices and cover this attack surface [19], [20].

In this paper, we propose to augment the 3GPP mobile cellular architecture with enhancements that provide support for a network-wide anomaly detection (AD) service. Our target is a generic AD CIoT service which can be tailored to applications ranging from identifying malfunctioning devices to threat detection for secure CIoT.
The proposed hierarchical AD architecture embeds anomaly detection modules (ADMs) both at the IoT devices (ADM-EDGE) and in the mobile core network (ADM-FOG). The ADMs are based on deep autoencoders (AE) whose complexity is matched to both the edge and the fog deployment, balancing system responsiveness and accuracy. The distinguishing feature of our work is that the proposed AD enhancement of the CIoT architecture, including both ADM-EDGE and ADM-FOG modules, is implemented and deployed in a real-world CIoT network based on the 3GPP NB-IoT standard and demonstrated in the context of Smart Logistics. Moreover, we custom-designed a novel NB-IoT device platform for the Smart Logistics use case, where NB-IoT devices are attached to shipping containers in a factory supply chain, in order to collect data and to deploy and test the ADM-EDGE module.

The paper is organized as follows. In Sec. II, we provide technical background, review the related work and present the contributions of this paper. The proposed solution for DL-based anomaly detection in CIoT is presented in detail in Sec. III. In Sec. IV, we describe system integration and data generation, and provide numerical results from real-world experiments. The paper is concluded in Sec. V.

Fig. 1. 3GPP CIoT architecture augmented with Anomaly Detection enhancements.
II. BACKGROUND
In this work, we augment the CIoT architecture with anomaly detection capabilities at the IoT devices (edge) and the mobile core network servers (fog). Before going into details, we first provide the technical background needed for understanding the proposed system architecture and functionality.
A. 3GPP Cellular IoT Architecture
We start by describing the current state-of-the-art CIoT architecture, focusing primarily on 3GPP NB-IoT technology [15], [16]. NB-IoT is a new CIoT technology that can be seamlessly integrated into the existing 3GPP 4G/5G architecture, coexisting in the radio access network with the current 3GPP 4G LTE and the emerging 3GPP 5G NR technology, and using the same evolved packet core (EPC) network functionalities [22]. Focusing on the current 3GPP 4G LTE architecture, the relevant 3GPP CIoT architecture elements are illustrated in Fig. 1. CIoT user equipment (CIoT UE), which is the formal name for an NB-IoT device, connects to the network via a neighbouring base station or eNodeB (eNB), which is the main element of the Evolved Universal Terrestrial Radio Access Network (E-UTRAN). NB-IoT downlink/uplink resources are allocated either within the 4G LTE band (in-band deployment), at its edge (guard-band deployment), or as a separate channel (out-of-band deployment). After the eNB, both user-plane (i.e., user data packets) and control-plane (i.e., signalling messages) information is processed at the CIoT Serving Gateway Node (C-SGN), which covers the functionalities of both the control-plane Mobility Management Entity (MME) and the user-plane Serving Gateway (SGW). User-plane data further flows through the Packet Gateway (PGW) to the IoT platform, which forwards data via the Internet to the external network application servers [21].

Two options for data transfer between the CIoT UE and the IoT platform are envisioned. The first one (mandatory) uses signalling radio bearers to transmit user data, thus avoiding the establishment of data radio bearers for energy efficiency. From the eNB, data is routed either following a control-plane path via an EPC element called the Service Capability Exposure Function (SCEF) for non-IP data, or a user-plane path via the C-SGN and PGW for both IP/non-IP data. The second one (optional) establishes a data radio bearer to send IP/non-IP data via an eNB/C-SGN/PGW user-plane path to the IoT platform. Herein, we assume that UDP-encapsulated IP data from the CIoT UE traverses the path following the latter approach, which will impact the deployment choices for the proposed anomaly detection enhancements described in Sec. III.
B. Machine Learning for Anomaly Detection at the Edge
Security challenges and threats in industrial IoT networks call for innovative applications of ML/DL techniques for IoT security. More specifically, these techniques can be employed for authentication and access control, anomaly and intrusion detection, malware analysis, and detection and mitigation of distributed denial-of-service (DDoS) attacks [23], [24]. The main challenges of implementing ML/DL models at the edge are scalability issues and the resource limitations of IoT edge platforms [13]. Depending on the ML algorithm being run on the edge node, the size of the ML model can go as low as a few kilobytes. Also, the requirements with regard to memory capacity and computational power depend heavily on whether the models are trained on the edge or pre-trained models are used.

Besides the sensor readouts, which are the primary source of data for ML/DL at the edge, the IoT module itself can provide a host of useful insights about the network and wireless link conditions, a feature we also exploit in our edge device design described in Sec. III-B. The amount of useful data that can be extracted from the IoT module generally exceeds the capacity of the wireless communication channel. However, this kind of metadata can be used to feed a locally run ML algorithm for anomaly detection, or be aggregated and sent periodically to the core network fog gateway for further analysis.

In this work, to perform AD, we apply deep autoencoders (AE). An AE is a neural network that learns a latent lower-dimensional representation of the training data by reproducing its inputs, through latent variables in the hidden layers, at the output layer with the smallest possible error. The error function captures the differences between the values at the input and output layers. This so-called reconstruction error is used as the outlier score in the anomaly detection process. The proposed AD architecture is hierarchical, as it comprises AD models running at different levels within a CIoT system (both IoT edge devices and the core network fog gateway), where more powerful higher-level models are activated if the decisions of lower-level models have low confidence scores (see Sec. III-C for details).
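The hierarchical activation rule in the last paragraph can be illustrated with a minimal Python sketch. It is not the deployed implementation: the function name `hierarchical_decision`, the `Detector` callable type, and the idea of passing the same `sample` object to both levels are assumptions made for illustration, and the threshold is left as a free parameter.

```python
from typing import Any, Callable, Tuple

# Hypothetical interface: a detector maps a sample to
# (is_anomaly, confidence in that decision).
Detector = Callable[[Any], Tuple[bool, float]]


def hierarchical_decision(
    sample: Any,
    edge_detector: Detector,
    fog_detector: Detector,
    confidence_threshold: float,
) -> Tuple[bool, float, str]:
    """Return (is_anomaly, confidence, level that made the final decision)."""
    is_anomaly, confidence = edge_detector(sample)
    if confidence >= confidence_threshold:
        # The lower-level (edge) decision is confident enough; keep it.
        return is_anomaly, confidence, "edge"
    # Low-confidence edge decision: activate the higher-level (fog) model.
    is_anomaly, confidence = fog_detector(sample)
    return is_anomaly, confidence, "fog"
```

In this sketch the fog model is consulted only when the edge confidence falls below the threshold, so a lower threshold keeps more decisions local to the device.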
C. Related Work
Recent research efforts in the area of ML methods for anomaly detection at edge IoT devices have focused on efficient utilization of the limited computation resources at the edge. It is well known that the training process for most deep learning-based AI models is highly resource-intensive, usually requiring dedicated hardware resources (e.g., GPU, FPGA) [25]. Resource-aware edge AI model designs have been considered in a different line of research. The AutoML idea [26] and Neural Architecture Search techniques [27] have been used to devise resource-efficient edge AI models tailored to the hardware resource constraints of both the underlying edge devices and network servers. Important research advances were also made regarding the tailored design of DL architectures for resource-constrained devices: Zhang et al. proposed an extremely efficient convolutional neural network (CNN) for mobile devices, and Nikouei et al. introduced a lightweight CNN that can run on edge devices [28].

A number of proposals using distributed ML/DL for security in Industrial IoT have recently been considered [29]. In DIoT, a recurrent neural network (RNN) is trained for each device type present in the IoT network to learn a normal communication profile. A federated (distributed) learning scheme is employed to learn device-type specific RNNs [30]. Wang et al. proposed a control algorithm that determines the best trade-off between local update and global parameter aggregation in data-partitioned federated learning models trained using gradient-descent algorithms [31]. Ferdowsi and Saad proposed a distributed privacy-preserving IoT intrusion detection system based on federated generative adversarial networks. In the proposed decentralized architecture, every IoT device monitors its own data as well as that of neighbouring IoT devices to detect internal and external attacks [32]. Meidan et al. proposed N-BaIoT, a method for detecting IoT botnet attacks based on deep autoencoders. For each device present in an IoT network, a deep autoencoder is trained on features extracted from normal traffic data [33]. Bezerra et al. proposed IoTDS, a distributed method for detecting IoT botnet attacks based on lightweight one-class classification models [34]. Rathore and Park created a decentralized attack detection framework for IoT networks based on semi-supervised learning employing extreme learning machines and fuzzy C-means algorithms [35]. Doshi et al. employed various machine learning algorithms (k-nearest neighbors, support vector machines, decision trees and neural networks) to detect DDoS attack traffic in consumer IoT devices [36]. Pajouh et al. (2018) proposed a malware detection approach for IoT based on deep RNNs [37], while [38] presents an approach to anomaly detection that implements autoencoders at each edge device, with the edge devices orchestrated via a federated learning model with the central server. In [39], the authors show that Random Forest, Multilayer Perceptron, and Discriminant Analysis models can viably save time and energy on the edge device during data transmission, while K-Nearest Neighbors, although reliable in terms of prediction accuracy, is resource-inefficient in their studies.
D. Contributions
We now summarize the main contributions of the paper. We propose an approach to embed anomaly detection capabilities in the Cellular IoT architecture, providing for combined threat detection both at the IoT devices (edge) and in the mobile core network servers (fog). The corresponding architecture design is motivated by and well suited for Smart Logistics. The proposed edge-based ADM-EDGE and fog-based ADM-FOG modules can balance responsiveness and accuracy by employing deep autoencoder (AE) based learning modules whose complexity is matched to both edge and fog deployment. We carry out implementation, integration, and evaluation of an end-to-end testbed according to the proposed architecture. This includes: 1) real IoT data generation and emulation of a real-world Smart Logistics scenario; 2) fabrication and configuration of the relevant edge and fog hardware and infrastructure; 3) development and implementation of a software library for edge and fog-based anomaly detection; and 4) evaluation of the developed anomaly detectors on the generated data and quantification of the tradeoffs between detection performance and response time. For the latter contribution, we explicitly quantify the tradeoffs that take into account the limited computational and storage budget at the edge devices, and the communication and processing costs due to processing larger amounts of data at the fog for improved AD performance.

III. DL-BASED ANOMALY DETECTION IN CIOT

In this section, we describe in detail the design and system architecture of the proposed AD support for the 3GPP NB-IoT mobile cellular network.
A. System Model and Architecture
We augment the 3GPP CIoT system architecture with support for CIoT device anomaly detection. The augmented architecture is illustrated in Fig. 1 and introduces two additional ADMs: one placed at the edge CIoT UE (ADM-EDGE) and another placed at the fog gateway (ADM-FOG). The architecture represents a generic CIoT enhancement for anomaly detection, although in this work we specialize it to the domain of Smart Logistics. This includes managing the supply of items from various origin points delivered to warehouses in manufacturing plants (Fig. 1). Items being delivered are packed into containers, each of which has an NB-IoT device attached. For this purpose, we designed an entirely new NB-IoT UE device, and deployed suitable ADM-EDGE and ADM-FOG modules at both the NB-IoT UEs and the FGW server within the mobile core network.

Footnote: Response time is the time passed from the occurrence of an anomaly to its detection.
Fig. 2. 3GPP CIoT Anomaly Detection processing flow.
ADM-EDGE: As described below, NB-IoT devices collect various information such as acceleration and GPS coordinates. This sensory information can be used to detect anomalies such as physical tampering with items, container mishandling such as overturning, delays, routing problems, incidents with the delivery vehicles, etc. We assume each NB-IoT device possesses two types of sensors: i) sensor S1 with low sampling rate $f_1$ [Hz] and sampling period $\Delta_1 = 1/f_1$ [s] (in our case, a GPS sensor that samples the outdoor device location), and ii) sensor S2 with high sampling rate $f_2$ [Hz] and sampling period $\Delta_2 = 1/f_2$ [s] (in our case, an accelerometer/gyroscope that samples vibration monitoring parameters), as illustrated in Fig. 2.

Due to limited memory capacity and processing power, ADM-EDGE integrated into the NB-IoT device firmware requires a restrictive design. ADM-EDGE consists of a pre-trained autoencoder with a single hidden layer. At the input, ADM-EDGE processes a single data point that consists of a single S1 and S2 value. As illustrated in Fig. 2, we assume ADM-EDGE is triggered synchronously with the low-rate sensor S1 outputs $X_{S1}[k] = X_{S1}(t = k\Delta_1)$, $k = 1, 2, \ldots$, where $\Delta_1$ is the sampling period of the sensor S1 output function $X_{S1}(t)$. Besides an S1 sample, ADM-EDGE is fed with the sensor S2 value $X_{S2}[k]$, which is a root mean square (RMS) aggregate of the high-rate sensor S2 output samples calculated over the interval of duration $\Delta_1$ between the last two S1 outputs. In other words, $X_{S2}[k] = \sqrt{\frac{1}{M} \sum_{\ell} X_{S2}^2(t = \ell\Delta_2)}$, where $\ell$ satisfies $(k-1)\Delta_1 < \ell\Delta_2 \le k\Delta_1$, which amounts to the last $M = \Delta_1/\Delta_2$ S2 samples preceding $t = k\Delta_1$. To summarize, a pair of S1 and aggregated S2 values $(X_{S1}[k], X_{S2}[k])$ represents a data point fed into the ADM-EDGE autoencoder every $\Delta_1$ [s]. For each decision, after the ADM-EDGE processing time, the device outputs a confidence score (see Sec. III-C).

ADM-FOG: NB-IoT devices connect to a mobile network and transfer data via the nearest base station. Each ADM-EDGE data point is forwarded to the FGW, together with the ADM-EDGE confidence score evaluated from the last available data point. The communication delay incurred by the NB-IoT network connection may vary from the order of tens of milliseconds to several tens of seconds, depending on the NB-IoT device radio conditions and network load. The FGW server runs an instance of ADM-FOG that relies on higher memory capacity and processing power. Thus, ADM-FOG uses a more powerful autoencoder that processes multivariate time series through several hidden layers. It considers a larger input, formed by concatenating the last $L$ ADM-EDGE data points (see Fig. 2). Thus, at the time instant $t_k$ when the $k$-th data point is received at the FGW (note that $t_k = k\Delta_1 + \tau_k$, where $\tau_k$ is the communication delay of the $k$-th data point), ADM-FOG is triggered with the input containing the set of the last $L$ data points $\{(X_{S1}[i], X_{S2}[i])\}_{k-L < i \le k}$.
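The data-point construction at the edge and the windowing at the fog described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the firmware or FGW code: the names `rms`, `edge_data_point` and `FogWindow` are hypothetical, S1 is treated as a single scalar value, and the fog input is assumed to be a flat concatenation of the last L data points.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

DataPoint = Tuple[float, float]  # (S1 sample, RMS-aggregated S2 value)


def rms(samples: Sequence[float]) -> float:
    """Root mean square of the S2 samples collected in one S1 period."""
    if not samples:
        raise ValueError("need at least one S2 sample")
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def edge_data_point(s1_sample: float, s2_samples: Sequence[float]) -> DataPoint:
    """Pair one low-rate S1 sample with the RMS of the S2 samples since the previous one."""
    return (s1_sample, rms(s2_samples))


class FogWindow:
    """Keeps the last L data points and exposes them as one flat input vector."""

    def __init__(self, length: int) -> None:
        if length < 1:
            raise ValueError("window length L must be at least 1")
        self._length = length
        self._points: Deque[DataPoint] = deque(maxlen=length)

    def push(self, point: DataPoint) -> Optional[List[float]]:
        """Add the newest data point; return the concatenated window once L points exist."""
        self._points.append(point)
        if len(self._points) < self._length:
            return None
        flat: List[float] = []
        for s1, s2 in self._points:
            flat.extend((s1, s2))
        return flat
```

With the sampling periods reported in Sec. IV-B (10 s for GNSS and 15 ms for the IMU), each call to `edge_data_point` would aggregate roughly M = 10 / 0.015 ≈ 666 S2 samples.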
We designed the NB-IoT edge device illustrated in Fig. 3 with the specific requirements of a Smart Logistics environment in mind: tracking the shipping containers and monitoring their vibration. Here, we reflect on the most important features supported by our device.
1) Cellular connectivity:
To fulfill the requirement for ubiquitous connectivity while keeping the power consumption of the battery-powered device low, we utilize a BG96 cellular module from Quectel, which supports NB-IoT and LTE-M, the state-of-the-art 3GPP CIoT communication standards that will be further evolved in 5G standardization [42]. In addition, EGPRS is supported to ensure connectivity in areas where an LTE carrier might not be available. Finally, the integrated GNSS module provides the geolocation information which is essential to the asset tracking task in the logistics use case.

The intention is to use NB-IoT as the primary means of communication due to its desirable properties, namely energy efficiency combined with extended coverage [41]. However, on occasions when it is necessary to transfer larger amounts of data (e.g., a new firmware image), LTE-M is a more efficient solution. The architecture of our edge node provides flexibility which allows us to adapt the throughput of the communication module according to the needs of the application.

Fig. 3. 3GPP NB-IoT/LTE-M edge node running ADM-EDGE model.
2) On-board sensors:
Apart from the localization data provided by the GNSS module, the on-board environmental sensors are used to measure parameters relevant to the logistics use case. The 6-axis Inertial Measurement Unit (IMU) provides information about the vibrations and the magnetic field along the X, Y and Z axes relative to the chip position. An additional set of sensors is used to measure atmospheric conditions such as air temperature, pressure and humidity.

The designed platform provides additional metadata that could be used as inputs to ADM-EDGE. For example, the cellular modem is capable of providing the standard set of radio condition metrics (SNR, RSSI, RSRP, etc.). In addition, our design includes on-board current measuring circuitry that allows the micro-controller unit (MCU) to acquire precise measurements of the power consumption of the BG96 module.
3) The MCU features and capabilities:
The main MCU inside the edge node is a low-power 32-bit ARM Cortex M0+ with 256 KB of FLASH and 32 KB of SRAM, operating at 16 MHz. The MCU resources are sufficient to efficiently control the rest of the circuitry while maintaining low power consumption, especially in sleep mode. However, the absence of an operating system as well as the hardware constraints limit the usage of ML tools to lightweight models that are fully customized and optimized for a given application. Finally, an external FLASH memory module enables data logging over the intervals when there is no connectivity, and is used to store the firmware images during over-the-air updates.
4) Security:
In an industrial setup, security is of critical importance. Therefore, we use a hardware crypto element which enables offloading the computationally expensive asymmetric cryptographic algorithms (ECC and RSA) from the resource-constrained MCU [43]. Tamper-resistant memory within the crypto chip is used to store security credentials, making the firmware on the host MCU oblivious of sensitive information such as the encryption keys and certificates.
C. Anomaly Detection using ADM-EDGE and ADM-FOG
ADM-EDGE and ADM-FOG detect anomalies using autoencoders. Let us assume that the device behaviour is described by a feature vector $X$ containing $k$ real-valued features. Those may be values observed at one particular point in time or a multivariate time series. Let $D$ denote a set of data points that depicts the normal (nominal) behaviour of the device (the training dataset), let $A(D)$ be an autoencoder trained on $D$, and let $e$ denote the maximal error of $A$ on $D$. Then, a data point $y$ not contained in $D$ (i.e., not present in the training dataset) can be considered an anomaly if the difference between $y$ and $A(y)$, computed by the same error function that was used for training the autoencoder, is higher than $e$, where $A(y)$ denotes the output of $A$ for $y$.

ADM-EDGE and ADM-FOG autoencoders identify anomalies according to this rule. For each anomaly detection decision, the confidence score $C(y)$ is computed according to the following formula:

$C(y) = \sigma(\mathrm{Err}\{y, A(y)\} - e)$, (1)

where $\mathrm{Err}\{\cdot\}$ is the error function used to train $A$ (e.g., the mean squared error) and $\sigma$ denotes the sigmoid function. The important property of the confidence score function is that non-anomalous data points have scores in the range (0, 0.5], whereas anomalous data points have scores in the interval (0.5, 1). In other words, values of Eq. (1) close to 0 indicate non-anomalous data points, while values close to 1 signify anomalies. So that a higher score always expresses higher confidence in the decision made, the confidence scores of data points decided to be non-anomalous are further transformed into $1 - C$, where $C$ is the value obtained by Eq. (1).

ADM-EDGE autoencoders have a predefined structure with a single hidden layer containing n/ nodes, where n is the number of input features. They use the ReLU activation function for the hidden layer. Additionally, bias variables are not considered for internal nodes.
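The detection rule and Eq. (1) can be sketched in Python as follows. This is an illustrative sketch, not the C inference function used in the firmware: all function names are hypothetical, the output layer is assumed to be linear and bias-free (the text only specifies ReLU for the hidden layer), the mean squared error is assumed as the error function, and the weights and layer sizes in the usage example are made up.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(weights: Matrix, vec: Sequence[float]) -> List[float]:
    """Multiply a weight matrix (one row per output node) by a vector, without bias."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def reconstruct(x: Sequence[float], w_enc: Matrix, w_dec: Matrix) -> List[float]:
    """Single-hidden-layer autoencoder: ReLU hidden layer, linear output (assumption)."""
    hidden = [max(0.0, h) for h in matvec(w_enc, x)]
    return matvec(w_dec, hidden)


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def detect(x: Sequence[float], w_enc: Matrix, w_dec: Matrix,
           max_train_error: float) -> Tuple[bool, float]:
    """Return (is_anomaly, confidence in that decision)."""
    err = mse(x, reconstruct(x, w_enc, w_dec))
    c = sigmoid(err - max_train_error)   # Eq. (1)
    is_anomaly = err > max_train_error   # equivalent to c > 0.5
    # Non-anomalous decisions report 1 - C so that higher always means more confident.
    return is_anomaly, (c if is_anomaly else 1.0 - c)
```

For example, with the made-up weights `w_enc = [[1.0, 0.0]]` and `w_dec = [[1.0], [0.0]]` and `max_train_error = 0.1`, the point `[0.5, 0.0]` is reconstructed exactly and classified as normal, while `[0.5, 2.0]` has a reconstruction error of 2.0 and is flagged as an anomaly.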
Due to the constraints of NB-IoT devices, the training of lightweight autoencoders is performed offline using a Python module utilizing the Tensorflow library. This ADM module determines the lightweight autoencoder weights by optimizing the mean squared error using the Adam optimizer [44] for a given number of epochs and batch size. Before training, data points in the input training dataset are normalized such that each feature has zero mean and unit variance. The weights of the trained model and the data normalization parameters are then exported to text files. An inference function performing anomaly detection with a pre-trained lightweight autoencoder is implemented in C without relying on any external library. This inference function is directly integrated into the firmware of our NB-IoT devices.

Decisions made by ADM-EDGE lightweight autoencoders are re-evaluated by ADM-FOG autoencoders in case of low confidence scores. The default value of the threshold is set to $C_{th}$ = 0. , i.e., the decisions with $C < C_{th}$ are re-evaluated. We adopt here a standard, confidence-score based decision that is simple but effective; for more advanced mechanisms for offloading decisions from the edge, see, e.g., [45]. The threshold $C_{th}$ is a tunable parameter that allows trading off confidence in the anomaly decision against response time. A lower threshold means that the system designer accepts lower confidence scores; in return, the average response time within a time interval for the same input data set decreases, since fewer decisions are offloaded to the fog. In contrast to ADM-EDGE lightweight autoencoders, ADM-FOG autoencoders may have an arbitrary number of hidden layers. Additionally, they process multivariate time series constructed using the sliding window approach instead of single data points.

IV. SYSTEM INTEGRATION, DATA GENERATION AND NUMERICAL RESULTS
A. System Integration
To integrate the system, collect real-world data and perform testing and evaluation, the CIoT UE is connected to the FGW via a mobile operator macro-cellular NB-IoT eNB. The CIoT UE runs the ADM-EDGE software module and periodically sends data points to the FGW encapsulated into UDP packets. Within the mobile operator core network, a general-purpose server is set up and connected to the PGW. The ADM-FOG software module within the server accepts the UDP packets sent by the CIoT UE. The server provides sufficient resources to run the ADM-FOG module, so in the sequel we focus on the ADM-EDGE module deployment on the CIoT UE device.

Table I reports the memory footprint of the ML/DL ADM-EDGE model. One can note that ADM-EDGE adds only a small fraction to the standard NB-IoT device firmware needed for basic device sensing, processing and communication functionality. The sizes of the exported Tensorflow and Tensorflow Lite models are also given for reference.
TABLE I
ADM-EDGE MEMORY RESOURCE UTILIZATION.

Model                        Size in bytes
Firmware without ADM-EDGE    55816 (21.3%) out of 262144
Firmware with ADM-EDGE       61896 (23.6%) out of 262144
ADM-EDGE only                6080 (∼

B. Data Generation
To generate the dataset (elaborated in Section IV-C), we used the NB-IoT edge nodes described in Section III-B. We created a setup where an edge node was attached to a box-shaped container inside a transport vehicle moving through the city of Novi Sad. The device was initially connected to the NB-IoT network, and it had uninterrupted connectivity throughout the path. We collected the positioning data from the GNSS module (timestamp, latitude, longitude, altitude, speed and number of satellites in range), as well as the outputs of the IMU (acceleration and magnetic field along the 3 spatial axes). The time resolution of the GNSS samples was $\Delta_1 = 10$ s. The sampling period of the IMU was $\Delta_2 = 15$ ms (see Fig. 4 for an example of IMU signals), thus we calculated the RMS of the acceleration and magnetic field samples collected within a sampling interval $\Delta_1$ (as described in Sec. III-A). The collected data were stored in the database at the FGW and used to train the AD models discussed in the following section.

Fig. 4. Example of acceleration data from IMU.
C. Numerical Results
ADM-EDGE and ADM-FOG autoencoders were evaluated using two independent datasets. The first dataset reflects the behaviour of the edge node device under normal driving conditions without large disturbances. This dataset contains 1470 data points collected over a period of three days and is used to train the ADM-EDGE and ADM-FOG autoencoders. The trained autoencoders were tested on the second dataset. The test dataset has 318 data points collected in a single day with 10 intentionally caused anomalous events induced by shaking and overturning the container with the attached device. Since the edge node records both location-based features (GPS longitude and latitude) and IMU-based features, we can distinguish two types of anomalous events: location-based anomalies (large deviations from learned trajectories) and behaviour-based anomalies (large deviations from learned IMU signals). Our test dataset does not contain location-based anomalies.

The accuracies of ADM-EDGE and ADM-FOG autoencoders were assessed by computing the following basic measures:
• $TP$ (true positives): the number of correctly identified anomalous events,
• $FP$ (false positives): the number of times an autoencoder indicated a non-existing anomalous event, and
• $FN$ (false negatives): the number of times an autoencoder failed to indicate an existing anomalous event.

We define the anomalous data points as those that correspond to the intentionally caused incident events; these data points are known to the experiment designer and system evaluator but are not known beforehand to the AD modules. The goal of AD is then to uncover the defined anomalies from the data.
From $TP$, $FP$ and $FN$ we have derived the precision ($P$) and recall ($R$) scores of our anomaly detection models: $P = TP/(TP + FP)$ and $R = TP/(TP + FN)$. Both precision and recall take values in the range [0, 1]. Precision indicates the degree of correctness of an anomaly detection model: small precision values imply that the model makes a lot of errors when reporting anomalous events. Recall reflects the model's ability to detect existing anomalous events. Small recall values indicate that the model often remains "silent" in cases when it should raise an alarm. When comparing different anomaly detection models it is useful to have a single overall score reflecting their performance. For this purpose we have used the $F_1$ measure, which is the harmonic mean of precision and recall: $F_1 = 2 \cdot P \cdot R/(P + R)$.

For the ADM-FOG model we have greater flexibility than for the ADM-EDGE model. Thus, in our experimental evaluation, we have examined a single ADM-EDGE model (see Sec. III-C), 10 ADM-FOG models with three hidden layers (sequentially containing n/ , n/ and n/ nodes, where n denotes the number of input features) accepting time series of lengths from $L = 1$ to $L = 10$, and 10 ADM-FOG models with five hidden layers (sequentially containing n/ , n/ , n/ , n/ and n/ nodes) also working with time series of lengths from $L = 1$ to $L = 10$. Due to the stochastic nature of the autoencoder learning algorithm, an ensemble of 20 autoencoders was trained for each examined model. All autoencoders were trained for at most 200 epochs with a batch size of 16; the learning rate of the Adam algorithm was set to 0.001, and early stopping was activated after 10 epochs without a decrease in the value of the loss function. The evaluation metrics for a particular model were estimated by averaging the results individually obtained from all autoencoders in the corresponding ensemble.
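The precision, recall and $F_1$ definitions above translate directly into code. The following Python sketch is illustrative only: the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
from typing import Tuple


def precision_recall_f1(tp: float, fp: float, fn: float) -> Tuple[float, float, float]:
    """Compute P = TP/(TP+FP), R = TP/(TP+FN) and F1 = 2PR/(P+R)."""
    # Assumed convention: a score with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical single run: 8 detected events, 2 false alarms, 2 missed events.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

The counts accept floats so that the same function can also be applied to averaged counts, although the evaluation described above averages the per-autoencoder metrics rather than the counts.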
Additionally, for each model we have examined two variants: a model trained without location-based features and a model trained on all features.
The results of the evaluation of the ADM-EDGE autoencoder in both variants (with and without location-based features) are summarized in Table II. It can be seen that the ADM-EDGE autoencoder working without location-based features has a slightly larger precision score and a slightly lower recall score compared to the ADM-EDGE autoencoder trained on all features. However, the observed differences are not significant, as is evident from the similar F1 scores. This result is expected since the test dataset does not contain location-based anomalies. Therefore, the small differences in the obtained results can be explained by the stochastic nature of the autoencoder learning algorithm. The obtained values of precision and recall indicate that the ADM-EDGE autoencoders perform well. In more practical terms, on average, the ADM-EDGE anomaly detection model recognized 8 out of 10 existing anomalous events, missed 2 real anomalous events and raised 1 or 2 false positive alarms (the average number of false positives is 1.25 in the NO-GPS case and 1.85 in the WITH-GPS case).
TABLE II
EVALUATION OF ADM-EDGE AUTOENCODERS.
Evaluation metric    NO-GPS    WITH-GPS
Precision            0.859     0.814
Recall               0.77      0.8
F1
In the second experiment we have examined the performance of ADM-FOG autoencoders with 3 and 5 hidden layers. The obtained F1 scores are presented in Figures 5 and 6. It can be seen that ADM-FOG autoencoders exhibit significantly higher F1 scores compared to ADM-EDGE autoencoders for all timeseries lengths except L = 1 (i.e., individual data points). The average improvement in the F1 score when offloading anomaly detection decisions to ADM-FOG is approximately 7%. As with ADM-EDGE autoencoders, the location-based features do not have a significant impact on the performance of ADM-FOG autoencoders. The performance of ADM-FOG autoencoders with 3 hidden layers is similar to that of those with 5 hidden layers: the largest difference in F1 scores is equal to 0.027 (excluding ADM-FOG models working with timeseries of length 1).
Fig. 5. F1 scores of ADM-FOG autoencoders with 3 hidden layers for different timeseries lengths. The dashed lines indicate F1 scores of ADM-EDGE autoencoders. [Plot: F1 score versus timeseries length.]
Fig. 6. F1 scores of ADM-FOG autoencoders with 5 hidden layers for different timeseries lengths. [Plot: F1 score versus timeseries length.]
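The ADM-FOG models compared in Figs. 5 and 6 operate on time series of L consecutive samples, so a decision can only be issued once L samples are available. The following Python sketch illustrates how such inputs might be formed and what delay they imply. The function names are hypothetical, and the use of overlapping windows that advance by one sample is an assumption of this sketch, not a detail confirmed by the text.

```python
def sliding_windows(samples, length):
    """Return every run of `length` consecutive samples, oldest first.

    Overlapping windows with a stride of one sample are assumed here.
    """
    if length < 1:
        raise ValueError("window length must be at least 1")
    return [samples[i:i + length] for i in range(len(samples) - length + 1)]


def decision_delay(length, sampling_period):
    """Approximate time needed to fill one window: L sampling periods."""
    return length * sampling_period
```

With L = 1 each window is a single data point, which matches the case in which the fog model sees the same amount of temporal context as the edge model.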
The results above allow us to explicitly quantify the trade-offs between anomaly detection performance and response time, depending on whether the decision on the presence of anomalies is made at the edge or in the fog. For this, note that the response time of ADM-EDGE corresponds approximately to one sampling period ∆. On the other hand, the response time of ADM-FOG depends on the length L of the time series processed. In the case of ADM-FOG autoencoders trained without location-based features, the largest F1 score is achieved by the autoencoder with 3 hidden layers working on time-series of length L = 9. The increase in precision and recall compared to the corresponding ADM-EDGE autoencoder is equal to 0.02 and 0.15, respectively. This means that by increasing the confidence threshold for offloading anomaly detection decisions to the ADM-FOG autoencoder, the whole system makes fewer false negative decisions at the cost of decision delays of L = 9 time slots. The ADM-FOG autoencoder with 5 hidden layers working on time-series of length L = 10 has the highest F1 score among FOG models trained on all features. The increase in precision and recall in this case is 0.1 and 0.05, respectively. Therefore, by increasing the offloading threshold, the performance of the whole system improves through fewer false positive decisions at the cost of decision delays of L = 10 time slots.
V. CONCLUSION
In this paper, we present the design, implementation, real-world deployment and evaluation of a novel anomaly detection architecture for Cellular IoT networks. Our system, tailored to the Smart Logistics use case, demonstrates the major system-design trade-off between responsiveness and accuracy when anomaly detection is deployed at the edge or in the fog of the Cellular IoT network.