Enhancing network forensics with particle swarm and deep learning: The particle deep framework

Abstract

The popularity of IoT smart things is rising, due to the automation they provide and their effects on productivity. However, it has been proven that IoT devices are vulnerable to both well-established and new IoT-specific attack vectors. In this paper, we propose the Particle Deep Framework (PDF), a new network forensic framework for IoT networks that utilises Particle Swarm Optimisation to tune the hyperparameters of a deep MLP model and improve its performance. The PDF is trained and validated using the Bot-IoT dataset, a contemporary network-traffic dataset that combines normal IoT and non-IoT traffic with well-known botnet-related attacks. Through experimentation, we show that the performance of a deep MLP model is vastly improved, achieving an accuracy of 99.9% and a false alarm rate of close to 0%.


Nickolaos Koroniotis and Nour Moustafa
School of Engineering and Information Technology, University of New South Wales Canberra, Canberra, Australia


Keywords:

Network forensics, Particle swarm optimization, Deep Learning, IoT, Botnets

With more than 7 billion devices deployed in 2018 and double that number in 2019, smart IoT things are becoming ever more popular, as they provide automated services that improve performance and productivity, while reducing operating costs [1]. Various applications of the IoT have been developed, such as the smart home, comprised of home appliances like smart lights and fridges, and, on a larger scale, the Industrial IoT (IIoT) that spans areas such as industry, healthcare, agriculture and automation. The next step is considered to be the "smart city", where smart things will be used to monitor and control power plants, public transport, water supply and more [1].

However, it has been proven that IoT devices are vulnerable to both well-established and new IoT-specific attack vectors. In a 2018 report by Symantec regarding the security threats found in the Internet [2], it was reported that the total number of attacks targeting IoT devices for 2018 exceeded 57,000, with more than 5,000 attacks being recorded each month. Hackers have compromised vulnerable, unpatched or unencrypted IoT devices in order to steal sensitive data, corrupt the device's normal operation, spread malware infections [3][4][2] or even compromise the security of a smart home by disabling smart locks and garage doors [5].

Due to the lack of common standards and the heterogeneity displayed by the IoT [3], the development of an efficient network forensic solution becomes difficult [6], as there may be multiple diverse protocols in use in a single deployment [7,8,9]. Typically, the network forensic process is segmented into several distinct phases, whereby each phase defines the necessary preparation, analysis and actions of investigation [10], with these phases being identification, collection, preservation, examination, analysis and presentation.
The first three phases define access to the crime scene, detection of potential sources of evidence, secure collection of digital artifacts and traces, and their preservation. Examination and analysis define the actions necessary to identify useful evidence in the collected data, from which inferences are made about the crime, while the presentation stage prepares the identified information to be presented in a court of law.

Many forensic frameworks that rely on the public ledger scheme have been developed, although they focus on the acquisition stage rather than the entire investigation [8] [11] [12] [13]. By relying on a public ledger scheme, the integrity of evidence would be guaranteed, while experts would have immediate access to it. These frameworks, however, do not cover the examination and analysis stages, and may introduce some disadvantages to the forensic process, such as privacy concerns, as the user's data is distributed between stakeholders, while adding excess complexity.

During a forensic investigation, some aspect of a computer system is examined to identify traces. One source that is preferred for IoT investigations is traffic collection, with two main approaches: deep packet inspection and network flow analysis. Deep packet inspection focuses on the payload, allowing for an in-depth analysis. Network flow analysis summarizes network traffic by ignoring the payload and utilising statistical data like connection speed and exchanged bits. For the requirements of the research presented in this paper, network flow is preferred, as we suggest in this work [3].

Since IoT devices are designed to be continuously active, collecting traffic from such a network would result in excessive volumes of data. As such, to analyse the collected data, fast and automated methods are employed, with one prominent example being deep learning, which is ideal for rapidly scanning large volumes of network data to detect patterns that indicate an attack [14,15,16,17].
However, in order to utilise a deep learning model, it first needs to be trained, which involves selecting hyperparameters, the values of which can greatly affect its performance [18,19]. Thus, researchers have sought to develop ways of selecting optimal values for a model's hyperparameters [20,18,21,22], with an emphasis given towards more automated processes. Regardless, no single optimization method has been accepted as the preferred method by the research community and, as such, the research is ongoing. It is a necessity to develop network forensic-based optimisation to investigate security incidents in IoT networks in a timely manner [23,3,24].

The main contributions of this paper are as follows:

– We propose a new network forensic framework, named Particle Deep Framework (PDF), based on optimisation and deep learning.
– We use an optimization method based on Particle Swarm Optimization (PSO) to select the hyperparameters of the Deep Neural Network (DNN).

The structure of the paper is as follows. Section 2 includes background and related research in the application of particle swarm optimisation and deep neural networks to network forensics. Section 3 presents our proposed Particle Deep Framework in detail. Section 4 presents and discusses the experimental results acquired by utilising our PDF. In Section 5 we discuss the advantages and disadvantages of the PDF. Finally, Section 6 includes the concluding remarks of this paper.

Due to the malicious actions of hackers who take advantage of vulnerabilities present in popular technologies, the field of digital forensics emerged. Through digital forensics, experts examine a crime scene and gather data that are then processed and analysed in order to identify a hacker's methods and target, a process that can lead to prosecution [25]. Since it was first introduced, digital forensics has been separated into specialized sub-disciplines, each ideal for different areas, such as cloud forensics, IoT forensics, network forensics and data forensics [26].

Network forensics, which is the sub-discipline that we utilise in this paper, examines network-related security incidents, with collected data spanning from logs to captured packets. Practitioners of network forensics often employ automated software and hardware tools for the collection and preservation of data; however, the process of performing a forensic examination is not rigidly defined. This has resulted in the emergence of various digital forensic frameworks, which determine the correct course of action during an investigation, separating the process into autonomous stages and suggesting appropriate tools and techniques for each task. Even though many forensic frameworks have been proposed [12,27,11], existing solutions for smart homes emphasise acquisition and neglect examination and analysis. Furthermore, no single framework has been acknowledged as superior, due to the lack of standardization in IoT and the heterogeneity of computer systems [28,29,3].

Various researchers have proposed forensic frameworks for IoT environments [8] [11] [12] [13]. Hossain et al. [12] proposed Probe-IoT, an acquisition framework that establishes chain of custody, while ensuring that privacy is maintained. An acquisition model for smart vehicles named Block4Forensic was proposed by Cebe et al. [13].
Both Probe-IoT and Block4Forensic utilise a blockchain to ensure the integrity of the collected data, which includes diagnostic and interaction data acquired from IoT devices.

Most of the proposed IoT frameworks were constructed by using distributed blockchains [27,11,12]. Hossain et al. proposed Probe-IoT [12] and FIF-IoT [11], while Le et al. [27] developed BIFF. These frameworks utilise a distributed blockchain, managed by several entities like the manufacturers, the police and insurance companies, with pre-approved roles and digital signatures ensuring confidentiality and non-repudiation. For the issuing of the digital keys necessary for the digital signatures, a certification authority needs to be established, which will also manage the public keys and prevent man-in-the-middle attacks [30].

While these acquisition frameworks may improve an investigation by providing easy access to collected data, they have some drawbacks. To begin with, for the frameworks to function effectively, law enforcement needs to invest in resources to store and manage the collected data, multiple independent organisations need to collaborate seamlessly, and owners of IoT devices need to agree to and trust the organisations that may access their data [12,27,11]. In addition, owners may face extra charges due to the introduction of dedicated devices for data collection, or deterioration of their device's performance and increases in power consumption due to transmissions to services that incorporate the collected data into the blockchain.

Commonly, network forensic applications have incorporated a number of techniques based on mathematics and machine learning, such as fuzzy logic, naïve Bayes classifiers, support vector machines and neural networks [31,32,33,34]. However, contemporary research has proposed deep learning as an alternative as, long training times notwithstanding, deep models tend to outperform other solutions when tasked with processing large volumes of data [35,36,3,14,17].

Consisting of both discriminative and generative models, deep learning is a subgroup of neural networks that is designed to incorporate multiple hidden layers and neurons in what is known as a "deep architecture" [37,14]. By stacking multiple (in the thousands) hidden layers, a deep learning model is capable of detecting more complex patterns, along with their variations, than simpler and shallower neural networks [14,18].

As an application of network forensics, deep learning has been employed for attack detection in network traces, with multiple examples in research. Shone et al. [14] designed an intrusion detection system (IDS) comprised of a pre-trained, non-symmetric deep autoencoder stacked with a random forest. The IDS, trained on the KDD dataset, achieved an accuracy of 89.22%. For the purpose of malware detection, Azmoodeh et al. [36] developed a deep convolutional neural network that was trained on eigenvectors obtained from smartphone application code. The model achieved 98% precision and accuracy.

Brun et al. [38] investigated the detection of denial of service and sleep attacks targeting IoT gateways, and proposed a detection method based on a random neural network model, although its performance was similar to that of a threshold detector. A deep learning model based on MLP was proposed by Pektas et al. [39] that focused on detecting botnets by flagging C&C traffic.
Results indicated that the performance of deep learning models, when tasked with identifying botnets in network flows, was acceptable.

One of the first steps in utilising Convolutional Neural Networks (CNNs) for deep packet inspection, named D2PI, was proposed by Cheng et al. [40]. This system was trained on extracted traffic payload data, with the CNN receiving a fixed input. Results reported by the researchers were promising. Alrashdi et al. [41] developed AD-IoT, an anomaly NIDS designed to identify infected IoT devices. The NIDS utilised random forest and extra tree classifiers and was evaluated on the UNSW-NB15 dataset, with results being promising.

Previous work on forensic frameworks for the IoT has either focused on the acquisition aspect of a forensic investigation [12, ?, ?], or requires alterations to the smartphone applications associated with a smart device [42]. As such, the framework proposed in this research, the PDF, is an important and practical addition to the network forensic research literature, as it incorporates both the examination and analysis phases by using deep learning and network flow data, without the need for alterations to IoT system architectures.

In order to tune the hyperparameter values of a deep MLP model, in the context of this paper, we employed Particle Swarm Optimization (PSO). PSO is a metaheuristic swarm-based optimisation algorithm developed by Eberhart and Kennedy in 1995 [43]. At the start of PSO, a swarm of particles is randomly generated and tasked with exploring a variable search space, with each position being a new value for the variable that is being optimised. During each iteration, a particle's velocity and local position, as well as the global best position, are updated, based on its previous position, its best-detected position (value) and the swarm's best position [44,45]. The quality of a particle's location is determined by an objective function.

The original PSO algorithm proposed by Kennedy et al. [43], which is explained by Equations 3-6, was followed by a number of variants, each designed for different scenarios. For example, Kennedy et al. proposed in their original work that the learning rates $\theta_1$ and $\theta_2$ in Equation 5 greatly affect the search pattern of the swarm, with increases in $\theta_1$ prioritizing local search while increases in $\theta_2$ spread the swarm. The next variant, called standard PSO, was proposed by Shi et al. [46] and integrated an inertia coefficient into Equation 5, specifically by multiplying it with the previous velocity value. As this inertia ($\omega$) has a direct effect on a particle's trajectory, various initialization strategies have been proposed, such as setting it to a positive, fixed value [46], random initialization [47] and using a function that declines with time [nikabadi2008particle], with a linear example given in Equation 1. In Equation 1, $\omega_{max}$ and $\omega_{min}$ are user-defined maximum and minimum weights, while $i$ and $i_{max}$ indicate the current and total number of iterations of the swarm respectively. The justification for using a decaying inertia is to force particles to spread-search at early intervals of the PSO, and then gravitate towards identified optima.

$$\omega_t = \omega_{max} - \frac{i}{i_{max}} \left( \omega_{max} - \omega_{min} \right) \tag{1}$$

Because experiments demonstrated the tendency of original and standard PSO to cause the velocity of particles to explode, improvements were suggested, in the form of velocity clamping and constriction factors [48,49,50]. Velocity clamping binds a particle's velocity to a pre-determined upper bound, while constriction factors alter Equation 5 by multiplying the new velocity with a constriction factor "K" given by Equation 2.

$$K = \frac{2}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \quad \text{where } \varphi = \theta_1 + \theta_2 \text{ and } \varphi > 4 \tag{2}$$

Other variants of the PSO algorithm were designed to allow its application to new, previously unsupported problems [51].
Some prominent examples include a binary version of PSO [51], where the restricted velocity of a particle was calculated and fed as input to a sigmoid function that produced a binary value (either '0' or '1'). The cooperative PSO [52] was proposed for multi-dimensional problems, with a new swarm spawned for each dimension. Finally, the fully informed PSO [53] alters Equation 5 and uses the best position of neighbours to calculate a particle's new velocity.

In this section, we introduce the Particle Deep Framework (PDF), a novel multi-staged network forensic framework for detecting and analysing attacks and their origins in IoT networks, that combines deep learning and particle swarm optimisation methods, as shown in Figure 1.

Next, the five stages of the new PDF forensic framework depicted in Figure 1 are expanded as follows.

– Stage 1: Network capturing tools:

In this stage, IoT devices capable of accessing all local network traffic by being set in promiscuous mode are attached to a network that is under investigation. Specialised packet capturing tools, such as Wireshark [54], Tcpdump [55] and Ettercap [56], are then employed to collect network packets. As an example, for the purposes of developing the Bot-IoT dataset [57], we employed T-shark [58], a terminal-based alternative to Wireshark. Through the "-i" option we specified the Network Interface Card (NIC) to be used, which was set in promiscuous mode, allowing the collection of all the generated packets in the local virtual network. The collected pcap files are then utilised in the following data collection stage.

Fig. 1. Proposed network forensic framework using particle swarm optimization and Multi-layer Perceptron (MLP) deep learning algorithms.

– Stage 2: Data Collection and Management Tools:

In this stage, data collection takes place, producing results like the Bot-IoT and UNSW-NB15 datasets. To ensure the preservation of the collected data, the digests of the pcap files are generated by using an SHA-256 hashing function [59]. Network flows are then generated from the collected pcaps, by using network flow extraction tools like Argus [60] or Bro [61]. Further preprocessing actions like feature normalization, elimination and extraction improve the training process of machine learning models at later stages. In the next stage, the produced data is utilised by deep learning and particle swarm optimisation, to identify, trace and analyse cyber-attacks.

– Stage 3: Particle Swarm Optimization (PSO) for adapting hyperparameters of Deep Learning model:

In this stage, the hyperparameters of deep learning models are tuned by an optimisation algorithm, with the PSO [45] chosen for the task, due to its convergence speed compared to other evolutionary algorithms [44,62]. In this study, the PSO was used to identify hyperparameters that periodically maximise the Area Under Curve (AUC) of a deep Multi-Layer Perceptron (MLP) model. The identified hyperparameters are then used to train the final version of the MLP model.

– Stage 4: MLP deep learning for attack identification:

In this stage, the preprocessed data from Stage 2 and the tuned hyperparameters of Stage 3 are utilised to train and evaluate a deep MLP model. For this study, the deep MLP's architecture comprised seven layers with the following numbers of neurons: 20, 40, 60, 80, 40, 10, 1. From Stage 3, the number of epochs, learning rate and batch size are optimised and used during the training of the deep model, with the data from Stage 2 split into 80% and 20% for training and testing respectively.

– Stage 5: Performance measure:

In the final stage, the performance of the finalised deep MLP model is gauged by parsing the testing set and obtaining the following metrics: accuracy, precision, recall, false positive and negative rates and F-measure. In the following two subsections, Stages 3-5 are discussed in more detail.
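The preservation step of Stage 2 can be illustrated with a few lines of code. The following is a minimal Python sketch, assuming a capture is stored as an ordinary file on disk; the function names `sha256_digest` and `verify_digest` are hypothetical and are not taken from the tooling used to build the Bot-IoT dataset.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Fixed-size chunks keep memory use flat even for very large captures.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_digest(path, expected_hex):
    """Check a file against a digest that was recorded at collection time."""
    return sha256_digest(path) == expected_hex.lower()
```

A digest recorded when a pcap file is collected can later be passed to `verify_digest` to show that the file examined is the file that was captured.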

Originally inspired by observing the movement of animal swarms in their natural habitat, Particle Swarm Optimisation (PSO) is a metaheuristic evolutionary algorithm that spawns a pre-determined number of particles ($P$) that are randomly initialized and set to traverse a variable's search space ($v$).

During a particle's propagation through the search space, each new position ($x_{t+1}$) is evaluated by the output of an objective function, which may change based on the problem being optimised. A particle is defined by four values (Equation 4): its velocity ($v_t$), current position ($x_t$), its local best ($x_{lbest}$) and the swarm's best position ($x_{gbest}$), as given in Equations 3, 5 and 6.

$$P = \{p_1, p_2, \ldots, p_n\}, \quad n \in \mathbb{N} \tag{3}$$

$$\forall p_n \in P, \quad p_n = (x_t, v_t, x_{lbest}, x_{gbest}) \tag{4}$$

$$v_{t+1} = v_t + \theta_1 * rand * (x_{lbest} - x_t) + \theta_2 * rand * (x_{gbest} - x_t), \quad rand \in [0, 1] \tag{5}$$

$$x_{t+1} = x_t + v_{t+1} \tag{6}$$

In Equation 5, a particle's new velocity $v_{t+1}$ is determined by its previous velocity $v_t$, a random (rand) proportion of the learning rates $\theta_1$, $\theta_2$ and the differences (distances) of its current position to its local and the swarm's best positions. Equation 6 gives the updated position of a particle $x_{t+1}$, determined by its previous position $x_t$ and its current velocity. Next, Algorithm 1 depicts an iteration of the PSO algorithm used to maximise the AUC of a deep model [45].

The PSO algorithm depicted in Algorithm 1 is set to maximise the AUC of a deep MLP model, in order to determine the optimal values for the three hyperparameters (learning rate, epochs and batch size) by separately traversing their search spaces.

Algorithm 1: Particle Swarm Optimization maximization algorithm
    P ← construct_particles(n_particles);
    ∀ p ∈ P: p.x_lbest = p.x, p.x_gbest = −∞;
    epochs ← load_epochs();
    e ← 0;
    while e < epochs do
        foreach p ∈ P do
            v_{t+1} = v_t + θ_1 * rand * (x_lbest − x_t) + θ_2 * rand * (x_gbest − x_t), rand ∈ [0, 1];
            x_{t+1} = x_t + v_{t+1};
            if x_{t+1} > x_lbest then
                x_lbest = x_{t+1};
            end
            if x_{t+1} > x_gbest then
                x_gbest = x_{t+1};
            end
        end
        e ← e + 1;
    end
    return P.global_best()
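Algorithm 1 and Equations 5 and 6 translate fairly directly into code. The sketch below is a minimal one-dimensional Python version written for illustration only; it is not the Optunity implementation used in the experiments. As stated assumptions, it compares objective values (rather than raw positions) when updating the local and global bests, clips positions to the search interval, and uses defaults for `theta1`, `theta2`, the swarm size and the iteration count that were chosen only to keep the example small.

```python
import random


def pso_maximize(objective, lower, upper, n_particles=6, n_iterations=30,
                 theta1=1.5, theta2=1.5, seed=0):
    """Maximise a 1-D objective with the original PSO update (Equations 5 and 6)."""
    rng = random.Random(seed)
    positions = [rng.uniform(lower, upper) for _ in range(n_particles)]
    velocities = [0.0] * n_particles
    local_best = list(positions)
    local_score = [objective(x) for x in positions]
    best = max(range(n_particles), key=lambda k: local_score[k])
    global_best, global_score = local_best[best], local_score[best]

    for _ in range(n_iterations):
        for k in range(n_particles):
            # Equation 5: previous velocity plus random pulls towards both bests.
            velocities[k] = (velocities[k]
                             + theta1 * rng.random() * (local_best[k] - positions[k])
                             + theta2 * rng.random() * (global_best - positions[k]))
            # Equation 6, clipped to the search interval (an added assumption).
            positions[k] = min(upper, max(lower, positions[k] + velocities[k]))
            score = objective(positions[k])
            if score > local_score[k]:
                local_best[k], local_score[k] = positions[k], score
            if score > global_score:
                global_best, global_score = positions[k], score
    return global_best, global_score
```

In the PDF setting, `objective` would train an MLP with the candidate hyperparameter value and return its AUC; for a quick check, a cheap function such as `lambda x: -(x - 3.0) ** 2` on the interval `[0, 10]` can stand in for it.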

There are a number of reasons that support the selection of the PSO algorithm in place of another evolutionary metaheuristic algorithm for hyperparameter tuning. To begin with, it has been established that PSO often converges faster than alternatives [63] and can produce acceptable results in realistic time [64]. Furthermore, its algorithm is readily understood and implemented [65]. Finally, this work provides empirical information about the performance of PSO for hyperparameter tuning, in the context of deep learning and network forensics, as our research indicates this is the first time it has been applied to the task.
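As a small illustration of how little code the PSO variants reviewed earlier require, the linear inertia decay of Equation 1 and the constriction factor of Equation 2 can each be written as a one-line function. This is a sketch under stated assumptions: the function names are hypothetical, and the default weights of 0.9 and 0.4 are common illustrative choices rather than values reported in this paper.

```python
import math


def linear_inertia(i, i_max, w_max=0.9, w_min=0.4):
    """Equation 1: inertia that decays linearly from w_max to w_min."""
    return w_max - (i / i_max) * (w_max - w_min)


def constriction_factor(theta1, theta2):
    """Equation 2: constriction factor K, defined for phi = theta1 + theta2 > 4."""
    phi = theta1 + theta2
    if phi <= 4:
        raise ValueError("Equation 2 requires theta1 + theta2 > 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

For example, `constriction_factor(2.05, 2.05)` evaluates to roughly 0.73, so each new velocity would be scaled down by about a quarter before the position update.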

The novel PDF is a considerable addition to the discipline of network forensics, overlapping with the collection, preservation, examination, analysis and presentation stages of network forensics, as depicted in Figure 2. The proposed framework takes advantage of a deep neural network's multiple layers, which enhance the model's performance while maintaining the execution time within reason.

Fig. 2. Stages of investigation in network forensics including the proposed framework

Algorithm 2: Particle deep model for hyperparameter estimation of deep learning
    Data: nn ← load_neural_network_structure();
    [b, e, lr] ← initialize_random_hyperparameters();
    hyperparameters ← [b, e, lr];
    PS ← construct_particle_swarm(n_particles, swarm_epochs);
    i ← 0;
    foreach h ∈ hyperparameters do
        while PS.swarm_epochs ≠ 0 do
            h ← PS.maximize(nn.AUC, h) using Algorithm 1
        end
        nn.save_opt_hyperparam(h);
    end
    nn.train_NN(training_set());

In Algorithm 2 we depict a Particle Deep Framework (PDF) iteration. First, the neural network is loaded with its pre-selected layers and number of neurons. Initially, the three hyperparameters, batch size, number of epochs and learning rate [b, e, lr], are randomly initialized. Next, a particle swarm comprised of a pre-selected number of particles (n_particles) and number of iterations (swarm_epochs) is generated. Then Algorithm 1 is utilized to identify the value of the hyperparameter that is being optimized and maximize the AUC value of the neural network (h ← PS.maximize(nn.AUC, h)). The process is repeated for every hyperparameter that is being optimized, the identified values of which are utilized to train the final neural network.

To validate the optimised deep MLP model, we selected the Bot-IoT [57], as it is a contemporary dataset that combines IoT and non-IoT traces and attacks. The dataset was partitioned into 80% and 20% sets for training and testing respectively. The features of both training and testing sets were normalized with min-max in a range between [0,1]. As there exists no standard procedure for selecting optimal hyperparameters, at first the values were manually selected and the deep MLP model trained, after which we employed the PDF as given by Algorithm 2.

The performance of a neural network can be greatly affected by the hyperparameters. In this study, we focus on optimising three: learning rate, batch size and epochs. The learning rate is a decimal between '0' and '1' that regulates the rate of change of the weights during training. The batch size determines how many records are processed before the neural network's weights are updated, and the number of epochs is the number of times that the entire dataset is processed by the network during training. In the PDF, as is evident in Algorithm 2, each hyperparameter is optimised separately, in order to reduce the search space that each PSO would need to traverse. Otherwise, the search space would be equal to Batch size_{Size} * Epochs_{Size} * Learning rate_{Size}, that is, the product of the sizes of the three individual search spaces.

Because the deep model performed binary classification between normal and attack traffic, the logistic cost function was selected [66], with its equation given below. In addition, due to class imbalances in the Bot-IoT dataset, weights were introduced in order to counter any detrimental effects they might have had on the classifier's performance.

$$C = -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) \tag{7}$$

Here $m$ is the number of records in each batch, $y_i$ is the real value and $\hat{y}_i$ the estimated value of the class feature of the $i$-th record.

Introducing the new weights to counter the imbalanced data alters the summand of the logistic cost equation like so: $w_1 y_i \log(\hat{y}_i) + w_0 (1 - y_i) \log(1 - \hat{y}_i)$, where $w_0$ is the weight for normal and $w_1$ for attack records. During a PSO iteration, a particle reaches a new location and trains a version of the MLP, using its new location as a hyperparameter, preserving values that improve the model's performance. The process of training and testing a deep MLP model using the PDF is given in Algorithm 3. As the AUC value is used to determine the quality of a hyperparameter, which requires the MLP to be trained beforehand, the PDF execution time may be excessive, with recorded times for our experiments for each hyperparameter being: 4 hours for batch size, 3 hours for epochs and 4 hours for learning rate optimisation. In addition, 7 minutes were required to train the final MLP model, while its throughput was 14,762 records/second.
The complexity of the proposed PDF used to adjust one hyperparameter, for each iteration, is equal to $O(n_p * (mlp + 1))$, with $n_p$ denoting the number of particles spawned by the PSO and $mlp$ the time complexity for training and testing an MLP model.

Algorithm 3: Steps for training and testing the proposed particle deep model
    Data: S = {0: 'batch', 1: 'epochs', 2: 'learning_rate'};
    Results = {'batch': −, 'epochs': −, 'learning_rate': −};
    NN = loadNN_structure();          // hyperparameters that aren't trained
    n_p = 6;                          // number of particles
    n_e = 4;                          // number of epochs
    Results = randomInitialState();
    task = 'maximize AUC';
    for k = 0; k <= 2; k++ do         // runs once for each hyperparameter to optimize
        particles = generateParticles(n_p, n_e);
        bestHyper = runPSO(particles, task, S[k], NN);
        Results[S[k]] = bestHyper;
    end
    // trains a model with the identified hyperparameters
    trainedNN = trainNN(NN, Results);
    testNN(trainedNN);

The contemporary Bot-IoT dataset [57] was selected for training and testing the proposed PDF. The Bot-IoT combines IoT and non-IoT traffic, representing a smart home deployment, with the former generated by using Node-Red [67] and the entire dataset reaching 72,000,000 records and a total of 16.7 GB in its csv format. Specifically, we employed the "10-best feature" version of the Bot-IoT dataset, from which we utilised 2,934,817 records for training (80%) and 733,705 records for testing (20%).

For evaluation purposes, we selected six metrics: accuracy, precision, recall, FPR, FNR and F-measure. Accuracy represents the fraction of correctly classified records, $(TP + TN) / (TP + TN + FP + FN)$; precision is the fraction of records predicted as "positive" that were correctly identified, $TP / (TP + FP)$. Recall is the fraction of records correctly identified as "positive" from all positive records, $TP / (TP + FN)$; the FPR and FNR are the fractions of records incorrectly classified as "positive" ($FP / (FP + TN)$) or "negative" ($FN / (FN + TP)$) respectively. Finally, the F-measure is a measure of a model's accuracy, produced by calculating the harmonic mean of the precision and recall values, $2TP / (2TP + FP + FN)$. The three main attack categories represented in the dataset are Denial of Service, information theft and information gathering attacks, with each category further specialized into subtypes as explained in [57].

Experiments were performed on a laptop equipped with 16 GB RAM and an Intel Core i7-6700HQ CPU @2.6GHz, and the programming language used for designing and training the deep MLP model and utilising PSO for hyperparameter optimisation was Python. The packages that were used in this Python environment were as follows: NumPy and Pandas for data pre-processing, TensorFlow accessed through Keras for implementing the deep MLP, and Optunity [68] for defining and running the PSO hyperparameter optimisation.
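The six evaluation metrics defined above follow directly from the four confusion-matrix counts. The following minimal Python sketch computes them; the function name is hypothetical, non-zero denominators are assumed, and the counts in the usage note are invented for illustration and are not results from this paper.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return the six metrics used in this section from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),   # normal records wrongly flagged as attacks
        "fnr": fn / (fn + tp),   # attack records wrongly passed as normal
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }
```

With heavily imbalanced, made-up counts such as `confusion_metrics(tp=9990, tn=1, fp=9, fn=0)`, accuracy stays above 0.999 while the FPR is 0.9, which is why accuracy alone can be misleading on a dataset dominated by attack records.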

This subsection describes the results obtained through experimentation and testing of the Particle Deep Framework (PDF) by employing the evaluation metrics described in the previous subsection. The deep neural network architecture that was chosen is that of the deep MLP, as MLP networks have been shown to be simple, powerful and flexible models, having the ability to model non-linear relations in data [69,70]. The hyperparameters and architectures of the neural networks that were evaluated are given in Table 1, with the activation function for the hidden layers and the output layer being "relu" and "sigmoid" respectively. For the MLP's weight initialization we used glorot_uniform.

In addition to an optimised MLP, a feature-compressing method was tested, in order to investigate its effects on the deep model's performance. At first, the features are converted by using the Normal distribution's probability density function; then, for each record, weights are applied, which are obtained by calculating the average of the correlation coefficient matrix of the features; finally, the resulting values are combined by adding and normalising them (min-max). As was already mentioned, we applied weights to compensate for the class imbalances of the Bot-IoT dataset. Specifically, we used "1" for attack records and "4500" for normal records. Furthermore, we specified fixed values for the random seeds used by the Python implementation of MLP and PSO (Keras, TensorFlow and Optunity) to make the results reproducible. The results of our experiments, including the application of the PDF, are displayed in Table 1.

The three neural networks depicted in Table 1 are further discussed here. First, (i) is an unoptimized neural network, trained on randomly chosen hyperparameters. Its high accuracy value is misleading, as its false-positive rate is just under 89%, a result of the class imbalances.
Second, (ii) is an MLP that was trained by using the PDF and data that has been compressed by combining the features into one, as was previously described in this subsection. The justification for investigating a compression method for the features was to reduce the training time. Although the model’s accuracy was slightly reduced to 95% compared to the unoptimised MLP (i), its false-positive rate was considerably reduced to 8%, while its false-negative rate increased to 5%. Finally, (iii) is an MLP with optimised hyperparameters that was trained and tested by using the PDF on the original 13-feature Bot-IoT dataset. This MLP outperformed the other two versions, achieving false-positive and false-negative rates close to 0, while maintaining an accuracy of 99.9% and a precision of 1.

A number of reasons can explain why the 13-feature MLP (iii) outperformed the compressed, single-feature MLP (ii). The single-feature MLP was trained on data that compressed the input features, which may have resulted in considerable information loss. Furthermore, the 13-feature MLP (iii) incorporates 240 more trainable weights than the single-feature MLP (ii) between the input and first hidden layers, allowing (iii) to detect more complex patterns in the data than (ii).
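The figure of 240 additional weights can be checked directly from the layer sizes listed in Table 1. The short sketch below counts connection weights only (biases excluded) for fully connected layers; the helper name `dense_weights` is hypothetical and introduced just for this illustration.

```python
def dense_weights(layer_sizes):
    """Connection weights (biases excluded) between consecutive dense layers."""
    return [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]


# Layer sizes as listed in Table 1.
full_input = [13, 20, 40, 60, 80, 40, 10, 1]        # network (iii)
compressed_input = [1, 20, 40, 60, 80, 40, 10, 1]   # network (ii)

first_full = dense_weights(full_input)[0]        # 13 * 20
first_comp = dense_weights(compressed_input)[0]  # 1 * 20
diff = first_full - first_comp

print(first_full, first_comp, diff)
```

Since the two architectures differ only in the input layer, the difference in the first weight matrix (260 versus 20) accounts for the whole gap between the two networks.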

Table 1. Neural networks that were trained.

| | (i) Unoptimized NN | (ii) Optimized NN with compressed input | (iii) Optimized NN with 13-features input |
|---|---|---|---|
| Neurons per layer | 13, 20, 40, 60, 80, 40, 10, 1 | 1, 20, 40, 60, 80, 40, 10, 1 | 13, 20, 40, 60, 80, 40, 10, 1 |
| Epochs | 2 | 12 | 12 |
| Batch size | 350 | 3064 | 732 |
| Learning rate | 0.2 | 0.0015 | 0.0015 |
| Accuracy | 0.999 | 0.947 | 0.999 |
| Precision | 0.999 | 0.999 | 1 |
| Recall | 0.999 | 0.947 | 0.999 |
| FPR | 0.884 | 0.081 | 0 |
| FNR | 9.269*10^−… | … | … |
| F-measure | 0.999 | 0.973 | 0.999 |

Fig. 3. (i) Unoptimized NN

Fig. 4. (ii) Optimized NN with compressed input

Fig. 5. (iii) Optimized NN with 13-features input

One advantage of the PDF is that it automates the hyperparameter tuning process, which originally was manual. Furthermore, the collection stage is carried out by reliable software that has seen wide use in industry. Additionally, the preservation stage is addressed by using a cryptographic hashing algorithm, which provides a mechanism to validate the integrity of data, the lack of which can cause a forensic investigation to be dismissed.

A limitation of the PDF is that it requires considerable time to train a deep MLP model. This occurs because each particle trains an MLP on different hyperparameters, so the size of the dataset, the number of layers and neurons of the MLP and the number of particles used in the swarm all increase the time requirements. Furthermore, the PDF processes network flow data, so any information in the body of collected packets is overlooked. In addition, countering spoofing attacks that alter the IP address of the attacker is challenging and may hinder an investigation.
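The training-time limitation follows from how a particle swarm evaluates candidates: every particle needs one objective evaluation per iteration. The following is a minimal global-best PSO sketch in plain Python, not the Optunity solver used in the paper. The objective `toy_loss` is a hypothetical stand-in for "train an MLP with these hyperparameters and return its validation loss", and the search bounds, swarm size and coefficients (`w`, `c1`, `c2`) are illustrative assumptions.

```python
import random


def pso(objective, bounds, n_particles=10, n_iters=30,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    evals = n_particles  # one evaluation per particle for the initial swarm
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            evals += 1
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, evals


def toy_loss(params):
    """Hypothetical stand-in for the validation loss of a trained MLP."""
    log_lr, batch = params
    return (log_lr + 2.8) ** 2 + ((batch - 700) / 1000) ** 2


# Search over log10(learning rate) and batch size (illustrative ranges).
best, best_val, evals = pso(toy_loss, bounds=[(-4.0, 0.0), (32.0, 4096.0)])
print(best, best_val, evals)
```

The returned `evals` equals `n_particles * (n_iters + 1)`. If each evaluation were a full MLP training run instead of a cheap formula, the total cost would grow linearly with both the swarm size and the number of iterations, on top of the per-model cost driven by dataset and network size.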

Due to the swift adoption of IoT systems by industry and the general public, attacks targeting IoT networks have been increasing. This paper proposes a novel network forensics framework called the Particle Deep Framework (PDF) for the detection and analysis of cyber-attacks in IoT networks. First, the components of the PDF and their correspondence to the forensic stages were explained. At its core, the PDF combines Deep Learning as the base model with Particle Swarm Optimisation for tuning its hyperparameters, and the contemporary Bot-IoT dataset was used to validate its performance. The PDF achieved very high attack detection accuracy, at 99.9%, with false positive and negative rates approaching zero, while its classification speed was measured at 14,762 flows per second. As future work, we intend to expand the PSO’s functionality by adjusting it to process multiple hyperparameters, as well as to test its effectiveness against other IoT deployments, such as smart health networks.

Acknowledgments

Nickolaos Koroniotis would like to thank the Commonwealth for its support, which is provided to the aforementioned researcher in the form of an Australian Government Research Training Program Scholarship.

References

1. S. Cook, “60+ iot statistics and facts,” Comparitech
IEEE Access, vol. 7, pp. 61764–61785, 2019.
4. B. Ali and A. Awad, “Cyber and physical security vulnerability assessment for iot-based smart homes,” Sensors, vol. 18, no. 3, p. 817, 2018.
5. C. Robberts and J. Toft, “Finding vulnerabilities in iot devices: Ethical hacking of electronic locks,” 2019.
6. M. Conti, A. Dehghantanha, K. Franke, and S. Watson, “Internet of things security and forensics: Challenges and opportunities,” Future Generation Computer Systems, vol. 78, pp. 544–546, 2018.
7. E. Ronen, A. Shamir, A.-O. Weingarten, and C. O’Flynn, “Iot goes nuclear: Creating a zigbee chain reaction,” in . IEEE, 2017, pp. 195–212.
8. C. Meffert, D. Clark, I. Baggili, and F. Breitinger, “Forensic state acquisition from internet of things (fsaiot): A general framework and practical approach for iot forensics through iot device state acquisition,” in Proceedings of the 12th International Conference on Availability, Reliability and Security. ACM, 2017, p. 56.
9. S. Al-Sarawi, M. Anbar, K. Alieyan, and M. Alzubaidi, “Internet of things (iot) communication protocols,” in . IEEE, 2017, pp. 685–690.
10. R. Kaur and A. Kaur, “Digital forensics,” International Journal of Computer Applications, vol. 50, no. 5, 2012.
11. M. Hossain, Y. Karim, and R. Hasan, “Fif-iot: A forensic investigation framework for iot using a public digital ledger,” in . IEEE, 2018, pp. 33–40.
12. M. M. Hossain, R. Hasan, and S. Zawoad, “Probe-iot: A public digital ledger based forensic investigation framework for iot,” in INFOCOM Workshops, 2018, pp. 1–2.
13. M. Cebe, E. Erdin, K. Akkaya, H. Aksu, and S. Uluagac, “Block4forensic: An integrated lightweight blockchain framework for forensics applications of connected vehicles,” IEEE Communications Magazine, vol. 56, no. 10, pp. 50–57, October 2018.
14. N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
15. S. Prabakaran and S. Mitra, “Survey of analysis of crime detection techniques using data mining and machine learning,” in Journal of Physics: Conference Series, vol. 1000, no. 1. IOP Publishing, 2018, p. 012046.
16. Z. Wang, “The applications of deep learning on traffic identification,” BlackHat USA, vol. 24, 2015.
17. G. Zhao, C. Zhang, and L. Zheng, “Intrusion detection using deep belief network and probabilistic neural network,” in , vol. 1. IEEE, 2017, pp. 639–642.
18. A. Zela, A. Klein, S. Falkner, and F. Hutter, “Towards automated deep learning: Efficient joint neural architecture and hyperparameter search,” in ICML 2018 AutoML Workshop, Jul. 2018.
19. B. Wang and N. Z. Gong, “Stealing hyperparameters in machine learning,” in . IEEE, 2018, pp. 36–52.
20. T. Chen, L. Zheng, E. Yan, Z. Jiang, T. Moreau, L. Ceze, C. Guestrin, and A. Krishnamurthy, “Learning to optimize tensor programs,” in Advances in Neural Information Processing Systems, 2018, pp. 3389–3400.
21. J. Wang, J. Xu, and X. Wang, “Combination of hyperband and bayesian optimization for hyperparameter optimization in deep learning,” arXiv preprint arXiv:1801.01596, 2018.
22. D. Stamoulis, E. Cai, D.-C. Juan, and D. Marculescu, “Hyperpower: Power-and memory-constrained hyper-parameter optimization for neural networks,” in . IEEE, 2018, pp. 19–24.
23. S. Watson and A. Dehghantanha, “Digital forensics: the missing piece of the internet of things promise,” Computer Fraud & Security, vol. 2016, no. 6, pp. 5–8, 2016.
24. M. Chernyshev, S. Zeadally, Z. Baig, and A. Woodward, “Internet of things forensics: The need, process models, and open issues,” IT Professional, vol. 20, no. 3, pp. 40–49, 2018.
25. G. L. Palmer, “A road map for digital forensics research-report from the first digital forensics research workshop (dfrws)(technical report dtr-t001-01 final),” Air Force Research Laboratory, Rome Research Site, Utica, pp. 1–48, 2001.
26. D. P. Joseph and J. Norman, “An analysis of digital forensics in cyber security,” in First International Conference on Artificial Intelligence and Cognitive Computing. Springer, 2019, pp. 701–708.
27. D.-P. Le, H. Meng, L. Su, S. L. Yeo, and V. Thing, “Biff: A blockchain-based iot forensics framework with identity privacy,” in TENCON 2018-2018 IEEE Region 10 Conference. IEEE, 2018, pp. 2372–2377.
28. A. Valjarevic and H. S. Venter, “A comprehensive and harmonized digital forensic investigation process model,” Journal of forensic sciences, vol. 60, no. 6, pp. 1467–1483, 2015.
29. L. Caviglione, S. Wendzel, and W. Mazurczyk, “The future of digital forensics: Challenges and the road ahead,” IEEE Security & Privacy, vol. 15, no. 6, pp. 12–17, 2017.
30. S.-W. Han, H. Kwon, C. Hahn, D. Koo, and J. Hur, “A survey on mitm and its countermeasures in the tls handshake protocol,” in . IEEE, 2016, pp. 724–729.
31. N. Liao, S. Tian, and T. Wang, “Network forensics based on fuzzy logic and expert system,” Computer Communications, vol. 32, no. 17, pp. 1881–1892, 2009.
32. A. A. Ahmed and M. F. Mohammed, “Sairf: A similarity approach for attack intention recognition using fuzzy min-max neural network,” Journal of Computational Science, vol. 25, pp. 467–473, 2018.
33. A. Yudhana, I. Riadi, and F. Ridho, “Ddos classification using neural network and naïve bayes methods for network forensics,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 11, pp. 177–183, 2018.
34. K. Nguyen, D. Tran, W. Ma, and D. Sharma, “An approach to detect network attacks applied for network forensics,” in . IEEE, 2014, pp. 655–660.
35. K. Alrawashdeh and C. Purdy, “Toward an online anomaly intrusion detection system based on deep learning,” in . IEEE, 2016, pp. 195–200.
36. A. Azmoodeh, A. Dehghantanha, and K.-K. R. Choo, “Robust malware detection for internet of (battlefield) things devices using deep eigenspace learning,” IEEE Transactions on Sustainable Computing, vol. 4, no. 1, pp. 88–95, 2018.
37. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
38. O. Brun, Y. Yin, and E. Gelenbe, “Deep learning with dense random neural network for detecting attacks against iot-connected home environments,” Procedia computer science, vol. 134, pp. 458–463, 2018.
39. A. Pektaş and T. Acarman, “Botnet detection based on network flow summary and deep learning,” International Journal of Network Management, p. e2039, 2018.
40. R. Cheng and G. Watson, “D2pi: Identifying malware through deep packet inspection with deep learning,” 2018.
41. I. Alrashdi, A. Alqazzaz, E. Aloufi, R. Alharthi, M. Zohdy, and H. Ming, “Ad-iot: anomaly detection of iot cyberattacks in smart city using machine learning,” in . IEEE, 2019, pp. 0305–0310.
42. L. Babun, A. K. Sikder, A. Acar, and A. S. Uluagac, “Iotdots: A digital forensics framework for smart environments,” arXiv preprint arXiv:1809.00745, 2018.
43. J. Kennedy and R. Eberhart, “Particle swarm optimization (pso),” in Proc. IEEE International Conference on Neural Networks, Perth, Australia, 1995, pp. 1942–1948.
44. F. Marini and B. Walczak, “Particle swarm optimization (pso). a tutorial,” Chemometrics and Intelligent Laboratory Systems, vol. 149, pp. 153–165, 2015.
45. D. Wang, D. Tan, and L. Liu, “Particle swarm optimization algorithm: an overview,” Soft Computing, vol. 22, no. 2, pp. 387–408, 2018.
46. Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in . IEEE, 1998, pp. 69–73.
47. R. C. Eberhart and Y. Shi, “Tracking and optimizing dynamic systems with particle swarms,” in Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No. 01TH8546), vol. 1. IEEE, 2001, pp. 94–100.
48. R. Eberhart, P. Simpson, and R. Dobbins, Computational intelligence PC tools. Academic Press Professional, Inc., 1996.
49. M. Clerc, “The swarm and the queen: towards a deterministic and adaptive particle swarm optimization,” in Proceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. No. 99TH8406), vol. 3. IEEE, 1999, pp. 1951–1957.
50. R. C. Eberhart and Y. Shi, “Comparing inertia weights and constriction factors in particle swarm optimization,” in Proceedings of the 2000 congress on evolutionary computation. CEC00 (Cat. No. 00TH8512), vol. 1. IEEE, 2000, pp. 84–88.
51. J. Kennedy and R. C. Eberhart, “A discrete binary version of the particle swarm algorithm,” in , vol. 5. IEEE, 1997, pp. 4104–4108.
52. F. Van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE transactions on evolutionary computation, vol. 8, no. 3, pp. 225–239, 2004.
53. R. Mendes, J. Kennedy, and J. Neves, “The fully informed particle swarm: simpler, maybe better,” IEEE transactions on evolutionary computation
Future Generation Computer Systems
Digital Investigation
IECON 2015-41st Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2015, pp. 004345–004351.
63. M. Couceiro and P. Ghamisi, Particle Swarm Optimization. Cham: Springer International Publishing, 2016, pp. 1–10.
64. M. R. Bonyadi and Z. Michalewicz, “Particle swarm optimization for single objective continuous space problems: a review,” 2017.
65. M. N. Ab Wahab, S. Nefti-Meziani, and A. Atyabi, “A comprehensive review of swarm optimization algorithms,” PloS one, vol. 10, no. 5, p. e0122827, 2015.
66. Z. Zhang and M. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” in Advances in neural information processing systems, 2018, pp. 8778–8788.
67. Node-red tool. [Online]. Available: https://nodered.org/
68. M. Claesen, J. Simm, D. Popovic, and B. Moor, “Hyperparameter tuning in python using optunity,” in Proceedings of the International Workshop on Technical Computing for Machine Learning and Mathematical Engineering, vol. 1, 2014, p. 3.
69. O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, “State-of-the-art in artificial neural network applications: A survey,” Heliyon, vol. 4, no. 11, p. e00938, 2018.
70. M. S. Mahdavinejad, M. Rezvan, M. Barekatain, P. Adibi, P. Barnaghi, and A. P. Sheth, “Machine learning for internet of things data analysis: A survey,” Digital Communications and Networks, vol. 4, no. 3, pp. 161–175, 2018.

Authors

Nickolaos Koroniotis is a PhD student at UNSW Canberra. He received his Bachelors in Informatics and Telematics in 2014 and his Masters in Web Engineering and Applications in 2016. He enrolled in UNSW Canberra to initiate his PhD studies in February 2017 in the field of Cyber security, with a particular interest in Network Forensics and the IoT.