Differentially Private Collaborative Intrusion Detection Systems For VANETs
Tao Zhang
Department of Electrical and Computer Engineering, New York University
[email protected]
Quanyan Zhu
Department of Electrical and Computer Engineering, New York University
[email protected]

ABSTRACT
Vehicular ad hoc networks (VANETs) are an enabling technology in modern transportation systems for providing safety and valuable information, and yet they are vulnerable to a number of attacks, from passive eavesdropping to active interfering. Intrusion detection systems (IDSs) are important devices that can mitigate the threats by detecting malicious behaviors. Furthermore, collaborations among vehicles in VANETs can improve detection accuracy by communicating their experiences between nodes. To this end, distributed machine learning is a suitable framework for the design of scalable and implementable collaborative detection algorithms over VANETs. One fundamental barrier to collaborative learning is the privacy concern as nodes exchange data among themselves. A malicious node can obtain sensitive information of other nodes by inferring from the observed data. In this paper, we propose a privacy-preserving machine-learning based collaborative IDS (PML-CIDS) for VANETs. The proposed algorithm applies the alternating direction method of multipliers (ADMM) to a class of empirical risk minimization (ERM) problems and trains a classifier to detect intrusions in the VANET. We use differential privacy to capture the privacy notion of the PML-CIDS and propose a method of dual variable perturbation to provide dynamic differential privacy. We analyze the theoretical performance and characterize the fundamental tradeoff between the security and privacy of the PML-CIDS. We also conduct numerical experiments using the NSL-KDD dataset to corroborate the results on the detection accuracy, security-privacy tradeoffs, and design.

Keywords: Privacy · Differential privacy · Cybersecurity · VANET · Intrusion detection
With a growing number of vehicles on the road and the rapid development of autonomous vehicles, road safety becomes an increasingly important issue. A vehicular ad hoc network (VANET) provides a communication system that enables the dissemination of safety-related information, traffic management, navigation, and road services. However, it is known that VANETs are vulnerable to a number of attacks, from passive eavesdropping to active interfering [1]. For example, an attacker can eavesdrop and log the messages of other vehicles, and replay them to access specific resources such as toll services. An attacker can intrude a specific vehicle, impersonate its identity, and send out false warnings that can disrupt the highway traffic [1].

Intrusion detection plays an important role in mitigating the threats to VANETs by using signature-based and/or anomaly-based approaches to detect adversarial behaviors [2]. Among many architectures of IDSs, collaborative IDSs (CIDSs) have been proposed to enable the sharing of detection knowledge about known and unknown attacks and increase detection accuracy [3, 4, 5]. Distributed machine learning algorithms provide an appropriate framework for CIDSs to classify adversarial behaviors using local datasets and to share knowledge to increase the detection accuracy. In this paper, we consider network-level intrusion attacks on computer systems [6, 7], take advantage of the collaborative nature of VANETs, and design a system architecture of a distributed machine-learning based CIDS over a VANET. The CIDS enables each vehicle to utilize the knowledge of the labeled training data of other vehicles; thus, it boosts the training data size for each vehicle without actually burdening the storage capacity of each vehicle.
Also, the laborious task of collecting labeled data can be distributed to all the vehicles in a VANET, thereby reducing the workload of each vehicle. Moreover, the CIDS enables the vehicles to share knowledge with each other without directly exchanging the training data. In addition, the CIDS provides scalability of the training data processing and improves the quality of decision-making, while reducing the computational cost. The alternating direction method of multipliers (ADMM) [8] is one suitable approach to decentralizing the machine learning problem over a network; it allows nodes over the network to share their classification results and yields the optimal classifier achieved under centralized learning. Despite the distributed feature of the learning algorithm, the data communications between different vehicles can create serious privacy concerns for the training data in each vehicle when an adversary can observe the outcome of the learning and extract the sensitive information of the training data of each vehicle. The adversary can either be a vehicle of the VANET which observes its neighboring vehicles or a malicious outsider who can observe the outputs of learning.

The lack of a privacy protection mechanism often creates barriers to information sharing and disincentives for nodes to achieve collaboration. Therefore, a privacy-preserving mechanism is important to protect the training data privacy over the network and achieve an effective CIDS. Differential privacy, proposed in [9], is a well-defined concept that can provide a strong privacy guarantee by which a change of any single entry of the dataset can only slightly change the distribution of the responses of the dataset.

Therefore, this work proposes a privacy-preserving machine-learning based collaborative IDS (PML-CIDS) for the VANET. We first employ ADMM to construct a distributed empirical risk minimization (ERM) problem over a VANET so that a classifier can be trained in a decentralized fashion to detect whether an activity is normal or an attack. We extend differential privacy to dynamic differential privacy to capture the privacy notion in the distributed machine learning of the CIDS, and propose a privacy-preserving approach, dual variable perturbation (DVP). We also investigate the performance of the DVP and characterize the fundamental tradeoff between security and privacy of the PML-CIDS by formulating convex optimization problems, and we conduct numerical experiments based on the NSL-KDD dataset to demonstrate the optimal design of the privacy mechanism. The main contributions of this paper are summarized as follows:

(i) We propose a machine-learning-based CIDS architecture to enable collaborative information exchange and knowledge sharing in VANETs.
(ii) We use ADMM to capture the distributed nature of a VANET and construct collaborative learning over a VANET based on a regularized ERM algorithm.
(iii) We develop the DVP method to perturb the dual variables before minimizing the augmented Lagrange function at each ADMM iteration. The DVP is shown to guarantee dynamic differential privacy in the collaborative learning of the CIDS for a VANET.
(iv) We investigate the theoretical performance of the DVP, which is measured by the minimum training data size required to achieve a low error.
(v) We provide a design principle to find the optimal value of the privacy parameter by solving an optimization problem to manage the tradeoff between security and privacy of a VANET.
Many works have studied various architectures of intrusion detection systems that are well-suited to MANETs [3]. Most architectures for MANETs can be classified into three categories. The first is the distributed and cooperative IDS, which captures the distributed nature of a MANET that has the potential for constructing cooperations over the network. For example, Zhang and Lee in [10] have utilized this nature of MANETs and constructed a model for a distributed and cooperative IDS. Also, Albers et al. have proposed a collaborative IDS based on local IDSs by using mobile agents in [11]. The local IDS is implemented on each node of the MANET for local node-based security concerns, which can be extended to deal with global security issues by establishing a collaboration among local IDSs over the MANET. The second category is the hierarchical IDS model, which extends the distributed and cooperative architectures. In [12], Sterne et al. have designed a dynamic hierarchical IDS using multilevel clustering. The third architecture uses the concept of mobile agents, which can move through the large network. In this type of framework, each mobile agent is assigned to work on a single specific task; then one or multiple mobile agents are distributed into each node in the MANET. Previous research includes the work of Kachirski and Guha in [13], which has proposed distributed IDSs using multiple sensors based on mobile agent technology; the workload is thus distributed by separating functional tasks and assigning the tasks to different agents.

Machine learning and data mining for IDSs have also been studied in the literature. These techniques enable the IDS to continuously learn attacks and their behaviors, enhance the knowledge of the security system, make connections
between suspicious events, and predict the occurrence of an attack. Researchers have studied unsupervised learning, such as the technique of clustering, which is an unsupervised pattern discovery method, in IDSs. There are several approaches for clustering unlabeled data; for example, Blowers and Williams [14] have applied a density-based spatial clustering of applications with noise clustering algorithm to group normal versus anomalous network packets. Other clustering-based work includes hierarchical clustering [15] and K-means [16]. There is also literature on IDSs with supervised learning such as the support vector machine [17]. For example, Wagner et al. [18] have applied a one-class SVM classifier and used a new window kernel to find anomalies based on the time position of the data. Other methods using supervised learning include decision trees [19, 20, 21], artificial neural networks [22, 23], and sequential data aggregation [24, 25]. There have also been works on intrusion-prediction based detection using non-machine-learning techniques [26, 27]. For example, Nidhal et al. [28] have designed a game-theoretic intrusion detection approach for VANETs. The game-based model can predict a possible future denial-of-service attack on the monitored nodes.

In the field of differential privacy, there is a number of works on applying differential privacy to machine learning [29, 30, 31, 32, 33]. Kasiviswanathan et al. [29], for instance, have derived a general method for probabilistically approximately correct learning. A body of literature has studied the tradeoff between the privacy and the performance of machine learning while exploring the theory of differential privacy (e.g., [9, 34, 35]). Also, an increasing number of researchers focus on distributed differential privacy. Eigner and Maffei have developed a framework for the automated verification of distributed differential privacy in [36] to enforce distributed differential privacy in cryptographic protocol implementations. Han et al. [37] have proposed a differentially private algorithm to solve distributed constrained optimization based on distributed projected gradient descent to protect the privacy of the constraint set. Hale et al. [38] have used a cloud computer to perform differentially private computations so that the broadcasts of the results to each agent over the network do not leak the private state of each agent. In this paper, we develop a collaborative IDS using distributed machine learning and resolve the barrier of privacy issues by proposing the concept of dynamic differential privacy to protect the privacy of the training dataset used in the learning.

Figure 1. A VANET scenario: TMC: traffic management center; V2V: vehicle-to-vehicle communications; V2I: vehicle-to-infrastructure communications; I2I: infrastructure-to-infrastructure communications. Each vehicle is equipped with an OBU and an AU.
The rest of the paper is structured as follows. Section 2 describes the PML-CIDS architecture. Section 3 presents the model of the collaborative learning over a VANET for the IDS. The ADMM approach is used to decentralize a centralized ERM problem that models the collaborative learning in the VANET. We also describe the privacy concerns associated with the ADMM-based collaborative learning and define the dynamic differential privacy. Section 4 proposes the DVP algorithm to provide dynamic differential privacy. Then, we study the performance of the DVP algorithm in Section 5. Section 6 shows numerical experiments to corroborate the theoretical results and the optimal design principle to manage the tradeoff between security and privacy. Finally, Section 7 presents the concluding remarks and future research directions.
Algorithm 1 PML-CIDS
Input: Real-time VANET system data: local audit data flow and activity logs.
Step 1: The pre-processing engine collects and pre-processes the real-time VANET system data by numerical transformation, feature selection, and data normalization.
if the classifier needs an update then
    Step 2: The P-CML engine is initiated, the local training dataset is loaded, and an updated classifier is obtained.
    Step 3: The local detection engine uses the newly updated classifier to analyze the real-time VANET system data. If any activities are classified as intrusions, the local detection engine triggers the alarm.
else
    Step 2: The local detection engine uses the current classifier to analyze the real-time VANET system data and triggers the alarm when any activities are classified as intrusions.
end if
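To make the control flow of Algorithm 1 concrete, the following is a minimal sketch of a local agent pass in Python. The objects preproc, pcml, and detector and their methods are hypothetical placeholders for the pre-processing engine, the P-CML engine, and the local detection engine; they are not part of the paper.

```python
# Minimal sketch of the logical flow in Algorithm 1. The component objects and
# their methods are illustrative placeholders, not the paper's implementation.

def pml_cids_step(raw_records, preproc, pcml, detector, needs_update):
    """Run one pass of the local PML-CIDS agent over a batch of audit records."""
    # Step 1: numerical transformation, feature selection, and normalization.
    features = preproc.transform(raw_records)

    if needs_update:
        # Step 2: run the privacy-preserving collaborative learning engine on
        # the local training dataset and install the updated classifier.
        classifier = pcml.train(preproc.local_training_data())
        detector.set_classifier(classifier)

    # Step 2/3: analyze the pre-processed records with the current classifier
    # and trigger the alarm for any record classified as an intrusion (label 1).
    for x in features:
        if detector.classify(x) == 1:
            detector.raise_alarm(x)
```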
In this section, we describe the architecture of the proposed PML-CIDS, which includes multiple building blocks for VANETs. As illustrated in Figure 1, a general VANET consists of on-board units (OBUs), application units (AUs), and roadside units (RSUs). The communication between OBUs (vehicle-to-vehicle), or between an OBU and an RSU (vehicle-to-infrastructure), is based on wireless access in vehicular environments (WAVE) [3]. The RSUs can also connect to other infrastructures such as other RSUs and the traffic management center, and the communications between them (infrastructure-to-infrastructure) are through other wireless technologies. Each vehicle is equipped with an OBU and one or multiple AUs. It also has a set of sensors to collect information and uses the OBU to exchange information with other OBUs or RSUs. Details about the three main components of the VANET architecture are presented in Appendix A.1 for interested readers.

Each vehicle is equipped with one local PML-CIDS agent, as shown in Figure 2, to monitor its local activities, including the ones in the AU and the communications via the OBU. Conceptually, the collaborative system consists of three main components, namely, a pre-processing engine, a local detection engine, and a privacy-preserving collaborative machine learning (P-CML) engine. The logical flow of the PML-CIDS is illustrated in Algorithm 1. The pre-processing engine gathers and pre-processes the real-time VANET system data that describe the system activities in a vehicle. The pre-processed system data is then analyzed by the local detection engine using classification techniques. If the user of the vehicle requires the current classifier to be updated, then the P-CML engine is initiated, and the local detection engine uses the newly retrained classifier to analyze the system data. Otherwise, the current classifier is used in the classification of intrusions. If any intrusion is classified, the alarm is triggered. Each component of the PML-CIDS is elaborated further in Appendix A.2. One essential component of the CIDS is the P-CML engine, which is composed of the collaborative communication (CC) engine, the distributed local learning (DLL), and the privacy-preserving (PP) mechanism. The details of these building blocks will be described in Section 3.
Figure 2. Architecture of the PML-CIDS: The pre-processing engine collects and pre-processes the local audit data flow. The local detection engine then analyzes the pre-processed data using a classifier. If the user of the vehicle requires an update of the classifier, the P-CML engine is initiated. After the collaborative learning, the updated classifier is used in the intrusion detection.
In this section, we describe the CC engine and the DLL of the P-CML engine in the PML-CIDS. We first model the machine learning by a centralized regularized empirical risk minimization (ERM) problem, which is then decentralized by the ADMM approach. The privacy concerns are then described, and a definition of dynamic differential privacy is provided. In our model, the vehicles and infrastructures are treated equally except that the infrastructures are static and have more data processing capacity. Therefore, without loss of generality, in the rest of the paper, we focus on the vehicle-to-vehicle communications.
Figure 3. Connected network: The colored nodes represent the vehicles; v is the local center of the one-hop neighborhood composed of i, j, and k; v only communicates with i, j, and k.

Consider a connected VANET consisting of P vehicles, described by an undirected graph G(V, E) as shown in Figure 3 at time t, with the set of vehicles V = {1, 2, ..., P} and a set of edges E denoting the links between connected vehicles. In general, the graph can change over time as the nodes move. Here we introduce the framework with a fixed topology; it can be easily extended to dynamic regimes. A particular vehicle v ∈ V only exchanges information with its neighboring vehicles w ∈ N_v, where N_v is the set of all neighboring vehicles of v. Each vehicle stores a labeled training dataset D_v = {(x_v^i, y_v^i) ⊂ X × Y : i = 1, 2, ..., n_v} of size n_v, where x_v^i ∈ X ⊆ R^d and y_v^i ∈ Y := {−1, 1} are the data vector and the corresponding label, respectively. The entire network therefore has the dataset D̂ = ∪_{v ∈ V} D_v. The training dataset D_v of v ∈ V contains data points describing the VANET system activities, such as user and application activities and communication activities through the OBUs; each data point is labeled as an intrusion (1) or a normal activity (−1). The goal is to train a classifier f : X → Y using all available data D̂ that enables all vehicles in the ad hoc network to classify any input data x′ (i.e., data flow collected and pre-processed by the pre-processing engine) to a label y′ ∈ {−1, 1}, where 1 indicates an intrusion and −1 a normal activity.

Let Z(f | D̂) be the centralized objective function of a regularized empirical risk minimization problem (C-ERM). The C-ERM problem can be defined as:

\[ \min_{f} Z(f\,|\,\hat{D}) := \frac{C}{n_v}\sum_{v=1}^{P}\sum_{i=1}^{n_v}\hat{L}\big(y_v^i, f^T x_v^i\big) + \kappa R(f), \quad (1) \]

where C ≤ n_v is a regularization parameter, κ > 0, and L̂ is the loss function that measures the quality of the trained classifier. In this work, we focus on the specific loss function L̂(y_v^i, f^T x_v^i) = L(y_v^i f^T x_v^i). The regularizer function R(f) in (1) is used to prevent overfitting. Suppose that D̂ is available to a fusion center vehicle; then a global classifier f : X → Y is chosen by optimizing the C-ERM.

To solve the problem by ADMM, we first decentralize the C-ERM problem by introducing the decision variables {f_v}_{v=1}^{P}; then, vehicle v determines its own classifier f_v. We impose consensus constraints f_1 = f_2 = ... = f_P to guarantee the global consistency of the classifiers. Let {s_vw} be the auxiliary variables that decouple f_v of the vehicle v from its neighbors w ∈ N_v in the VANET. Then, the consensus-based reformulation of the C-ERM becomes

\[ \min_{\{f_v\}_{v=1}^{P}} Z := \frac{C}{n_v}\sum_{v=1}^{P}\sum_{i=1}^{n_v} L\big(y_v^i f_v^T x_v^i\big) + \frac{1}{P}\sum_{v=1}^{P}\kappa R(f_v), \quad \text{s.t. } f_v = s_{vw},\ s_{vw} = f_w,\ v = 1, ..., P,\ w \in N_v, \quad (2) \]

where Z({f_v}_{v∈V} | D̂) is the reformulated objective as a function of {f_v}_{v=1}^{P}. According to Lemma 1 in [39], if {f_v}_{v=1}^{P} is a feasible solution of (2) and the network is connected, then problems (1) and (2) are equivalent, i.e., f = f_v for all v = 1, ..., P, where f is a feasible solution of the C-ERM. Let ρ = κ/P. Problem (2) can be solved in a distributed fashion using ADMM, with each vehicle v ∈ V optimizing the following distributed regularized empirical risk minimization problem (D-ERM):

\[ Z_v(f_v\,|\,D_v) := \frac{C}{n_v}\sum_{i=1}^{n_v} L\big(y_v^i f_v^T x_v^i\big) + \rho R(f_v). \quad (3) \]

The augmented Lagrange function associated with the D-ERM is:

\[ L_{D_v}(f_v, s_{vw}, \lambda^k_{vw}) = Z_v + \sum_{i \in N_v}\big(\lambda^a_{vi}\big)^T (f_v - s_{vi}) + \sum_{i \in N_v}\big(\lambda^b_{vi}\big)^T (s_{vi} - f_i) + \frac{\eta}{2}\sum_{i \in N_v}\big(\|f_v - s_{vi}\|^2 + \|s_{vi} - f_i\|^2\big). \quad (4) \]

Therefore, the distributed iterative procedures to solve (3) are:

\[ f_v(t+1) = \arg\min_{f_v} L_{D_v}\big(f_v, s_{vw}(t), \lambda^k_{vw}(t)\big), \quad (5) \]
\[ s_{vw}(t+1) = \arg\min_{s_{vw}} L_{D_v}\big(f_v(t+1), s_{vw}, \lambda^k_{vw}(t)\big), \quad (6) \]
\[ \lambda^a_{vw}(t+1) = \lambda^a_{vw}(t) + \eta\big(f_v(t+1) - s_{vw}(t+1)\big), \quad v \in V,\ w \in N_v, \quad (7) \]
\[ \lambda^b_{vw}(t+1) = \lambda^b_{vw}(t) + \eta\big(s_{vw}(t+1) - f_w(t+1)\big), \quad v \in V,\ w \in N_v. \quad (8) \]

Here, s_vw(t+1) in (6) can be found in closed form because the cost in (6) is linear-quadratic in s_vw(t+1) [39]. By substituting the closed-form solution, we can eliminate s_vw(t+1) from L_{D_v}; this makes it possible to simplify the iterative procedures (5) to (8). Indeed, according to Lemma 3 in [39], we can further simplify the distributed iterative procedures by initializing the dual variables λ^k_vw(0) = 0_{d×1} and combining the two sets of dual variables into one as λ_v(t) = Σ_{w ∈ N_v} λ^k_vw(t), v ∈ V, w ∈ N_v, k = a, b. Then, we can combine (7) and (8) into one update. We simplify (5)-(8) by introducing the following. Let L_{N_v}(t) be the short-hand notation of L_{N_v}({f_v}, {f_v(t)}, {λ_v(t)}):

\[ L_{N_v}(t) := \frac{C}{n_v}\sum_{i=1}^{n_v} L\big(y_v^i f_v^T x_v^i\big) + \rho R(f_v) + 2\lambda_v(t)^T f_v + \eta\sum_{i \in N_v}\Big\| f_v - \tfrac{1}{2}\big(f_v(t) + f_i(t)\big) \Big\|^2, \quad (9) \]

where ‖·‖ denotes the l2 norm throughout this paper. The ADMM iterative procedures (5)-(8) are reduced to

\[ f_v(t+1) = \arg\min_{f_v} L_{N_v}\big(f_v, f_v(t), \lambda_v(t)\big), \quad (10) \]
\[ \lambda_v(t+1) = \lambda_v(t) + \frac{\eta}{2}\sum_{w \in N_v}\big[f_v(t+1) - f_w(t+1)\big]. \quad (11) \]
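As an illustration of the per-vehicle updates (10)-(11), the following Python sketch instantiates the generic loss with the logistic loss (the loss adopted later in the experiments) and solves the inner minimization (10) by plain gradient descent. The inner solver, the step size, and the iteration count are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def primal_update(f_v, lam_v, neighbor_fs_t, X_v, y_v, C, rho, eta,
                  steps=200, lr=0.05):
    """Update (10): minimize the surrogate Lagrangian (9) by gradient descent,
    with logistic loss and R(f) = (1/2)||f||^2."""
    n_v, _ = X_v.shape
    centers = [(f_v + f_w) / 2.0 for f_w in neighbor_fs_t]   # (f_v(t) + f_w(t)) / 2
    f = f_v.copy()
    for _ in range(steps):
        margins = y_v * (X_v @ f)
        sigma = 1.0 / (1.0 + np.exp(margins))                # logistic weights
        grad = -(C / n_v) * (X_v * (y_v * sigma)[:, None]).sum(axis=0)
        grad += rho * f                                      # from (rho/2)||f||^2
        grad += 2.0 * lam_v                                  # from 2 * lambda_v(t)^T f
        for c in centers:
            grad += 2.0 * eta * (f - c)                      # from eta * ||f - c||^2
        f -= lr * grad
    return f

def dual_update(lam_v, f_v_new, neighbor_fs_new, eta):
    """Update (11), applied after broadcasting f_v(t+1) and receiving f_w(t+1)."""
    return lam_v + (eta / 2.0) * sum(f_v_new - f_w for f_w in neighbor_fs_new)
```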
Algorithm 2 Distributed ERM over VANET
Required: Randomly initialize f_v(0) and set λ_v(0) = 0_{d×1} for every v ∈ V.
Input: D̂
for t = 0, 1, 2, ... do
    for v = 1 to P do
        Compute f_v(t+1) via (10).
    end for
    for v = 1 to P do
        Broadcast f_v(t+1) to all neighboring vehicles w ∈ N_v.
    end for
    for v = 1 to P do
        Compute λ_v(t+1) via (11).
    end for
end for
Output: f* = f_v, for all v ∈ V.

Algorithm 2 summarizes the (non-private) distributed ERM over a VANET.
At iteration t + 1, vehicle v updates its local f_v(t) through (10). Next, v broadcasts the latest f_v(t+1) to all its neighboring vehicles w ∈ N_v. When each vehicle has updated λ_v(t+1) via (11), iteration t + 1 completes. Each v ∈ V only updates its own f_v(t) and λ_v(t), and the only information exchanged between neighboring vehicles is f_v(t); thus, direct data sharing is avoided. There are several methods to solve (10), for example, the projected gradient method, the Newton method, and the Broyden-Fletcher-Goldfarb-Shanno method [40] that approximates the Newton method, to name a few. In this distributed algorithm, each vehicle solves a minimization problem per iteration using its local training dataset. The only information in the messages transmitted by the OBUs between neighboring vehicles is the value of f_v(t).
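Putting the two local updates together, the sketch below mimics Algorithm 2 at the network level: every vehicle runs the primal update (10), broadcasts its classifier, and then runs the dual update (11). It reuses primal_update and dual_update from the sketch above; the synchronous loop and default parameter values are illustrative assumptions.

```python
import numpy as np

def run_distributed_erm(datasets, neighbors, d, C=1.0, rho=0.01, eta=1.0, T=50):
    """datasets[v] = (X_v, y_v); neighbors[v] = list of neighbor indices of v."""
    P = len(datasets)
    f = [np.zeros(d) for _ in range(P)]
    lam = [np.zeros(d) for _ in range(P)]
    for _ in range(T):
        # primal updates (10) at all vehicles, using the previous broadcasts
        f_new = [primal_update(f[v], lam[v], [f[w] for w in neighbors[v]],
                               *datasets[v], C, rho, eta) for v in range(P)]
        # broadcast step: each vehicle now sees its neighbors' new classifiers
        lam = [dual_update(lam[v], f_new[v],
                           [f_new[w] for w in neighbors[v]], eta) for v in range(P)]
        f = f_new
    return f
```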
ADMM-based distributed machine learning has benefits due to its high scalability. It also provides a certain degree of privacy since vehicles do not share training data directly. However, a privacy issue arises when powerful adversaries can make intelligent inferences at each step of the collaborative learning and extract the private information contained in the training dataset based on their observations of the learning output of each vehicle. Simple anonymization or conventional sanitization is not sufficient to address the privacy issue, as mentioned in the introduction. In the next subsection, we discuss the privacy concerns about the training data and propose differential privacy solutions.

As mentioned in the last subsection, the data stored at each vehicle is not exchanged during the entire ADMM algorithm; however, a potential privacy risk still exists. Suppose that the dataset D_v stored at vehicle v contains sensitive information in a data point (x_s, y_s) that is not allowed to be known by anyone else. Consider the worst-case scenario in which the adversaries know every data point of the training data except (x_s, y_s). There exist risks that the information about the sensitive data can be extracted by observing the output of the non-private ADMM-based distributed learning algorithm when the output is transmitted by the OBU.

In this paper, we consider a linear classifier f_v. The classifier f_v that minimizes the ERM is a linear combination of the data points, with their labels, that lie near the decision boundary and constitute the entire or a subset of the training dataset. Let these data-label pairs constitute a subset of the training dataset D, denoted as Sb(D). Let A(·) represent Algorithm 2 with the output f_v = A(D_v) given the dataset D_v. Let D′_v be any dataset such that D_v and D′_v differ by only one data point. Let (x_s, y_s) ⊂ D_v and (x′_s, y′_s) ⊂ D′_v be the only pair of data points that are different, i.e., (x_s, y_s) ≠ (x′_s, y′_s). Suppose f = A(D′_v). If (x_s, y_s) ∈ Sb(D_v) and (x′_s, y′_s) ∈ Sb(D′_v), then in general A(D_v) ≠ A(D′_v), and the ratio P(A(D_v) = f_v)/P(A(D′_v) = f_v) is unbounded; hence the deterministic Algorithm 2 cannot provide differential privacy.

Before describing the attack model, we first introduce the following notations. Let A_r denote the randomized version of Algorithm 2, and let {f*_v}_{v∈V} be the output of A. It has been proved (e.g., [39]) that as the number of iterations t → ∞, f*_1 = f*_2 = ... = f*_P = f*, where f* is the optimal solution of the C-ERM. Since A is deterministic, the output {f*_v}_{v∈V} is deterministic. In the randomized algorithm, the vehicle v optimizes its local regularized empirical risk using its own training dataset. Let A^rt_v be the vehicle-v-dependent stochastic sub-algorithm of A_r at iteration t, and let f_v(t) be the output of A^rt_v(D_v) at iteration t with dataset D_v. Therefore, f_v(t) is stochastic at each t.

We consider the following attack model. The adversary can access the output at every iteration as well as the final output. This type of adversary aims to obtain the sensitive information contained in the private data point in the training dataset by observing the output f_v(t) or f*_v for all v ∈ V at each iteration t, not limited to the first iteration. We protect the privacy of distributed learning over a VANET using the definition of differential privacy proposed in [9]. Specifically, we require that a change of any single data point in the dataset may only change the distribution of the output of the algorithm slightly. This can be realized by adding randomness to the output of the algorithm. Recent advances in privacy-preserving machine learning techniques are not directly applicable since ADMM algorithms are iterative and dynamic; hence, we need to extend the notion of privacy to dynamic differential privacy. To protect the privacy of the training data against the adversary in the collaborative learning of a VANET, we propose the concept of dynamic differential privacy, which enables the D-ERM to be privacy-preserving at every stage of learning.
Definition 1 (Dynamic α(t)-Differential Privacy (DDP)). Consider a network of P nodes V = {1, 2, ..., P}, where each node v has a training dataset D_v, and D̂ = ∪_{v∈V} D_v. Let A_r be a randomized version of Algorithm 2. Let α(t) = (α_1(t), α_2(t), ..., α_P(t)) ∈ R^P_+, where α_v(t) ∈ R_+ is the privacy parameter of node v at iteration t. Let A^rt_v be the node-v-dependent sub-algorithm of A_r, which corresponds to an ADMM iteration at t that outputs f_v(t). Let D′_v be any dataset with H_d(D′_v, D_v) = 1, and let g_v(t) = A^rt_v(D′_v). We say that the algorithm A_r is dynamically α_v(t)-differentially private (DDP) if, for any dataset D′_v, for all v ∈ V that can be observed by the adversaries, and for all possible sets of outcomes S ⊆ R^d, the following inequality holds:

\[ \Pr[f_v(t) \in S] \le e^{\alpha_v(t)} \cdot \Pr[g_v(t) \in S], \quad (12) \]

for all t ∈ Z_+ during a learning process. The probability is taken with respect to f_v(t), the output of A^rt_v at every stage t. The algorithm A_r is called dynamically α(t)-differentially private if the above conditions are satisfied.

Definition 1 provides a suitable differential privacy concept for the adversary in the collaborative learning of a VANET. For a DDP algorithm, the adversaries cannot extract additional information about the private data by observing f_v(t) at any vehicle v and any iteration t. As mentioned above, Algorithm 2 is not DDP since P(A(D_v) = f_v)/P(A(D′_v) = f) → ∞. Please note that the optimization at each iteration in the ADMM-based learning is uncoupled from every other iteration. Also, the optimization at each vehicle is uncoupled from that at every other vehicle. These properties of ADMM make it possible to treat the privacy of each vehicle at each iteration independently. In the definition of DDP, the strength of privacy of vehicle v at iteration t depends entirely on the value of α_v(t) chosen at t, which is independent of the values of α_w(t′) for all w ≠ v and t′ ≠ t. Therefore, the DDP is also independent of the number of iterations. Since each iteration is private, there are no opportunities for privacy leakage in previous iterations that the adversaries can take advantage of to extract more information in later iterations.

Figure 4. Illustration of the DVP during intermediate iterations. The perturbed β_v participates in the PP mechanism, described by (14). As a result, the output f_v at each iteration is a random variable, and the transmission of f_v is differentially private. The evil red face represents the adversary and the red lightning bolts refer to the possible privacy leakage positions.

In the previous section, we have defined a dynamic differential privacy that can capture the notion of data privacy in the collaborative learning over a VANET. In this section, we propose an approach for the privacy-preserving mechanism based on the definition of dynamic differential privacy:
dual variable perturbation (DVP). We also describe the mathematical models of all three components of the P-CML, namely, the PP mechanism, the DLL, and the CC engine. The DVP is proved to be DDP by adding appropriate noise to the deterministic algorithm if the following assumptions are satisfied:
Assumption 1.
The loss function L is strictly convex and doubly differentiable in f, with |L′| ≤ 1 and |L′′| ≤ c_1, where c_1 is a constant. Both L and L′ are continuous.
Algorithm 3 Dual Variable Perturbation
Required: Randomly initialize f_v(0) and set λ_v(0) = 0_{d×1} for every v ∈ V.
Input: D̂, {[α_v(0), α_v(1), ...]}_{v=1}^{P}
for t = 0, 1, 2, ... do
    for v = 1 to P do
        Let α̂_v = α_v(t) − 2 ln(1 + 2Cc_1 / (n_v(ρ + 2ηN_v))), where N_v is the number of neighboring vehicles of v.
        if α̂_v > 0 then
            Φ = 0
        else
            Φ = 2Cc_1 / (n_v(e^{α_v(t)/4} − 1)) − ρ − 2ηN_v and α̂_v = α_v(t)/2.
        end if
        Draw noise ε_v(t) according to K_v(ε) ∝ e^{−ζ_v(t)‖ε‖} with ζ_v(t) = α̂_v/2.
        PP: Compute β_v(t+1) via (14).
        DLL Part 1: Compute f_v(t+1) via (15) with the augmented Lagrange function (13).
    end for
    for v = 1 to P do
        CC: Broadcast f_v(t+1) to all neighboring vehicles w ∈ N_v.
    end for
    for v = 1 to P do
        DLL Part 2: Compute λ_v(t+1) via (16).
    end for
end for
Output: {f*_v}_{v=1}^{P}.
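A practical detail in Algorithm 3 is how to draw ε_v(t) from the density K_v(ε) ∝ e^{−ζ_v(t)‖ε‖}. A standard way, sketched below, is to sample a uniformly random direction and a norm from a Gamma(d, 1/ζ) distribution; the helper name and the use of NumPy are assumptions, not the paper's code.

```python
import numpy as np

def sample_gamma_laplace(d, zeta, rng=None):
    """Draw eps in R^d with density proportional to exp(-zeta * ||eps||_2):
    the norm follows Gamma(shape=d, scale=1/zeta) and the direction is uniform
    on the unit sphere."""
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=1.0 / zeta)
    return radius * direction
```

In the DVP, ζ_v(t) is set from α̂_v as in Algorithm 3, and the drawn ε_v(t) is scaled by C/n_v before being added to λ_v(t) in (14).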
Assumption 2.
The regularizer function R(·) is continuous, differentiable, and 1-strongly convex. Both R(·) and ∇R(·) are continuous.

Assumption 3.
We assume that ‖x_v^i‖ ≤ 1. Since y_v^i ∈ {−1, 1}, |y_v^i| = 1.

The idea of the DVP is to perturb the dual variables {λ_v(t)}_{v=1}^{P} with a random noise vector ε_v(t) ∈ R^d with the probability density function K_v(ε) ∝ e^{−ζ_v(t)‖ε‖}, where ζ_v(t) is a function of α_v(t). In this approach, we add an additional term (Φ/2)‖f_v‖² to the objective function (3) to make sure that the objective function associated with (13) is sufficiently strongly convex. Each iteration starts with perturbing the dual variable λ_v(t) updated in the last iteration to a new variable β_v(t) = λ_v(t) + (C/n_v) ε_v(t). Then, the corresponding vehicle-v-based augmented Lagrange function L_{N_v}(t) becomes L^dual_v(f_v, f_v(t), β_v(t+1), {f_i(t)}_{i∈N_v}). Let L^dual_v(t) be its short-hand notation; we have:

\[ L^{dual}_v(t) = Z_v(f_v\,|\,D_v) + \frac{\Phi}{2}\|f_v\|^2 + 2\beta_v(t+1)^T f_v + \eta\sum_{i \in N_v}\Big\| f_v - \tfrac{1}{2}\big(f_v(t) + f_i(t)\big) \Big\|^2. \quad (13) \]

Thus, the randomness caused by adding the noise ε_v(t) randomizes the minimizer of L^dual_v(t). Let CL_K denote the K-th (randomized) collaboration. Suppose each CL_K includes T_K iterations (T_K can vary with K). In our model, the t-th iteration of CL_K varies with K for all t ∈ {1, ..., T_K}. This is because the mechanism at t involves f_v(t−1) and β_v(t−1), whose values were updated in the preceding iteration of collaboration K.

Now we can model the three components of the P-CML as:

• PP mechanism: the dual variable λ_v(t) is perturbed using Laplace noise ε_v(t+1),
\[ \beta_v(t+1) = \lambda_v(t) + \frac{C}{n_v}\,\varepsilon_v(t+1). \quad (14) \]

• DLL part 1: f_v(t+1) is updated by minimizing L^dual_v(t),
\[ f_v(t+1) = \arg\min_{f_v} L^{dual}_v(t). \quad (15) \]
• CC engine: Broadcast f_v(t+1) to all neighboring vehicles w ∈ N_v.

• DLL part 2: λ_v(t+1) is updated using all the {f_w(t+1)}_{w∈N_v} received from the neighboring vehicles and f_v(t+1),
\[ \lambda_v(t+1) = \lambda_v(t) + \frac{\eta}{2}\sum_{w \in N_v}\big[f_v(t+1) - f_w(t+1)\big]. \quad (16) \]

The iterations (14)-(16) are summarized in Algorithm 3. Specifically, we have introduced an additional privacy parameter α̂_v = α_v(t) − 2 ln(1 + 2Cc_1/(n_v(ρ + 2ηN_v))), where N_v is the number of neighboring vehicles of v. α̂_v is required in the proof of Theorem 1. The two cases of α̂_v are necessary to find the upper bound of the ratio of the Jacobian matrices of the transformation from f_v(t) to ε_v(t) given different datasets (see details in Appendix A in [41]). Figure 4 illustrates each iteration of Algorithm 3. After the P-CML engine has established a collaboration with neighboring vehicles over a VANET, the ADMM iterations begin. Initially, the DLL at each vehicle generates an initial λ_v(0) = 0_{d×1} and a random f_v(0); f_v(0) is shared with the neighboring vehicles via the CC engine. Every vehicle v ∈ V determines its own value of ρ and updates its local parameters β_v(t), f_v(t), and λ_v(t) at time t.
At iteration t + 1, the PP mechanism of v perturbs λ_v(t) by a Laplace noise ε_v(t+1) to generate β_v(t+1) as shown in (14). Then, the DLL uses β_v(t+1), {f_w(t)}_{w∈N_v}, and the local labeled training dataset D_v to update f_v(t+1) as shown in (15). The CC engine transmits f_v(t+1) to all the neighboring vehicles and, at the same time, receives {f_w(t+1)}_{w∈N_v}. The iteration completes after the DLL has updated λ_v(t+1) using {f_w(t+1)}_{w∈N_v}, f_v(t+1), and λ_v(t) according to (16). After the P-CML terminates, it transmits the final updated classifier f*_v to the local detection engine for intrusion detection. The privacy guarantee of the DVP is summarized in Theorem 1.
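The following sketch assembles one DVP iteration along the three P-CML components (14)-(16), reusing the primal_update, dual_update, and sample_gamma_laplace sketches above. The exchange callback stands in for the OBU broadcast/receive step of the CC engine, and folding Φ into the quadratic coefficient assumes R(f) = (1/2)‖f‖²; both are illustrative simplifications rather than the paper's implementation.

```python
def dvp_iteration(f_v, lam_v, neighbor_fs_t, X_v, y_v,
                  C, rho, eta, Phi, zeta, exchange):
    """One privacy-preserving ADMM iteration at vehicle v."""
    n_v, d = X_v.shape
    eps = sample_gamma_laplace(d, zeta)
    beta = lam_v + (C / n_v) * eps                       # PP mechanism, eq. (14)
    # DLL part 1, eq. (15): same surrogate as (10) with beta in place of lambda
    # and the extra (Phi/2)||f||^2 term absorbed into the quadratic coefficient.
    f_new = primal_update(f_v, beta, neighbor_fs_t, X_v, y_v, C, rho + Phi, eta)
    neighbor_fs_new = exchange(f_new)                    # CC engine: broadcast/receive
    lam_new = dual_update(lam_v, f_new, neighbor_fs_new, eta)   # DLL part 2, eq. (16)
    return f_new, lam_new
```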
Theorem 1. Under Assumptions 1, 2 and 3, Algorithm 3 solving the D-ERM is dynamically α(t)-differentially private with α(t) = (α_1(t), α_2(t), ..., α_P(t)), where α_v(t) is chosen by each vehicle v ∈ V at time t. Let Q(f_v(t) | D_v) and Q(f_v(t) | D′_v) be the probability density functions of f_v(t) given datasets D_v and D′_v, respectively, with H_d(D_v, D′_v) = 1. The ratio of conditional probabilities of f_v(t) is bounded as follows:

\[ \frac{Q(f_v(t)\,|\,D_v)}{Q(f_v(t)\,|\,D'_v)} \le e^{\alpha_v(t)}. \quad (17) \]

Proof.
See Appendix A in [32].
Remark 1.
In practice, the VANET topology changes frequently due to the mobility of the vehicles. The change of topology can be caused by changes of the position, the speed, and the number of vehicles. Therefore, it is possible that the VANET topology changes during the collaborative learning. Section 3.2 has explained the independence properties of dynamic differential privacy. Since the dynamic differential privacy of the training dataset at v is independent of other iterations and of the number of iterations, we can conclude that the DVP algorithm is independent of the speed of the vehicles. Also, the privacy of v is independent of the activities at other vehicles; thus the DVP at v is also independent of the mobility of vehicles in the VANET. Therefore, the dynamic differential privacy and Algorithm 3 work in the topology-varying VANET. Let n_v(t) denote the time-varying local training data size, and let N_v(t) and N_v(t) denote the time-varying set of neighboring vehicles and the time-varying number of neighboring vehicles, respectively. By substituting the time-varying n_v(t), N_v(t), and N_v(t) into equations (14)-(16), the algorithm remains dynamically differentially private in the topology-varying VANET. Remark 2.
In this model, the learning is a continuous process and we do not specify a time window of learning for each vehicle v. Specifically, each v decides when to start a new collaborative learning to update the previously updated classifier f_v, or to stop a collaborative learning in progress and keep the newly updated classifier f_v as the latest intrusion classifier. Continuous learning is important since the training data keeps being updated. The machine learning algorithm can benefit from the frequent changes of the dataset to continuously learn different kinds of attacks and their behaviors and to enhance the knowledge of the security system.

In this section, we discuss the performance of Algorithm 3. The training data stored at each vehicle is labeled as 1 (attack) or −1 (normal). A false positive (false negative) error occurs when the classifier labels an input x″ as 1 (or −1) when actually x″ is −1 (or 1). We consider l2-norm regularization functions such that we can train a classifier with low false positive and low false negative errors. The performance analysis is based on the following assumptions: Assumption 4.
The data-label pairs {(x_v^i, y_v^i)}_{i=1}^{n_v} are drawn i.i.d. from a fixed but unknown probability distribution P_{xy}(x_v^i, y_v^i) at each node v ∈ V. Also, there is a fixed but unknown conditional probability distribution P_{x|y}(x_v^i | y_v^i = q) for the data points {x_v^i}_{i=1}^{n_v} given y_v^i = q, where q = −1 or 1.
Assumption 5. ε_v(t) is drawn from K_v(ε) ∝ e^{−ζ_v(t)‖ε‖}, with the same α_v(t) = α(t) (thus the same ζ_v(t)) for all v ∈ V at time t ∈ Z_+.

According to Assumption 4, we define the conditional expected loss function of the classifier f_v of vehicle v, given y, as Ĵ(f_v | y) := C E_{x∼P_{x|y}}(L(y f_v^T x) | y); thus, the corresponding conditional expected objective function Ẑ_v is Ẑ_v(f_v | y) := Ĵ(f_v | y) + ρ R(f_v).

The performance of non-private centralized ERM classification optimization has already been studied in the literature (e.g., [42, 43]). For example, Shalev-Shwartz et al. in [42] introduce a reference classifier f_0 and show that there is a lower bound on the training data size such that the actual (unconditional) expected loss of the l2-regularized support vector machine (SVM) classifier f_SVM satisfies Ĵ(f_SVM) ≤ Ĵ_0 + µ, where µ is the generalization error and Ĵ_0 = Ĵ(f_0). A similar argument can be used in this work to study the accuracy of Algorithm 2 in terms of the conditional expected loss. Let Ĵ_{0,x|y=q} = Ĵ(f_0 | y = q). We quantify the performance of Algorithm 2 with the final output f* by the minimum number of data points required to obtain Ĵ(f* | y = q) ≤ Ĵ_{0,x|y=q} + µ_q.

However, instead of focusing only on the final output f* = argmin_{f_v} Z_v(f_v | D_v, y = q) for all v ∈ V, we also care about the performance of the output of each iteration. Let f^non_v(t+1) = argmin_{f_v} L_{N_v}(t) be the output of iteration t of the (non-private) Algorithm 2 at vehicle v. The literature has proved that the sequence {f^non_v(t)} is bounded and converges to f* as time t → ∞ (e.g., [39]). Thus, there exists a constant C^non_q(t) at time t such that Ĵ(f^non_v(t) | y = q) − Ĵ(f* | y = q) ≤ C^non_q(t), and substituting it into Ĵ(f*) ≤ Ĵ_0 + µ_q yields:

\[ \hat{J}(f^{non}_v(t)\,|\,y = q) \le \hat{J}_{0,x|y=q} + C^{non}_q(t) + \mu_q. \quad (18) \]

As shown later in this section, the required training data size depends on ‖f_0‖. Usually, the reference classifier is selected with an upper bound on ‖f_0‖. Theorem 2 summarizes the performance analysis of Algorithm 2 based on (18). Theorem 2.
Let R(f_v(t)) = (1/2)‖f_v(t)‖², and let f_{0,1} and f_{0,−1} be reference classifiers such that Ĵ(f_{0,1} | 1) = Ĵ_{0,x|1} and Ĵ(f_{0,−1} | −1) = Ĵ_{0,x|−1}, respectively, for all v ∈ V at time t, and let δ_q > 0 be a positive real number for q = 1 and −1. Let D_v = {(x_v^i, y_v^i) ⊂ R^d × {−1, 1}} be the dataset of vehicle v ∈ V. Let D_v^(1) and D_v^(−1) be the datasets containing all the data points x_v^i labeled as 1 and −1, respectively, and let n_v^(1) and n_v^(−1) be the sizes of D_v^(1) and D_v^(−1), respectively; thus D_v = D_v^(1) ∪ D_v^(−1) and n_v = n_v^(1) + n_v^(−1). Let f^non_v(t+1) = argmin_{f_v} L_{N_v}(f_v, t | D_v) be the output of Algorithm 2. If Assumptions 1 and 4 are satisfied, then there exist two constants C^(1) and C^(−1) such that if n_v^(1) and n_v^(−1) satisfy

\[ n_v^{(1)} > C^{(1)}\left(\frac{C^2\|f_{0,1}\|^2\ln(1/\delta_1)}{\mu_1^2}\right), \quad \text{and} \quad n_v^{(-1)} > C^{(-1)}\left(\frac{C^2\|f_{0,-1}\|^2\ln(1/\delta_{-1})}{\mu_{-1}^2}\right), \]

then f^non_v(t+1) satisfies P(Ĵ(f^non_v(t+1) | y = q) ≤ Ĵ_{0,x|y=q} + µ_q + C^non_q(t)) ≥ 1 − δ_q, for all t ∈ Z_+. Therefore, both the false positive and the false negative errors are bounded with probability at least 1 − δ_q.

Proof. See Appendix D in [32].

Usually, µ_q ≤ 1. For an SVM with constraints y_i f_0^T x_i ≥ C_SVM for i = 1, 2, ..., n_SVM, the classification margin is C_SVM/‖f_0‖; a larger value of ‖f_0‖ therefore corresponds to a smaller margin, and a larger ‖f_0‖ is usually chosen for non-separable or small-margin problems. In the following subsection, we use a similar analysis for the performance of Algorithm 3.

Similarly, Algorithm 3 solves one optimization problem, minimizing L^dual_v(f_v, t | D_v), at each iteration t and vehicle v. Suppose at iteration τ we generate a noise term ε_v(τ) = ε.
If we fix the noise term ε_v(t′) = ε for all t′ > τ, then the algorithm becomes static starting from τ. Let Alg-2 denote this corresponding static algorithm associated with Algorithm 3. Therefore, solving Alg-2 is equivalent to solving the optimization problem with the objective function Z^dual_v(f_v, τ | D_v, ε), defined as follows:

\[ Z^{dual}_v(f_v, \tau\,|\,D_v, \varepsilon) := Z_v(f_v\,|\,D_v) + \frac{C}{n_v}\,\varepsilon^T f_v. \]

Let Z^dual_v(τ) be the short-hand notation of Z^dual_v(f_v, τ | D_v, ε). Note that the index τ indicates that this objective is based on the noise ε_v(τ) = ε generated at iteration τ of Algorithm 3. Let f′_v(t) and λ′_v(t) be the primal and dual updates, respectively, of the ADMM-based algorithm minimizing Z^dual_v(τ) at iteration t. Then, Alg-2 can be interpreted as minimizing Z^dual_v(τ) with f′_v(0) = f_v(τ) and λ′_v(0) = λ_v(τ) as initial conditions for all v ∈ V. Since Z^dual_v(τ) is real and convex, similar to Algorithm 2, the sequence {f′_v(t)} is bounded and f′_v(t) converges to f′*_v(τ) = argmin_{f′_v} Z^dual_v(τ), which is a limit point of f′_v(t). Therefore, there exists a constant C^dual_{v,y=q}(t), given the noise term ε fixed from iteration τ of Algorithm 3, such that Ĵ(f_v(τ) | y = q) − Ĵ(f′*(τ) | y = q) ≤ C^dual_{v,y=q}(t), for q = 1 and −1.

The analysis of the performance in Theorem 2 can also be used in the case of the DVP. Specifically, the performance is measured by the training data sizes n_v^(1) and n_v^(−1), for the data points x_v^i labeled by y_v^i = 1 and y_v^i = −1, respectively, at v ∈ V that are required to obtain Ĵ(f_v(τ) | y = q) ≤ Ĵ_{0,x|y=q}(τ) + µ_q + C^dual_{v,y=q}(τ), for q = 1 and −1. We say that each f_v(τ) is accurate with low false positive (1) or false negative (−1) error if it satisfies the above inequality. The performance analysis for Algorithm 3, the DVP, is summarized in Theorem 3 and Corollary 3.1.
Theorem 3.
Let R(f) = (1/2)‖f‖², and let f_{0,v,y=1}(τ) and f_{0,v,y=−1}(τ) be reference classifiers such that Ĵ(f_{0,y=1}(τ) | 1) = Ĵ_{0,x|y=1}(τ) and Ĵ(f_{0,y=−1}(τ) | −1) = Ĵ_{0,x|y=−1}(τ) for all v ∈ V. Let δ_1 and δ_{−1} be positive numbers. Let D_v = D_v^(1) ∪ D_v^(−1) = {(x_v^i, y_v^i) ⊂ R^d × {−1, 1}} be the labeled dataset of vehicle v ∈ V, where D_v^(1) and D_v^(−1) are the datasets of size n_v^(1) and n_v^(−1), respectively, containing all the data points x_v^i labeled as 1 and −1, respectively. If Assumptions 1, 4 and 5 are satisfied, then there exist two constants C^(1) and C^(−1) such that if the number of data points n_v^(q) satisfies

\[ n_v^{(q)} > C^{(q)} \max\!\left( \max_\tau \frac{\|f_{0,v,y=q}(\tau)\|\, d \ln(d/\delta_q)}{\mu_q\, \alpha_v(\tau)},\ \max_\tau \frac{C c_1 \|f_{0,v,y=q}(\tau)\|^2}{\mu_q\, \alpha_v(\tau)},\ \max_\tau \frac{C^2 \|f_{0,v,y=q}(\tau)\|^2 \ln(1/\delta_q)}{\mu_q^2} \right), \]

then f*_v(τ) satisfies P(Ĵ(f*_v(τ) | y = q) ≤ Ĵ_{0,x|y=q}(τ) + µ_q) ≥ 1 − δ_q, for q = 1 and −1.

Proof. See Appendix E in [32].
Corollary 3.1.
Let f_v(τ) = argmin_{f_v} L^dual_v(f_v, τ − 1 | D_v) be the updated classifier of Algorithm 3 and let f_{0,v,y=q}(τ) be a reference classifier such that Ĵ(f_{0,v}(τ) | y = q) = Ĵ_{0,x|y=q}(τ) for q = 1 and −1. If all the conditions of Theorem 3 are satisfied, then f_v(τ) satisfies

\[ P\big(\hat{J}(f_v(\tau)\,|\,y = q) \le \hat{J}_{0,x|y=q}(\tau) + \mu_q + C^{dual}_{v,y=q}(\tau)\big) \ge 1 - \delta_q, \quad (19) \]

for all q = 1 and −1.

Proof.
The inequality Ĵ(f_v(τ) | y = q) − Ĵ(f*_v(τ) | y = q) ≤ C^dual_{v,y=q}(τ) holds for f_v(τ) and f*_v(τ), for q = 1 and −1, and from Theorem 3, P(Ĵ(f*_v(τ) | y = q) ≤ Ĵ_{0,x|y=q}(τ) + µ_q) ≥ 1 − δ_q, for q = 1 and −1. Therefore, we obtain (19).
In this section, we test the learning performance of Algorithm 3 and explore the tradeoff between security and privacy. We simulate the user and system activities and the communication activities in the AUs and OBUs of the VANET based on the NSL-KDD data, which is the refined version of its predecessor KDD'99 and solves some of the inherent problems of KDD'99 [44]. The NSL-KDD dataset contains essential records of the complete KDD dataset. Each record
contains 41 attributes indicating different features of the flow, with a label assigned either as an attack or normal. Due to the lack of public datasets for network-based IDSs, the NSL-KDD is currently the best available dataset for benchmarking different intrusion detection methods [17, 44].

Figure 5. Learning performance and tradeoffs: Figures 5a-5b show the convergence of the DVP for different (fixed) values of α_v; Figure 5c shows the security-privacy tradeoff and the curve fit of the utility of security over α_v(t); Figure 5d shows the receiver operating characteristic (ROC) curves for the non-private algorithm and the DVP with different values of α_v.

In the experiments, the task is to classify whether a network activity is an attack (1) or normal
(−1) using logistic regression. There are four types of attacks present in NSL-KDD, namely, denial of service, probing, unauthorized access to local system administrator privileges, and unauthorized access from a remote machine [45]. In this experiment, we only classify whether an activity is an attack or normal, without identifying the specific type of the attack. We also propose an approach to select an optimal value of α_v(t) that can manage the tradeoff between security and privacy by introducing a utility function of privacy. In the experiments, we fix the value of α_v(t) for each entire run of Algorithm 3; thus, the noise of each vehicle v ∈ V generated at each run of the DVP is i.i.d.

To process the NSL-KDD dataset into a form suitable for the classification learning and satisfying Assumption 3, we process the NSL-KDD dataset according to the procedures suggested in [46]. The main steps include the transformation of symbolic attributes to numeric values, feature selection that eliminates irrelevant, noisy or redundant features, and data normalization that helps speed up the learning.
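As a concrete illustration of these pre-processing steps, the sketch below one-hot encodes the symbolic NSL-KDD attributes, min-max scales the features, and rescales each record so that ‖x_v^i‖ ≤ 1 as required by Assumption 3. The column names, the constant-column removal, and the label encoding are assumptions about the raw CSV layout, not the exact procedure of [46].

```python
import numpy as np
import pandas as pd

def preprocess_nslkdd(df, symbolic_cols=("protocol_type", "service", "flag")):
    """Return (X, y) with y in {-1, +1} (-1 = normal, 1 = attack) and ||x||_2 <= 1."""
    y = np.where(df["label"].astype(str) == "normal", -1, 1)
    feats = df.drop(columns=["label"])
    # symbolic-to-numeric transformation via one-hot encoding
    feats = pd.get_dummies(feats, columns=[c for c in symbolic_cols if c in feats])
    X = feats.to_numpy(dtype=float)
    # drop constant columns (a crude form of feature selection)
    X = X[:, X.std(axis=0) > 0]
    # min-max normalization per feature, then scale rows into the unit ball
    X = (X - X.min(axis=0)) / np.maximum(X.max(axis=0) - X.min(axis=0), 1e-12)
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
    return X, y
```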
When the P-CML engine in vehicle v is initiated, collaboration is established over the VANET. As shown in Figure 1, the vehicle v ∈ V only communicates with the vehicles in its one-hop neighborhood, composed of three vehicles a, b, and c ∈ N_v, which also communicate with neighboring vehicles directly or through the RSU (e.g., a and c). Each vehicle in the collaboration updates its own primal and dual parameters simultaneously.

In the experiments, we test the DVP-based Algorithm 3 using logistic regression. Let L_lr be the loss function of logistic regression, which has the form L_lr(y_v^i f_v^T x_v^i) = log(1 + exp(−y_v^i f_v^T x_v^i)). Clearly, the first- and second-order derivatives of L_lr can be bounded as |L′_lr| ≤ 1 and |L″_lr| ≤ 1/4, respectively. Therefore, logistic regression satisfies the conditions in Assumption 1 with c_1 = 1/4. In this paper, we use the regularization function R(f_v) = (1/2)‖f_v‖². Then, we can directly use L = L_lr in Theorem 1 to guarantee the DDP.

In the first set of experiments, we test the convergence of Algorithm 3. The learning performance is measured by the empirical risk (ER). In this experiment, each entire run of the algorithm is based on a fixed value of α_v(t). Figures 5a and 5b show the test results. As can be seen, larger α_v(t)'s lead to faster convergence; for α_v = 0.5, the converged ER is close to the ER of non-private learning (i.e., Algorithm 2).
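For reference, the empirical risk used as the performance measure in Figures 5a-5b can be computed for a given classifier f as follows; the helper name and the default value of C are assumptions.

```python
import numpy as np

def empirical_risk(f, X_v, y_v, C=1.0):
    """ER of a linear classifier f under the logistic loss: J = (C/n_v) * sum L_lr."""
    margins = y_v * (X_v @ f)
    return (C / len(y_v)) * np.sum(np.log1p(np.exp(-margins)))
```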
Figure 6. Mobility of the VANET: There are three main factors that can cause changes of the VANET topology, namely, the inflow and the outflow of vehicles, the speed change, and the position change.
In this subsection, we explore the tradeoff between the required privacy of the training data at each vehicle and the security of the IDS using the classifier trained via the collaborative learning over the VANET. The privacy is quantified by the value of α_v(t). Basically, a larger α_v(t) leads to a larger likelihood ratio Q(f_v(t) | D_v)/Q(f_v(t) | D′_v), which implies a higher belief of the adversaries about the change of any single entry of the training dataset. Therefore, a larger α_v(t) yields lower privacy. However, the performance of the algorithm increases when the value of α_v(t) grows, and higher performance leads to a higher level of security. These opposite monotonicity relationships show a tradeoff between security and privacy.

We propose an approach to determine an optimal value of α_v(t) that can well manage the security-privacy tradeoff by constructing utility functions of security and privacy. The design of the utility functions at every vehicle v ∈ V has to satisfy the conditions stated in the following assumption: Assumption 6.
The utility of security is monotonically increasing in α_v(t) while the utility of privacy is monotonically decreasing in α_v(t).

We use the empirical loss (ER), J(t) = (C/n_v) Σ_{i=1}^{n_v} L_lr(y_v^i f_v(t)^T x_v^i), to quantify the security (the smaller J(t) is, the higher the security is). Let U_sec(·) : R_+ → R denote the utility of security that describes the relationship between J(t) and α_v(t). The function U_sec is determined from the experimental results, i.e., the pairs (α_v(t), J(t)), using curve fitting. Figure 5c verifies that U_sec (curve fit in green) monotonically decreases with respect to α_v(t); thus, the empirical loss and α_v(t) have a monotonically decreasing relationship, i.e., the security increases with α_v(t).

The utility of privacy is designed to meet the specific privacy requirements of each vehicle. Let U_pri(·) : R_+ → R represent the utility of privacy. Besides the decreasing monotonicity, U_pri(·) is also required to be convex and doubly differentiable such that the optimal value of α_v(t) can be obtained by solving a convex optimization problem. In this experiment, we give an example of the utility of privacy defined as

\[ U_{pri}(\alpha_v(t)) = C_{v1}\cdot\ln\frac{C_{v2}}{C_{v3}\,\alpha_v(t)} + \frac{C_{v4}}{\alpha_v(t)}, \quad \text{for } \alpha_v(t) \le 1, \]

where C_{vi} ∈ R_{++} for i = 1, 2, 3, 4. Then, the optimal value of α_v(t) is determined by solving the following problem at iteration t:

\[ \min_{\alpha_v(t)} \; Z(t) = U_{sec}(\alpha_v(t)) - U_{pri}(\alpha_v(t)) \quad \text{s.t.} \;\; 0 < \alpha_v(t) \le 1, \;\; 0 \le U_{sec}(\alpha_v(t)) \le U, \quad (20) \]

where U is the threshold value for U_sec beyond which the system is considered insecure.

In the experiments, we use a few fixed values of ρ and calculate the empirical loss J(t) = (C/n_v) Σ_{i=1}^{n_v} L_lr(y_v^i f_v(t)^T x_v^i) of the classifier; the value of ρ that gives the minimum J for a fixed value of α_v(t) is then selected for Algorithm 2 and Algorithm 3, respectively. The measured (α_v(t), J(t)) pairs are fitted by U_sec in the form L_acc(α_v(t)) = C_5 · e^{−C_6 α_v(t)} + C_7, where C_j ∈ R_+ for j = 5, 6, 7. In our experiment, C_5 and C_6 are determined by the curve fit and C_7 = min_t{J(t)}.
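A minimal sketch of this design step is given below: it fits U_sec in the exponential form above to measured (α, J) pairs with C_7 fixed to min_t J(t), and then solves (20) numerically over 0 < α ≤ 1. The use of SciPy, the example U_pri, and all numerical constants are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize_scalar

def fit_u_sec(alphas, risks, c7):
    """Fit U_sec(alpha) = c5 * exp(-c6 * alpha) + c7 with c7 = min_t J(t) fixed."""
    model = lambda a, c5, c6: c5 * np.exp(-c6 * a) + c7
    (c5, c6), _ = curve_fit(model, alphas, risks, p0=(1.0, 1.0))
    return lambda a: c5 * np.exp(-c6 * a) + c7

def optimal_alpha(u_sec, u_pri, u_max):
    """Solve problem (20): minimize U_sec(alpha) - U_pri(alpha) on (0, 1]."""
    def objective(a):
        if not (0.0 <= u_sec(a) <= u_max):
            return np.inf                 # security constraint violated
        return u_sec(a) - u_pri(a)
    res = minimize_scalar(objective, bounds=(1e-3, 1.0), method="bounded")
    return res.x

# Example privacy utility with made-up constants C_v1..C_v4 (for illustration only):
u_pri = lambda a: 0.1 * np.log(1.0 / (2.0 * a)) + 0.05 / a
```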
Figure 5d shows the receiver operating characteristic (ROC) curves of the outputs of Algorithm 2 and of Algorithm 3 with different values of α_v(t). We can see that when α_v(t) increases, the ROC of the output of the DVP approaches that of the non-private Algorithm 2. This also shows the tradeoff between security and privacy in terms of the ROC. This feature makes it possible to find an optimal value of α_v(t) such that Algorithm 3 performs similarly to Algorithm 2.

We also investigate how the accuracy changes when we increase the number of vehicles in a VANET. In this experiment, we fix the number of ADMM iterations to 45 and examine three VANETs with 4, 8, and 16 vehicles, respectively, using the NSL-KDD dataset. Figure 7a shows the convergence results for different P. As can be seen, a larger VANET size converges to a smaller value of the empirical risk (ER). Figure 7b shows the tradeoff between security and privacy in terms of the ER. We can see that a larger P performs better in managing the tradeoff between security and privacy.

Due to the high mobility of the vehicles, the topology of a VANET changes frequently. In this paper, we also examine the impact of a topology-varying VANET on the security and privacy. As shown in Figure 6, the changes of topology can be caused by the inflow and the outflow of vehicles, the changes of speed, the changes of positions, or combinations of these. We focus on the activities of a specific vehicle v ∈ V. Let V_T represent the number of topology changes during one collaborative learning. Let V(i) be the i-th topology for i = 1, ..., V_T, and let k(i) be the corresponding number of iterations spent at V(i). Let P(i) be the number of vehicles at the topology V(i). P(i) can be changed by the inflow and the outflow. V_T can be changed by the changes of the speed and the position. The increase and the decrease of speed also change the values of k(i). The collaborative learning scenario at vehicle v over the topology-varying VANET can be described as follows. At V(i), there are P(i) vehicles and v collaborates with its neighboring vehicles; after k(i) iterations, the VANET topology changes to V(i+1) with P(i+1) vehicles; after k(i+1) iterations, the VANET topology changes to V(i+2), and so on. In practice, the changes of VANET topology are very fast and complex. We simulate some varying-topology scenarios to study the impact of varying topologies on the outputs of our algorithms. In the following two sets of experiments, we consider two cases of a topology-varying VANET. In Case 1, we fix the value of P(i) = P for i = 1, ..., V_T, where P is a constant, while in Case 2, we test the results when P(i) changes for different i. We conduct two experiments in Case 1.
In the following two sets of experiments, we consider two cases of topology-varying VANETs. In Case 1, we fix P(i) = P for i = 1, ..., V_T, where P is a constant, while in Case 2 we test the results when P(i) changes with i.

We conduct two experiments in Case 1. In the first experiment, we fix P(i) = 4 and k(i) = k, where k is an integer, for all i = 1, ..., V_T, and we vary the value of V_T. Figures 7c-7d show the convergence results and the security-privacy tradeoff for several values of V_T (including V_T = 3 and V_T = 5).

Figure 7. (a) DVP; (b) tradeoff: empirical risk; (c) convergence; (d) tradeoff: empirical risk. Figures 7a-7b: convergence with different (fixed) values of α_v(t) under 45 ADMM iterations, with P = 4, P = 8, and P = 16 vehicles. Figures 7c-7d: convergence and security-privacy tradeoff, respectively, with fixed P = 4, the total number of iterations fixed at 45, and different values of V_T per collaborative learning.

From Figure 7c, we can see that the values of ER jump when the VANET topology changes. Moreover, a larger value of V_T (more changes of VANET topology) gives smaller values of ER as the number of iterations increases, which is also reflected in Figure 7d. The tradeoff results in Figure 7d indicate that a larger value of V_T, i.e., a more frequent change of VANET topology, performs better in managing the security-privacy tradeoff.

In the second experiment in Case 1, we test the impact of changes in vehicle speed, quantified by changes of k(i). We fix P(i) = 4 for all i = 1, ..., V_T, and V_T = 3. Figure 8a shows no obvious difference in the value of ER between fixed speed and increasing speed, and Figure 8b likewise shows no obvious difference between fixed and varying speed in managing the security-privacy tradeoff.

In the experiment of Case 2, we fix V_T = 3 and k(i) = 15 for i = 1, 2, 3, and test the results when the number of vehicles P(i) increases with i. Figure 8c shows the convergence results: a larger density of vehicles per topology gives smaller values of ER, and an increasing density of vehicles per topology gives smaller values of ER than both the fixed-density topology-varying VANET and the fixed-topology VANET. Figure 8d shows the behavior of the tradeoff between security and privacy. The results show that increasing P(i) (more inflow of vehicles) outperforms both the topology-varying case with fixed P(i) and the fixed-topology case.
Figure 8. Figures 8a-8b: convergence and security-privacy tradeoff, respectively, with fixed P = 4, the total number of iterations fixed at 45, and fixed V_T = 3; in the fixed-speed case, k(1) = k(2) = k(3) = 15, while in the increasing-speed case the values of k(i) differ across the three topologies. Figures 8c-8d: convergence and security-privacy tradeoff, respectively, with fixed V_T = 3 and fixed k(1) = k(2) = k(3) = 15; in the fixed-density case, P(i) is constant for i = 1, 2, 3 (inflow and outflow neutralize each other), while in the increasing-P(i) case the number of vehicles grows, reaching P(3) = 16.
In this paper, we have described an architecture for a collaborative intrusion detection system based on privacy-preserving distributed machine learning. A privacy-preserving scheme for distributed collaborative learning is essential for private collaboration; otherwise, the distributed machine learning itself creates privacy leakage of the training data. We have proposed a privacy-preserving machine-learning based collaborative intrusion detection system (PML-CIDS). The alternating direction method of multipliers (ADMM) is used to decentralize the empirical risk minimization (ERM) problem that models the collaborative learning into a distributed ERM well suited to the nature of the VANET system. We have proposed dynamic differential privacy and presented dual variable perturbation (DVP) to protect the privacy of the training data by perturbing the dual variable λ_v(t). We have also analyzed the theoretical performance of the DVP, measured by the minimum training data size required to train a classifier with low error. The tradeoff between security and privacy has been investigated through numerical experiments on the NSL-KDD dataset. We have also proposed a design principle for selecting an optimal value of the privacy parameter α_v(t) by solving an optimization problem such that both security and privacy are optimized. The experiments have further studied the impact of different VANET sizes and of changing VANET topology during the collaborative learning. As future work, we intend to investigate collaborative IDSs with both supervised and unsupervised machine learning and to extend dynamic differential privacy to other machine learning techniques. We also intend to study methods of fast incremental learning that can be used for frequent updates of machine-learning-based IDSs.
A Appendix
A.1 Architecture of Vehicular Ad Hoc Network
In this appendix, we describe these three main components of the VANET architecture [47, 48].
A.1.1 On Board Unit (OBU)
An OBU is a communication device equipped on vehicles for vehicle-to-vehicle and vehicle-to-infrastructure communications through dedicated short-range communication (DSRC). DSRC is based on IEEE 802.11a technology, amended for low-overhead operation as IEEE 802.11p [1]. The AUs in vehicles use OBUs to communicate with AUs in other vehicles, and the information exchange is done by OBUs over the ad hoc network. The functions and procedures of an OBU include wireless radio access, message transfer, geographical ad hoc routing, network congestion control, data security, and IP mobility support, among others. The basic communication system of an OBU is composed of a minimum set of safety applications, a communication protocol stack with transport and network layers, the radio protocols of the IEEE 802.11p devices, and an interface to the local sensors mounted on the vehicle. OBUs can also include additional network devices for non-safety applications using other radio technologies such as IEEE 802.11a/b/g/n [48].
A.1.2 Application Unit (AU)
An AU is a device mounted on the vehicle that runs the installed applications which use the communication capabilities of the OBU. Applications can be safety applications such as hazard warning, navigation with communication capabilities, or Internet-based applications such as those on a personal digital assistant (PDA) [48].
A.1.3 Roadside Unit (RSU)
An RSU is a device located at fixed positions along the roadside, along highways, or at dedicated locations such as parking places or gas stations. Each RSU is equipped with at least one network device for short-range wireless communication based on IEEE 802.11p radio technology. It can also be equipped with other network devices, thereby enabling communication with the infrastructure network. The main functions of an RSU include [48]: • Redistributing information to OBUs, thereby extending the communication range of the ad hoc network. • Running safety applications such as virtual traffic signs and vehicle-to-infrastructure warnings like accident warnings. • Enabling OBUs to connect to the cloud and other infrastructures.
A.2 PML-CIDS Components
In this appendix, we will elaborate on each component of the PML-CIDS architecture.
A.2.1 Pre-processing Engine
The pre-processing engine collects and pre-processes real-time audit data flows from the OBU and various applications in the AU. These data flows may include user and system activities in the AU and access-request behaviors from outside of the vehicle; they can also include communication activities between different OBUs or between the OBU and RSUs. The pre-processing includes the transformation of symbolic attributes into numerical values, feature selection, and data normalization to reduce the possibly large variation between values.
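For illustration, here is a minimal Python sketch of the kind of pre-processing described above (symbolic-to-numeric mapping and normalization; feature selection is omitted), assuming pandas and scikit-learn; the file name and column names are NSL-KDD-style placeholders, not the engine's actual interface.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, symbolic_cols, numeric_cols):
    """Map symbolic attributes to numeric values (one-hot), then normalize
    numeric features to [0, 1] to reduce large variations between values."""
    encoded = pd.get_dummies(df[symbolic_cols])          # symbolic -> numeric
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(df[numeric_cols]),  # normalization
        columns=numeric_cols, index=df.index,
    )
    return pd.concat([encoded, scaled], axis=1)

# Hypothetical NSL-KDD-style columns and file; the real feature set is larger.
features = preprocess(
    df=pd.read_csv("nslkdd_sample.csv"),
    symbolic_cols=["protocol_type", "service", "flag"],
    numeric_cols=["duration", "src_bytes", "dst_bytes"],
)
```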
A.2.2 Local Detection Engine
The local detection engine analyzes the audit data flows processed by the pre-processing engine using a classifier trained via the P-CML engine. If a specific activity is classified as an intrusion, the local detection engine triggers the alarm. The user of the vehicle determines how often the classifier needs to be re-trained. If an update of the classifier is required, the local detection engine initiates the P-CML engine.
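A minimal sketch of the detection decision follows, assuming the trained classifier is the linear model f_v produced by the P-CML engine; the function name and the commented alarm hook are illustrative, not the paper's implementation.

```python
import numpy as np

def detect(f_v: np.ndarray, x: np.ndarray) -> bool:
    """Flag a pre-processed audit record x as an intrusion when the trained
    classifier f_v assigns it to the intrusion class (decision by sign)."""
    return float(f_v @ x) > 0.0   # positive margin -> intrusion class

# Usage: raise the alarm and, per the user-defined schedule, trigger retraining.
# if detect(f_v, record): trigger_alarm()   # trigger_alarm is hypothetical
```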
A.2.3 P-CML Engine
When the P-CML engine is initiated, the ADMM-based private distributed machine learning operates over a temporarily established VANET by collaborating with other vehicles and RSUs. Each vehicle stores its local labeled training dataset. The training dataset can be the historical intrusion activities detected in the vehicle, data collected by placing sensors on the VANET (for example, TCP or NetFlow records [17]), or data provided by trustworthy parties such as the Department of Transportation. The participation of RSUs in P-CML enables vehicles to connect to distant vehicles and to other infrastructures such as the cloud.

The P-CML engine is composed of three components: the collaborative communication (CC) engine, the distributed local learning (DLL) engine, and the privacy-preserving (PP) mechanism. The DLL engine updates its local ADMM variables, including the dual and the primal variables, using its local training dataset and the primal variables transmitted from neighboring vehicles at each ADMM iteration. The PP engine at each vehicle provides dynamic differential privacy to the VANET involved in the collaboration. At each ADMM iteration, the CC engine uses the OBU to exchange the intermediate updated parameters with the CC engines of the neighboring vehicles through low-latency communication [49] based on DSRC. A simplified sketch of one such iteration is given below.
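The following Python sketch illustrates the flow of one ADMM iteration with dual-variable perturbation as described above; the update rules, the Laplace noise with scale 1/α, and the helper names are simplified placeholders and do not reproduce the exact updates of Algorithm 3.

```python
import numpy as np

def admm_iteration(f_v, lambda_v, X_v, y_v, neighbor_f, rho, alpha, rng):
    """One simplified ADMM step at vehicle v with dual-variable perturbation.

    f_v        : local primal variable (classifier weights)
    lambda_v   : local dual variable
    X_v, y_v   : local labeled training data
    neighbor_f : list of primal variables received from neighbors (CC engine)
    rho        : ADMM penalty parameter
    alpha      : dynamic differential-privacy parameter for this iteration
    """
    # PP mechanism: perturb the dual variable with noise whose scale shrinks
    # as alpha grows (weaker privacy, higher accuracy).
    noise = rng.laplace(scale=1.0 / alpha, size=lambda_v.shape)
    lambda_tilde = lambda_v + noise

    # DLL engine: gradient-style primal update using the local data and the
    # neighbors' primal variables (a stand-in for the exact ERM update).
    consensus = np.mean(neighbor_f, axis=0)
    margin = y_v * (X_v @ f_v)
    grad = -(X_v * (y_v / (1.0 + np.exp(margin)))[:, None]).mean(axis=0)
    f_v = f_v - 0.1 * (grad + lambda_tilde + rho * (f_v - consensus))

    # Dual update toward consensus with the neighbors.
    lambda_v = lambda_tilde + rho * (f_v - consensus)
    return f_v, lambda_v
```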
References
[1] A.-S. K. Pathan,
Security of self-organizing networks: MANET, WSN, WMN, VANET . CRC press, 2016.[2] W. Zhang, R. Rao, G. Cao, and G. Kesidis, “Secure routing in ad hoc networks and a related intrusion detection problem,” in
Military Communications Conference, 2003. MILCOM’03. 2003 IEEE , vol. 2, pp. 735–740, IEEE, 2003.[3] T. Anantvalee and J. Wu, “A survey on intrusion detection in mobile ad hoc networks,” in
Wireless Network Security ,pp. 159–180, Springer, 2007.[4] Q. Zhu, C. Fung, R. Boutaba, and T. Basar, “Guidex: A game-theoretic incentive-based mechanism for intrusion detectionnetworks,”
IEEE Journal on Selected Areas in Communications , vol. 30, no. 11, pp. 2220–2230, 2012.[5] C. J. Fung, Q. Zhu, R. Boutaba, and T. Ba¸sar, “Bayesian decision aggregation in collaborative intrusion detection networks,” in
Network Operations and Management Symposium (NOMS), 2010 IEEE , pp. 349–356, IEEE, 2010.[6] J. Raiyn et al. , “A survey of cyber attack detection strategies,”
International Journal of Security and Its Applications , vol. 8,no. 1, pp. 247–256, 2014.[7] N. Hoque, M. H. Bhuyan, R. C. Baishya, D. K. Bhattacharyya, and J. K. Kalita, “Network attacks: Taxonomy, tools andsystems,”
Journal of Network and Computer Applications , vol. 40, pp. 307–324, 2014.[8] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. , “Distributed optimization and statistical learning via the alternatingdirection method of multipliers,”
Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011. [9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography, pp. 265–284, Springer, 2006. [10] Y. Zhang, W. Lee, and Y.-A. Huang, “Intrusion detection techniques for mobile wireless networks,”
Wireless Networks , vol. 9,no. 5, pp. 545–556, 2003.[11] P. Albers, O. Camp, J.-M. Percher, B. Jouga, L. Me, and R. S. Puttini, “Security in ad hoc networks: a general intrusiondetection architecture enhancing trust based approaches.,” in
Wireless Information Systems , pp. 1–12, 2002.[12] D. Sterne, P. Balasubramanyam, D. Carman, B. Wilson, R. Talpade, C. Ko, R. Balupari, C.-Y. Tseng, and T. Bowen, “A generalcooperative intrusion detection architecture for manets,” in
Third IEEE International Workshop on Information Assurance(IWIA’05) , pp. 57–70, IEEE, 2005.[13] O. Kachirski and R. Guha, “Effective intrusion detection using multiple sensors in wireless ad hoc networks,” in
SystemSciences, 2003. Proceedings of the 36th Annual Hawaii International Conference on , pp. 8–pp, IEEE, 2003.[14] M. Blowers and J. Williams, “Machine learning applied to cyber operations,” in
Network Science and Cybersecurity , pp. 155–175, Springer, 2014.[15] S.-J. Horng, M.-Y. Su, Y.-H. Chen, T.-W. Kao, R.-J. Chen, J.-L. Lai, and C. D. Perkasa, “A novel intrusion detection systembased on hierarchical clustering and support vector machines,”
Expert systems with Applications , vol. 38, no. 1, pp. 306–313,2011.[16] Z. Muda, W. Yassin, M. Sulaiman, and N. Udzir, “Intrusion detection based on k-means clustering and naïve bayes classification,”in
Information Technology in Asia (CITA 11), 2011 7th International Conference on , pp. 1–6, IEEE, 2011.[17] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,”
IEEE Communications Surveys & Tutorials , vol. 18, no. 2, pp. 1153–1176, 2015.[18] C. Wagner, J. François, T. Engel, et al. , “Machine learning approach for ip-flow record anomaly detection,” in
International Conference on Research in Networking, pp. 28–39, Springer, 2011. [19] C. Kruegel and T. Toth, “Using decision trees to improve signature-based intrusion detection,” in
International Workshop onRecent Advances in Intrusion Detection , pp. 173–191, Springer, 2003.[20] L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, “Exposure: Finding malicious domains using passive dns analysis.,” in
NDSS ,2011.[21] L. Bilge, S. Sen, D. Balzarotti, E. Kirda, and C. Kruegel, “Exposure: a passive dns analysis service to detect and reportmalicious domains,”
ACM Transactions on Information and System Security (TISSEC) , vol. 16, no. 4, p. 14, 2014.[22] J. Cannady, “Artificial neural networks for misuse detection,” in
National information systems security conference , pp. 368–81,1998.[23] R. P. Lippmann and R. K. Cunningham, “Improving intrusion detection performance using keyword selection and neuralnetworks,”
Computer Networks , vol. 34, no. 4, pp. 597–603, 2000.[24] C. J. Fung and Q. Zhu, “Facid: A trust-based collaborative decision framework for intrusion detection networks,”
Ad HocNetworks , vol. 53, pp. 17–31, 2016.[25] Q. Zhu, C. J. Fung, R. Boutaba, and T. Basar, “A distributed sequential algorithm for collaborative intrusion detection networks,”in
Communications (ICC), 2010 IEEE International Conference on , pp. 1–6, IEEE, 2010.[26] Q. Zhu and T. Ba¸sar, “Dynamic policy-based ids configuration,” in
Decision and Control, 2009 held jointly with the 2009 28thChinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on , pp. 8600–8605, IEEE, 2009.[27] Q. Zhu and T. Ba¸sar, “Indices of power in optimal ids default configuration: Theory and examples,” in
International Conferenceon Decision and Game Theory for Security , pp. 7–21, Springer, 2011.[28] M. N. Mejri, N. Achir, and M. Hamdi, “A new security games based reaction algorithm against dos attacks in vanets,” in
Consumer Communications & Networking Conference (CCNC), 2016 13th IEEE Annual , pp. 837–840, IEEE, 2016.[29] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, “What can we learn privately?,”
SIAM Journalon Computing , vol. 40, no. 3, pp. 793–826, 2011.[30] R. Bassily, A. Smith, and A. Thakurta, “Private empirical risk minimization: Efficient algorithms and tight error bounds,”pp. 464–473, 2014.[31] S. Han, U. Topcu, and G. J. Pappas, “Differentially private distributed constrained optimization,”
IEEE Transactions onAutomatic Control , pp. 1–1, 2014.[32] T. Zhang and Q. Zhu, “Dynamic differential privacy for admm-based distributed classification learning,”
IEEE Transactions onInformation Forensics and Security , vol. 12, no. 1, pp. 172–187, 2017.[33] T. Zhang and Q. Zhu, “A dual perturbation approach for differential private admm-based distributed empirical risk minimization,”in
Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security , pp. 129–137, ACM, 2016.[34] F. McSherry and K. Talwar, “Mechanism design via differential privacy,” in
Foundations of Computer Science, 2007. FOCS’07.48th Annual IEEE Symposium on , pp. 94–103, IEEE, 2007.[35] A. Blum, C. Dwork, F. McSherry, and K. Nissim, “Practical privacy: the sulq framework,” in
Proceedings of the twenty-fourthACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems , pp. 128–138, ACM, 2005.[36] F. Eigner and M. Maffei, “Differential privacy by typing in security protocols,” in
Computer Security Foundations Symposium(CSF), 2013 IEEE 26th , pp. 272–286, IEEE, 2013.[37] S. Han, U. Topcu, and G. J. Pappas, “Differentially private distributed constrained optimization,”
IEEE Transactions onAutomatic Control , vol. 62, no. 1, pp. 50–64, 2016.[38] M. T. Hale and M. Egerstedty, “Differentially private cloud-based multi-agent optimization with constraints,” in
AmericanControl Conference , pp. 1235–1240, 2015.[39] P. A. Forero, A. Cano, and G. B. Giannakis, “Consensus-based distributed support vector machines,”
The Journal of MachineLearning Research , vol. 11, pp. 1663–1707, 2010.[40] Y.-H. Dai, “A perfect example for the bfgs method,”
Mathematical Programming , vol. 138, no. 1-2, pp. 501–530, 2013.[41] T. Zhang and Q. Zhu, “Dynamic privacy for distributed machine learning over network,”
CoRR , vol. abs/1601.03466, 2016.[42] S. Shalev-Shwartz and N. Srebro, “Svm optimization: inverse dependence on training set size,” in
Proceedings of the 25thinternational conference on Machine learning , pp. 928–935, ACM, 2008.[43] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially private empirical risk minimization,”
The Journal of MachineLearning Research , vol. 12, pp. 1069–1109, 2011.[44] M. Tavallaee, E. Bagheri, W. Lu, and A.-A. Ghorbani, “A detailed analysis of the kdd cup 99 data set,” in
Proceedings of theSecond IEEE Symposium on Computational Intelligence for Security and Defence Applications 2009 , 2009.[45] L. Dhanabal and D. S. Shantharajah, “A study on nsl-kdd dataset for intrusion detection system based on classificationalgorithms,”
International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 6, 2015. [46] H. Mohamad Tahir, W. Hasan, A. Md Said, N. H. Zakaria, N. Katuk, N. F. Kabir, M. H. Omar, O. Ghazali, and N. I. Yahya, “Hybrid machine learning technique for intrusion detection system,” 5th International Conference on Computing and Informatics (ICOCI) 2015, 2015. [47] S. Al-Sultan, M. M. Al-Doori, A. H. Al-Bayatti, and H. Zedan, “A comprehensive survey on vehicular ad hoc network,”
Journal of Network and Computer Applications, vol. 37, pp. 380–392, 2014. [48] R. Baldessari, B. Bödekker, M. Deegener, A. Festag, W. Franz, C. C. Kellum, T. Kosch, A. Kovacs, M. Lenardi, C. Menig, et al., “Car-2-car communication consortium manifesto,” 2007. [49] H. Hartenstein and L. Laberteaux, “A tutorial survey on vehicular ad hoc networks,” IEEE Communications Magazine, vol. 46, no. 6, 2008.