Convolutional Neural Network-based Intrusion Detection System for AVTP Streams in Automotive Ethernet-based Networks
Seonghoon Jeong a, Boosun Jeon b, Boheung Chung b, Huy Kang Kim a,∗
a School of Cybersecurity, Korea University, Seoul 02841, Republic of Korea
b Cyber Security Research Division, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea
Abstract
Connected and autonomous vehicles (CAVs) are an innovative form of traditional vehicles. Automotive Ethernet replaces the controller area network and FlexRay to support the large throughput required by high-definition applications. As CAVs have numerous functions, they exhibit a large attack surface and an increased vulnerability to attacks. However, no previous studies have focused on intrusion detection in automotive Ethernet-based networks. In this paper, we present an intrusion detection method for detecting audio-video transport protocol (AVTP) stream injection attacks in automotive Ethernet-based networks. To the best of our knowledge, this is the first such method developed for automotive Ethernet. The proposed intrusion detection model is based on feature generation and a convolutional neural network (CNN). To evaluate our intrusion detection system, we built a physical BroadR-Reach-based testbed and captured real AVTP packets. The experimental results show that the model exhibits outstanding performance: the F1-score and recall are greater than 0.9704 and 0.9949, respectively. In terms of the inference time per input and the generation intervals of AVTP traffic, our CNN model can readily be employed for real-time detection.
∗ Corresponding author
Email addresses: [email protected] (Seonghoon Jeong), [email protected] (Boosun Jeon), [email protected] (Boheung Chung), [email protected] (Huy Kang Kim)
Preprint submitted to Vehicular Communications, February 9, 2021
Keywords: Automotive Ethernet, In-Vehicle Network, Network Security, Replay Attack, Intrusion Detection System, Convolutional Neural Network
1. Introduction
Connected and autonomous vehicles (CAVs) are an innovative form of traditional vehicles. A significant difference between traditional vehicles and CAVs is the following: in CAVs, a physical (PHY) layer is used for the in-vehicle network (IVN), and automotive Ethernet replaces the controller area network (CAN) and FlexRay. As a result, CAVs support the large communication throughput required by high-definition applications such as video-on-demand, intelligent transport systems, and advanced driver assistance systems (ADAS), which consume various sensor data and video streams [1]. The IEEE 1722 audio-video transport protocol (AVTP) is the essential mechanism by which automotive Ethernet ensures the reliable transport of time-sensitive and prioritized traffic (e.g., audio and video streaming, as the name suggests). Moreover, the IEEE 1722-2016 standard defines the AVTP for the transmission of time-sensitive control streams (including CAN and FlexRay messages) via automotive Ethernet; the latter was defined by the IEEE P1722a working group [2]. Accordingly, we consider that AVTP will be one of the most crucial protocols for IVNs in CAVs.
Currently, cyber-attacks against vehicles are rare (even with a lack of protection mechanisms) because the vehicles have inflexible applications and limited processing resources that are only sufficient for their mobility [3]. However, the attack surface and feasibility of attacks are substantial for CAVs because of their additional functions. For example, Automotive Grade Linux or Android OS on an infotainment device could inherit vulnerabilities or receive inbound malware owing to the connectivity of CAVs [4]. Intrusion detection is a useful method to prepare for cyber-physical attacks on IVNs. Although there have been several studies on intrusion detection systems (IDSs) for IVN security [3, 5], intrusion detection for systems employing automotive Ethernet has not been studied.
The interruption of a media stream owing to intrusion raises not only a usability issue but also security and safety issues in CAVs. Therefore, it is necessary to develop an IDS to address the gap in intrusion detection research with regard to automotive Ethernet.
Convolutional neural networks (CNNs) are artificial neural networks that are often used in network traffic classification problems (e.g., classifying application protocols or intrusion detection). CNNs are extensively used because they can easily learn and predict one- or multidimensional data using a few parameters. In a CNN, a few sets of convolutional layers and pooling layers extract features from a given input. Finally, dense layers act as an estimator that returns a predicted class or a value using the extracted features. Thus, CNNs simplify supervised learning without the need for much preprocessing of the input data.
In this paper, we propose an intrusion detection method using a deep learning model. Our method includes the feature generation process and a two-dimensional CNN (2D-CNN) model. We designed the feature generator considering the observed characteristics of real AVTP traffic. The detection model distinguishes whether AVTP packets transmitted over automotive Ethernet are benign or injected on a packet-by-packet basis. Finally, we implemented an IDS using the proposed method to evaluate the performance of our method.
The main contributions of this study are as follows:
• We introduce six possible attack scenarios for automotive Ethernet. We also demonstrate an AVTP replay attack using real audio-video bridging (AVB) devices.
• We propose a novel intrusion detection method designed for automotive Ethernet. Our approach comprises the feature generator and the 2D-CNN model. To the best of our knowledge, this is the first attempt to solve the problem of the security of automotive Ethernet-based IVNs.
• We implement an IDS to evaluate the performance of our intrusion detection method by using a real dataset that we captured from a BroadR-Reach network. The evaluation results show that the IDS correctly classifies almost all AVTP packets, with very few false negatives. Furthermore, we confirm that our IDS is suitable for real-time detection.
• We provide the automotive Ethernet intrusion datasets used for our experiment in the PCAP format. In the datasets, each AVTP packet is labeled as “benign” or “injected.” Readers can access our datasets in [6] to reproduce our experiment results, as well as use the datasets in further studies.
The paper is organized as follows. Section 2 introduces the background of automotive Ethernet and AVTP. Section 3 provides plausible attack scenarios targeting automotive Ethernet. Section 4 describes the results of our replay attack performed on our real automotive Ethernet network as well as the proposed IDS design, including the feature generator and the artificial neural network used to classify AVTP packets. Section 5 shows the experiment results on the real AVTP traffic. Section 6 discusses the AVTP intrusion dataset, some limitations, and remediation strategies. Section 7 reviews previous studies on IDSs designed for CAVs. Finally, Section 8 concludes the paper.
2. Preliminaries
The CAN protocol is widely adopted for IVNs of traditional vehicles: its arbitration mechanism enables a simple bus topology, and its real-time message prioritization considers the importance of each application. However, there are two drawbacks to CAN-based IVNs: (1) security issues and (2) limited bandwidth, which is not suitable for various multimedia streams (such as camera data and high-quality audio/video).
Traditional Ethernet was once considered a successor of the CAN bus for IVNs. Ethernet is commonly used in local-area networks and provides efficient, low-cost, and high-bandwidth communication. However, the best-effort transmission that Ethernet provides is insufficient for CAVs: an unexpected delay or further traffic loss can be highly detrimental in vehicular applications. Thus, Ethernet protocols require some modifications in the PHY and the higher layers to enable their application to IVNs. Consequently, the concept of automotive Ethernet has emerged, and automotive Ethernet has been developed by standardization groups and industry experts.
Figure 1: Structure of stream AVTPDU. The values of the TPID, EtherType, and CD fields are set to 8100h, 22F0h, and 0b, respectively. Fields, in order: MAC destination address; MAC source address; TPID (802.1Q = 0x8100); PCP, CFI, VID (VLAN identifier); EtherType (AVBTP = 0x22F0); CD, subtype, sv, version, mr, r, gv, tv, sequence num, reserved, tu; stream id; avtp timestamp; gateway info; stream data length (bytes); tag, channel, tcode, sy; SID, DBS, FN, QPC, SPH, Rsv, DBC; FMT, FDF; Payload (audio/video samples, control data, etc.).
Automotive Ethernet has innovations in the PHY and higher layers. Open Alliance’s special interest group designed and published BroadR-Reach technology by modifying the PHY employed in Ethernet to meet electrical requirements for automotive Ethernet. Subsequently, IEEE standardized 100BASE-T1, which succeeded BroadR-Reach with a few minor modifications. As a result, automotive Ethernet-based IVNs can be deployed with a twisted-pair cable; furthermore, automotive Ethernet exhibits a high bandwidth of up to 100 Mbps. However, changes in the PHY do not guarantee low-latency, time-sensitive, and prioritized communication based on existing protocols.
To guarantee the aforementioned features, Avnu Alliance implemented a set of technical standards named AVB. Notably, AVB enables the reliable transport of time-critical streams on automotive Ethernet. Specifically, AVB technology comprises four protocols: the stream reservation protocol, the generalized precision time protocol (gPTP), forwarding and queuing for time-sensitive streams, and AVTP. The protocol headers are encapsulated in standard Ethernet headers when the packets are transmitted through automotive Ethernet. For example, an AVTP packet is encapsulated by the general MAC header as well as the VLAN header. The VLAN header functions as the prioritization enabler for automotive Ethernet; in addition, it enables network segmentation, as intended. With the support of these protocols, AVB-compatible devices become synchronized with each other and produce or consume streams regardless of topological complexity [7].
Figure 1 shows the structure of a stream AVTP data unit (AVTPDU). AVTP is in charge of delivering the application payload.
The payload can include audio and video streams, in addition to CAN or FlexRay messages for controlling in-vehicle electronic control units (ECUs). AVTPDU refers to an AVTP packet; AVTPDUs are categorized into control AVTPDUs (the CD field is set to 1) and stream AVTPDUs (the CD field is set to 0). AVB devices send control AVTPDUs to discover each other or to establish/disconnect a stream session. Meanwhile, in-vehicle switches learn the positions of AVTP listeners. After the session is established, the AVB talker transmits stream AVTPDUs containing application data. Then, the in-vehicle switches multicast the stream AVTPDUs. Stream AVTPDUs always carry a VLAN tag to represent their priority among less time-critical packets. The channel field specifies the type of protocol for the subsequent Payload field.
Figure 2: Example of in-vehicle automotive Ethernet topology along with possible packet injection attack scenarios. AVTP streams flow from the AVB talker to the AVB listeners. (Diagram labels: in-vehicle network, adversary, attack surface, intrusion detector, AVB listener, direction of AVTP stream, 100BASE-T1, ECU.)
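The header layout in Figure 1 can be made concrete with a small parser. The following is a minimal sketch, not the authors' tooling: it assumes a VLAN-tagged frame whose byte offsets follow the field order of Figure 1 (with the usual IEEE 802.1Q and IEEE 1722 bit widths), and the function name, the example MAC addresses, and the example VLAN identifier are hypothetical.

```python
import struct

TPID_8021Q = 0x8100      # VLAN tag protocol identifier (Figure 1)
ETHERTYPE_AVTP = 0x22F0  # EtherType carried by AVTPDUs (Figure 1)


def parse_avtp_frame(frame: bytes) -> dict:
    """Parse the leading fields of a VLAN-tagged Ethernet frame carrying AVTP.

    Assumed offsets: 0-5 destination MAC, 6-11 source MAC, 12-13 TPID,
    14-15 PCP/CFI/VID, 16-17 EtherType, 18 CD + subtype, 20 sequence num.
    """
    if len(frame) < 21:
        raise ValueError("frame too short for Ethernet + VLAN + AVTP fields")
    tpid, tci, ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q or ethertype != ETHERTYPE_AVTP:
        raise ValueError("not a VLAN-tagged AVTP frame")
    first_avtp_byte = frame[18]
    return {
        "dst": frame[0:6].hex(),
        "src": frame[6:12].hex(),
        "pcp": tci >> 13,                  # 3-bit priority code point
        "vid": tci & 0x0FFF,               # 12-bit VLAN identifier
        "cd": first_avtp_byte >> 7,        # 0 = stream AVTPDU, 1 = control AVTPDU
        "subtype": first_avtp_byte & 0x7F,
        "sequence_num": frame[20],
    }


# Hypothetical frame: PCP = 3, VID = 2, stream AVTPDU with sequence num 0x2A.
example_frame = (
    bytes.fromhex("91e0f0000001")
    + bytes.fromhex("020000000001")
    + struct.pack("!HHH", TPID_8021Q, (3 << 13) | 2, ETHERTYPE_AVTP)
    + bytes([0x00, 0x81, 0x2A, 0x00])
)
parsed = parse_avtp_frame(example_frame)
```

Here `parsed["cd"]` distinguishes stream AVTPDUs from control AVTPDUs, and `parsed["pcp"]` exposes the priority that the VLAN tag assigns to the stream.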
3. Threats in automotive Ethernet
Several studies have evaluated threats to IVNs, especially with regard to the CAN bus. In contrast, there is a lack of research to identify threats to automotive Ethernet-based networks. In this section, we introduce threats targeting automotive Ethernet-based IVNs. This section does not elucidate all possible threats to IVNs but emphasizes the need to identify attacks on automotive Ethernet. We assume that the attacker has access to a target IVN. Figure 2 shows an example topology of an automotive Ethernet-based IVN in addition to possible scenarios of how adversaries can invade the target IVN. Attackers can invade the IVN through physical access using the OBD-II port and through remote access via compromised infotainment devices. Once an attacker gains such access, the attacker is capable of (1) passive attacks and (2) active attacks.
An attacker carries out a passive attack by monitoring inbound traffic on a node that the attacker monitors. The goals of a passive attack include identifying available services, vehicle status, and the nature of in-vehicle communications. For example, a passive attacker can identify the presence of a newly established AVTP stream from inbound control AVTPDUs. A passive attack is difficult to detect, as it does not affect any data within IVNs. Fortunately, the impact of a passive attack is relatively limited (compared with an attack on the CAN bus to which all nodes are connected). The reason is that in-vehicle switches transmit only broadcast packets to the monitoring node, whereas other packets reach only their designated destinations.
Table 1: Five active attacks on automotive Ethernet-based IVNs.
| Attack | Target | Goal | Likelihood | Risk | Discriminability |
|---|---|---|---|---|---|
| DoS attack | ECU | Out-of-communication status for a target ECU | High | High | Easy |
| CAM table overflow | Switch | Disable switching and broadcast all legitimate traffic within IVNs | High | Low | Easy |
| Fuzzing | ECU | Execute an unexpected/hidden command on the target ECU | High | High | Easy |
| Command injection | ECU | Execute a command as intended by the attacker | Low | High | Moderate |
| Replay attack | ECU | Re-execute instructions executed at the time of traffic extraction | Middle | High | Hard |
An active attack begins with a packet injection performed by the attacker. The attacker can inject arbitrary packets. Hence, injection attacks can be classified into several types depending on the attacker’s purpose and the effects of the attack on IVNs. Here, we introduce five types of plausible active attacks (but further discovery and assessment are also needed):
• Denial of service (DoS) attack.
An attacker may attempt a DoS attack on a specific ECU by flooding it with numerous VLAN-tagged packets. Automotive Ethernet is vulnerable to such an attack because it prioritizes network traffic and ensures a certain quality of service using a 3-bit PCP field in the VLAN header. Thus, the packets injected by an attacker can override other legitimate packets transmitted to the target ECU. Specifically, the flooder initiates a burst transmission to the switch of the target ECU. This causes packet loss at the network level; other ECUs cannot transmit any packets to the target ECU during the DoS attack. However, the DoS attack does not affect other ECUs. Note that each stream AVTPDU contains a VLAN header with a value of 3 in the PCP field (depicted in row 3 in Figure 1). Consequently, an AVB listener may not receive an AVTP stream owing to the DoS attack.
• Content-addressable memory (CAM) table overflow.
A CAM attack exploits the MAC address learning process of in-vehicle switches. Specifically, the attacker floods the network with packets having a random source MAC address until the target switch cannot accept new MAC address-port pair entries. This occurs when the CAM table of the target switch is full. As a result, all inbound traffic is flooded to all ECUs connected to the target switch. However, this threat exists not only in automotive Ethernet switches but also in general switches [8]. Commercial switches can be equipped with countermeasures for this attack [9]. Nevertheless, we should ensure that in-vehicle switches are robust against CAM table overflow attacks, considering that this is a period of transition from the use of the traditional CAN bus to automotive Ethernet. If an in-vehicle switch is vulnerable to the CAM table overflow attack, the impact of passive attacks will be very high because a single monitoring node can dump all transmitted packets.
• Fuzzing.
In a fuzzing attack, random inputs are given to a target ECU in the form of a network packet. In this case, the attacker can scan active services, disrupt communications, and execute unexpected commands for a designated service. Furthermore, a fuzz packet can occasionally trigger an undocumented service mode (e.g., ECU firmware upgrade mode) or change parameters that are crucial for vehicle operation. The black-box fuzzing attack is easy to perform because it requires less effort to affect the target vehicle. Furthermore, a gray-box fuzzing attack conducted by an attacker with prior knowledge may cause a critical accident. Most of the traffic generated by a fuzzer contains a malformed payload, which makes such traffic easy to detect in an IVN implementation.
• Command injection.
This attack is performed by an attacker who knows the specification of a target vehicle in detail. The attacker injects well-crafted payloads into the IVN of the target vehicle. Then, the target vehicle executes commands as assigned. The nature of command injection depends on the protocol or application targeted by an attacker. For example, packet injection may take place over a varying time interval, depending on the type of attack. Rezvani [10] discussed the working of an automotive Ethernet camera and demonstrated a command injection attack. He further exploited the end of image function of the target camera to override a video stream in accordance with his commands. To detect a command injection attack, an IDS needs to recognize the context of the communication or know the signatures captured from the injected packets. The risk of command injection attacks is very high because the target vehicle executes commands as intended by the attacker.
• Replay attack.
A replay attack enables an attacker to input a few commands even without prior knowledge of the attack target. To execute a replay attack, the attacker must possess a pre-captured packet dump. By replaying the packet dump to the IVN, the attacker can make the target vehicle execute certain instructions (which were captured during the traffic extraction). With regard to an IDS, detecting a replay attack is straightforward because there will be two ongoing sessions during such an attack, which is uncommon. However, it is challenging to determine which session or packets are being replayed by the attacker (because the injected packets were also generated by legitimate ECUs).
Figure 3: Proposed system design. (Diagram labels: switch, IDS, feature generator, replay attack.)
Table 1 summarizes five types of active attacks. We evaluate the likelihood, risk, and discriminability of the attacks based on the attack characteristics discussed previously. Here, we assign a high likelihood when no prior knowledge or effort is required before the attack. We assign a high risk if the target ECU can become out-of-order or can execute an attacker’s command. Finally, discriminability indicates whether the injected packet is distinct from normal traffic (the packets can be easily detected if the injected packet is distinct). DoS attacks are relatively easy to detect because such attacks involve fixed payloads. CAM table overflow and fuzzing attacks are also easy to detect because they involve the injection of randomly generated packets. The detection of a command injection attack requires appropriate knowledge; therefore, it involves moderate difficulty. In contrast, replay attacks are very difficult to detect because the replayed packets have already been used in the target vehicle. Therefore, it is necessary to develop an IDS particularly for replay attacks.
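The CAM table overflow scenario described above can be illustrated with a toy model of MAC learning. This is a sketch under simplifying assumptions, not a model of any real in-vehicle switch: the class name, the table capacity, the port numbers, and the node names are hypothetical, and real switches additionally age out entries and may apply countermeasures.

```python
class ToySwitch:
    """Toy MAC-learning switch with a bounded CAM table (capacity is hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cam = {}  # learned MAC address -> port

    def forward(self, src: str, dst: str, in_port: int, ports: list) -> list:
        """Learn the source MAC if there is room, then return the output ports."""
        if src in self.cam or len(self.cam) < self.capacity:
            self.cam[src] = in_port
        if dst in self.cam:
            return [self.cam[dst]]
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in ports if p != in_port]


PORTS = [1, 2, 3]  # talker on port 1, listener on port 2, monitoring node on port 3

# Benign case: once the listener has been learned, traffic reaches only port 2.
benign = ToySwitch(capacity=4)
benign.forward("talker", "listener", 1, PORTS)
benign.forward("listener", "talker", 2, PORTS)
benign_out = benign.forward("talker", "listener", 1, PORTS)

# Overflow case: spoofed source addresses fill the table before the listener is learned.
flooded = ToySwitch(capacity=4)
flooded.forward("talker", "listener", 1, PORTS)
for n in range(10):
    flooded.forward(f"spoofed-{n}", "talker", 3, PORTS)
flooded.forward("listener", "talker", 2, PORTS)  # no room left to learn the listener
flooded_out = flooded.forward("talker", "listener", 1, PORTS)
```

In the benign case the last frame leaves only on the listener's port, whereas in the overflow case it is also flooded to port 3, which is why a single monitoring node can then observe traffic that would otherwise never reach it.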
4. System design
In this section, we propose an IDS designed to detect continuous replayattacks abusing stream AVTPDUs. In Section 4.2, we demonstrate an exampleof the replay attack on automotive Ethernet. Figure 3 depicts the proposedsystem design.
IDS deployment.
We assume that the IDS monitors consecutive AVTP streams to detect intrusions. To this end, we describe how an IDS can be deployed in a CAV. Figure 2 shows that the IDS is connected to one of the AVB listeners. Because AVTP supports one-to-many streams, the IDS can observe AVTP streams through an AVB listener without affecting other in-vehicle systems.
Before we develop an intrusion detection method for replay attacks on automotive Ethernet, we need to gather automotive Ethernet packets. To this end, we first implemented a testbed using a BroadR-Reach physical network to capture AVB-related packets. We connected an AVB talker and an AVB listener to a BroadR-Reach switch. Then, we configured our BroadR-Reach switch to forward all of the inbound packets to a monitoring port. Hence, we captured all of the communications between the AVB devices.
Initially, the two devices synchronize through gPTP packets and send control AVTPDUs to establish a new session. After the connection is established, the AVB talker transmits one-way stream AVTPDUs. The stream AVTPDUs contain a compressed MPEG-2 TS live video stream from a USB camera directly connected to the AVB talker. The AVB listener plays the video by combining the received packets. All such communication processes are stored as one packet dump. We collected several packet dumps for the experiment (the pre-captured AVTP streams are shown in Figure 3).
Figure 4: Demonstration of a replay attack (captured from our BroadR-Reach network). (a) Live stream from the AVB talker: “Must stop to prevent a traffic accident.” (b) Injected stream from the adversary: “Looks good to go.” (c) Received footage at the AVB listener: “The vehicle may depart due to ADAS’s incorrect judgment based on the current input stream.” The terminal application plays a distorted video, as well as a little bit of the residual video; however, it mostly plays the scene intended by the attacker, as depicted in (c). Assuming that an autonomous driving system is connected to the AVB listener, a traffic accident is impending in this example scenario.
We suppose that an attacker injects arbitrary stream AVTPDUs into the IVN. The goal of the attacker is to output a single video frame, at a terminal application connected to the AVB listener, by injecting previously generated AVTPDUs during a certain period. To demonstrate the attack, we extract 36 continuous stream AVTPDUs from one of our AVB datasets; the extracted AVTPDUs constitute one video frame. Then, the attacker performs a replay attack by sending the 36 stream AVTPDUs repeatedly.
To illustrate the effect of the replay attack, Figure 4 shows the data collected from our testbed during this attack. As shown in Figure 4c, the adversary successfully compromises the video stream on the AVB listener side. The terminal application plays corrupted video footage when the replay attack occurs (because the AVB listener receives stream AVTPDUs from two talkers). The attack can have a critical impact if the terminal application is the ADAS; a CAV has no choice but to be blinded or make a wrong decision. Therefore, it is important to detect injection attacks and replay attacks.
In this section, we present some observations of stream AVTPDU payloads to explain their characteristics. We also elucidate the motivation for our design of the feature generator.
Figure 5 shows the visualization of 100 continuous stream AVTPDU payloads captured from our experiment testbed. Each pixel represents a byte value: light and dark cells indicate low and high values, respectively. Values are given in the order of incoming AVTPDUs (from the top row to the bottom row). Each stream AVTPDU that we captured has the same size (438 bytes).
We found some interesting aspects in the first 58 bytes of the stream AVTPDUs (Figure 5b). First, each header field of the stream AVTPDUs, shown in Figure 1, has a fixed value, except for the sequence num, DBC, and Payload fields. Second, the values of the sequence num and DBC fields sequentially increase by 01h and 10h, respectively, and are reset to 00h when they overflow. Third, values in the offset range of 51–58 (the first eight bytes of the Payload field) change occasionally. This pattern repeats every time a new video frame is received.
In contrast, we found no meaningful pattern in the byte offset range of 59–438, although a light vertical line (offset 248) and a dark horizontal line were observed. The offset range of 59–438 is part of the MPEG-2 TS payloads carried by the AVTPDUs.
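The second observation above (sequence num increasing by 01h and DBC by 10h, both wrapping to 00h) can be expressed as a simple continuity check. The sketch below is illustrative only: the function names are hypothetical, the increments are those observed in the captured traffic rather than values mandated for every AVTP stream, and a discontinuity only indicates that two streams are interleaved, not which packet was injected.

```python
def next_expected(sequence_num: int, dbc: int) -> tuple:
    """Return the (sequence num, DBC) pair expected in the next stream AVTPDU."""
    return (sequence_num + 0x01) % 0x100, (dbc + 0x10) % 0x100


def find_discontinuities(fields: list) -> list:
    """Return indices i where packet i does not follow packet i - 1.

    `fields` is a list of (sequence num, DBC) pairs in arrival order.
    """
    breaks = []
    for i in range(1, len(fields)):
        if tuple(fields[i]) != next_expected(*fields[i - 1]):
            breaks.append(i)
    return breaks


# Benign stream that wraps around from FFh/F0h to 00h/00h.
benign_fields = [(0xFE, 0xE0), (0xFF, 0xF0), (0x00, 0x00), (0x01, 0x10)]

# Same stream with one replayed packet (hypothetical values) inserted at index 2.
mixed_fields = [(0xFE, 0xE0), (0xFF, 0xF0), (0x40, 0x00), (0x00, 0x00)]

benign_breaks = find_discontinuities(benign_fields)  # no breaks expected
mixed_breaks = find_discontinuities(mixed_fields)    # breaks at indices 2 and 3
```

Note that the legitimate packet at index 3 is also flagged because it no longer follows its predecessor, which mirrors the difficulty of telling replayed packets apart from benign ones.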
Figure 5: Visualization of 100 continuous stream AVTPDU payloads. (a) Benign stream: 438 bytes of total payload. (b) Benign stream: first 58 bytes. (c) Injected stream: first 58 bytes. The vertical bars in (b) and in (c) represent whether the stream is benign (blue) or injected (red).
Figure 6: Feature generation process, illustrated with window size w = 4. Step 1: receive stream AVTPDUs. Step 2: subtract byte-by-byte, then split into nibbles. Step 3: generate a feature X.
Figure 5c visualizes the payloads during a replay attack. The injected frames can be distinguished using the vertical line on the right. Conspicuous grid-like patterns can be observed on the aforementioned fields; the patterns correspond to the alternation of normal and injected stream AVTPDUs. Such a pattern means that the fields representing the sequence or the context in the payload are fragmented. However, the replay attack may seem like a normal flow when the adversary dominates the network traffic (see near the 40th packet). This makes it difficult to detect replayed packets.
There is no change in the Ethernet header or the VLAN header. Malformed stream AVTPDUs injected by a weak adversary should look conspicuous and awkward in the benign stream in automotive Ethernet-based networks. For example, we can easily detect a DoS attack based on a modified VLAN header. An IDS can quickly detect such injections with substantially less effort; an example of this is rule-based intrusion detection that observes the modification of static payloads.
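The rule-based check on static payloads mentioned above could look like the following. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, offsets are 0-based here, every packet is assumed to be at least `length` bytes long, and the example bytes are made up.

```python
def learn_static_bytes(benign_packets: list, length: int = 58) -> dict:
    """Map each offset whose value never changes in benign traffic to that value."""
    static = dict(enumerate(benign_packets[0][:length]))
    for pkt in benign_packets[1:]:
        for off in list(static):
            if pkt[off] != static[off]:
                del static[off]  # this offset varies, so it is not static
    return static


def violates_static_rule(packet: bytes, static: dict) -> bool:
    """Flag a packet that differs from the baseline at any static offset."""
    return any(packet[off] != val for off, val in static.items())


# Hypothetical 6-byte "headers": only the byte at offset 4 changes in benign traffic.
benign = [bytes([0x81, 0x00, 0x60, 0x02, n, 0xAA]) for n in range(3)]
baseline = learn_static_bytes(benign, length=6)

modified_vlan = bytes([0x81, 0x00, 0xE0, 0x02, 7, 0xAA])  # static byte altered
replayed = bytes([0x81, 0x00, 0x60, 0x02, 1, 0xAA])       # copy of a benign packet

flag_modified = violates_static_rule(modified_vlan, baseline)  # True
flag_replayed = violates_static_rule(replayed, baseline)       # False
```

The replayed packet passes the rule because it is byte-for-byte legitimate, which is the gap that a learned model over recent stream changes is meant to close.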
Next, we design the feature generator so that the proposed CNN model can distinguish and classify the changes in the payload after a packet injection. The feature generator receives a stream AVTPDU from a BroadR-Reach network and generates a feature to be used as an input of the CNN. Based on our observations, we decided to focus on the first 58 bytes of the $i$-th arriving stream AVTPDU. Moreover, the feature generator uses a window size $w \in \mathbb{N}$: it collects the state changes of the $w$ most recent stream AVTPDUs and then produces a two-dimensional vector from them.
Let $x_i$ be a one-dimensional vector containing $j$ bytes of the $i$-th stream AVTPDU, which is given as $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,(j-1)}, x_{i,j})$, where $x_{i,j} \in \mathbb{N}$ and $0 \le x_{i,j} < 256$. Here, $x_i$ includes all header fields and only the first 8 bytes of the Payload field. We set $j = 58$ because we consider the first 58 bytes of each stream AVTPDU payload. Considering two stream AVTPDUs $x_{i-1}$ and $x_i$, let $\Delta x_i$ be the state change of the $i$-th stream AVTPDU, which is derived as follows:
$$
\begin{aligned}
\Delta x_i &\equiv (x_i - x_{i-1}) \bmod 2^8 \\
&\equiv (x_{i,1} - x_{(i-1),1}, \ldots, x_{i,58} - x_{(i-1),58}) \bmod 2^8 \\
&\equiv (\Delta x_{i,1}, \Delta x_{i,2}, \Delta x_{i,3}, \ldots, \Delta x_{i,(j-1)}, \Delta x_{i,j}) \\
&= (u_{i,1}, v_{i,1}, u_{i,2}, v_{i,2}, \ldots, u_{i,58}, v_{i,58})
\end{aligned} \tag{1}
$$
where $\Delta x_{i,j} = 2^4 u_{i,j} + v_{i,j}$, $u_{i,j}, v_{i,j} \in \mathbb{N}$, and $0 \le u_{i,j}, v_{i,j} < 16$. That is, $u_{i,j}$ and $v_{i,j}$ are the high and low nibbles of $\Delta x_{i,j}$. Finally, a $w \times 2j$ sized two-dimensional vector $X_i$ is defined as
$$
X_i =
\begin{pmatrix}
\Delta x_{i-w+1} \\
\Delta x_{i-w+2} \\
\vdots \\
\Delta x_i
\end{pmatrix}
=
\begin{pmatrix}
u_{(i-w+1),1} & v_{(i-w+1),1} & \cdots & v_{(i-w+1),58} \\
u_{(i-w+2),1} & v_{(i-w+2),1} & \cdots & v_{(i-w+2),58} \\
\vdots & \vdots & \ddots & \vdots \\
u_{i,1} & v_{i,1} & \cdots & v_{i,58}
\end{pmatrix} \tag{2}
$$
Thus, $X_i$ captures the stream changes within the most recent window of size $w$.
Figure 6 illustrates the process of feature generation. In summary, the feature generator returns $X_i$ on the arrival of the $i$-th stream AVTPDU once the window is full (i.e., $i > w$). The feature generator feeds each $X_i$ to the proposed CNN model. The label $Y_i \in \{0, 1\}$ identifies whether the $i$-th stream AVTPDU is benign or injected; this label is then used for training our CNN model.
Table 2: Structure of the convolutional neural network and the output dimensions (where w = 44).
| Role | Layer name | Data shape | Activation | Hyperparameters |
|---|---|---|---|---|
| Input | Input ($X_i$) | 44 × 116 × 1 | - | - |
| … | Conv2D 1 | 44 × … × 32 | ReLU | filter size=5 × …, “same” padding, L2 reg. |
| … | BatchNormalization 1 | 44 × … × 32 | - | momentum=0.99, epsilon=0.001 |
| … | MaxPool 1 | 22 × … × 32 | - | pool size=2 × … |
| … | Conv2D 2 | 22 × … × 64 | ReLU | filter size=5 × …, “same” padding, L2 reg. |
| … | BatchNormalization 2 | 22 × … × 64 | - | momentum=0.99, epsilon=0.001 |
| … | MaxPool 2 | 11 × … × 64 | - | pool size=2 × … |
| … | … | … | … | … |
| Output | Output ($Y'_i$) | 1 | Sigmoid | - |
Our CNN model comprises an input layer, two hidden sub-layers, and two dense layers. Table 2 details the proposed CNN architecture and hyperparameters. Window size $w$ must be a multiple of 4 owing to size reduction in two max-pooling layers. To express the dimension of each layer, we use window size $w = 44$, which is the optimal value for our dataset (see Section 5.4). Thus, the input layer accepts $44 \times 116 \times 1$ inputs. The batch normalization layers prevent overfitting and accelerate the training phase [11]. In the two Conv2D layers, we use L2 regularization as the kernel regularizer, which limits the values of the weights. We deploy two dropout layers between fully connected layers, which also prevents overfitting [12]. The rectified linear unit (ReLU) activation function gives non-linearity to the neural network and is widely used in CNN architectures to learn complex features in input data.
The last dense layer returns $Y'_i$ through a sigmoid activation function, where $Y'_i \in \mathbb{R}$ and $0 \le Y'_i \le 1$. We consider output $Y'_i$ as the probability of $X_i$ being injected. In the training phase, the binary cross-entropy loss function compares $Y'_i$ to $Y_i$. Our CNN model is trained using the Adam optimizer [13], with a learning rate of 0.001.
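The feature generation described in this section (byte-wise difference modulo 256, a split into nibbles, and a sliding window of the most recent differences) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the class and function names are hypothetical, features are returned as nested lists rather than tensors, and the small `w` and `j` in the example are chosen only to keep the output readable.

```python
from collections import deque


def delta_nibbles(prev: bytes, cur: bytes) -> list:
    """Byte-wise (cur - prev) mod 2^8, with each difference split into (u, v)."""
    row = []
    for p, c in zip(prev, cur):
        d = (c - p) % 256
        row.extend((d >> 4, d & 0x0F))  # high nibble u, low nibble v
    return row


class FeatureGenerator:
    """Keeps the w most recent state changes and emits a w x 2j feature."""

    def __init__(self, w: int, j: int = 58):
        self.w, self.j = w, j
        self.prev = None
        self.window = deque(maxlen=w)  # oldest state change is dropped automatically

    def push(self, avtpdu: bytes):
        """Return the feature for this AVTPDU, or None until the window is full."""
        cur = avtpdu[: self.j]
        if self.prev is not None:
            self.window.append(delta_nibbles(self.prev, cur))
        self.prev = cur
        if len(self.window) < self.w:
            return None
        return [list(row) for row in self.window]


# Hypothetical 5-byte packets; only the first j = 4 bytes are used.
gen = FeatureGenerator(w=2, j=4)
packets = [bytes([0xAA, k, (k * 0x10) % 256, 0x00, 0x99]) for k in range(4)]
features = [gen.push(p) for p in packets]
```

With `w = 2`, the first two calls return `None` and the third returns the first feature, matching the condition that a feature is available only for $i > w$. Each row holds `2 * j` nibbles, so the paper's setting of `j = 58` yields 116 columns.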
5. Experiment
In the experiments, we use the Python library Keras [14] to implement our 2D-CNN model. We train and evaluate our model using Google Colaboratory on an NVIDIA Tesla P100 GPU.
We performed four packet captures of attack-free stream AVTPDUs from our BroadR-Reach network in the PCAP format for 70 minutes in total. The first two packet captures were performed indoors (in our laboratory) for 25 min, and the last two packet captures were performed for 45 min in a vehicle driving on the road. We noticed that the packet transmission rates differed between the two environments. A possible reason is that the camera connected to the AVB talker was capturing a more dynamic scene during driving. For instance, the average interval between packets is 3,157 µs indoors and 1,735 µs for a moving vehicle.
Each packet capture corresponds to a single AVB session, including broadcast messages for node discovery, establishment of an AVB session, video transmission over stream AVTPDUs, and disconnection of the AVB session.
To demonstrate a replay attack, we continuously inject a video frame (the 36 stream AVTPDUs discussed in Section 4.2) and repeat the four packet captures. Note that we waited for 120 seconds before the attack to obtain some data corresponding to the attack-free scenario, which allowed our 2D-CNN model to learn characteristics corresponding to the benign status as well. Then, we turned the four packet captures into AVTP intrusion datasets using the feature generator. All packets except for stream AVTPDUs were excluded during the feature generation.
We denote the AVTP intrusion datasets as $D_{indoors}$ and $D_{driving}$. We selected $D_{indoors}$ as the training/validation set. $D_{indoors}$ was used to choose proper values for $w$ and for the hyperparameters of our 2D-CNN model. We used $D_{driving}$ as the test set. Note that the packet collection environments of the two datasets are independent. Thus, there can be no potential over-fitting caused by unrecognized payloads.
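The reported packet intervals translate directly into the rate at which a packet-by-packet detector must produce decisions. The arithmetic below is a back-of-the-envelope sketch: the function names are hypothetical, the 438-byte size is the captured AVTPDU size reported in Section 4, and preamble, frame check sequence, and inter-frame gap are ignored.

```python
AVTPDU_SIZE_BYTES = 438  # size of each captured stream AVTPDU (Section 4)


def packets_per_second(mean_interval_us: float) -> float:
    """Average packet rate implied by a mean inter-packet interval in microseconds."""
    return 1_000_000 / mean_interval_us


def stream_bitrate_mbps(mean_interval_us: float,
                        size_bytes: int = AVTPDU_SIZE_BYTES) -> float:
    """Approximate stream bitrate in Mbps, counting only the captured bytes."""
    return size_bytes * 8 * packets_per_second(mean_interval_us) / 1_000_000


indoor_rate = packets_per_second(3157)      # about 317 packets per second
driving_rate = packets_per_second(1735)     # about 576 packets per second
indoor_mbps = stream_bitrate_mbps(3157)     # about 1.1 Mbps
driving_mbps = stream_bitrate_mbps(1735)    # about 2.0 Mbps
```

Under these attack-free averages, a detector keeps up with the stream as long as its per-input processing time stays below the mean interval, i.e., below roughly 1.7 ms in the driving captures.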
$D_{\text{indoors}}$ has 446,372 benign $X_i$s and 196,894 injected $X_i$s. $D_{\text{driving}}$ has 1,494,257 benign $X_i$s and 376,236 injected $X_i$s.

We use accuracy, precision, and recall as performance metrics. True positives (TP) and true negatives (TN) are the numbers of AVTPDUs correctly classified as injected and benign, respectively. False positives (FP) and false negatives (FN) are the numbers of AVTPDUs incorrectly classified as injected and benign, respectively. Thus, the performance metrics are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

Moreover, the F1-score is the harmonic mean of precision and recall, which is sometimes more useful than accuracy, especially when the class distribution is imbalanced. The F1-score is calculated as follows:

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

We also used the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) to demonstrate the accuracy of our model with regard to the classification of $X_i$s. The ROC curve is a common method to measure the performance of binary classifiers. It shows how the true-positive rate (TPR) and false-positive rate (FPR) change as the threshold applied to $Y'_i$ changes. The TPR and FPR are calculated as follows:

$$\text{TPR} = \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{TN + FP} \tag{5}$$

To find an optimal $w$, we created 20 $D_{\text{indoors}}$ sets for $w = (4, \ldots)$ and assigned 80% of the $X_i$s to the training set and 20% of the $X_i$s to the test set for each $D_{\text{indoors}}$.

Table 3 presents the performance of the model with respect to $w$. The best result in each column is highlighted. When a smaller window size is employed, the training requires less time. However, the classification performance is poor for small window sizes. Conversely, one might conclude that a large $w$ will be advantageous because a wide window allows the 2D-CNN model to capture more information from the input data. However, the largest $w$ in our experiment does not bring the best result. Instead, the best performance was obtained for $w = 44$ on the test set and $w = 60$ on the training set. Therefore, we chose 44 as the optimal window size because it yields the best performance on the test set.
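Equations (3) to (5) can be sketched in a few lines of Python. The example below is illustrative only: the function names, the default threshold of 0.5, and the toy labels and scores are assumptions, with label 1 standing for "injected" and label 0 for "benign".

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, TN, FP, FN; label 1 means injected, 0 means benign."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Equations (3)-(5); assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,  # identical to the TPR
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (tn + fp),
    }

def roc_points(labels, scores, thresholds):
    """One (FPR, TPR) pair per threshold applied to the scores."""
    points = []
    for t in thresholds:
        tp, tn, fp, fn = confusion_counts(labels, scores, t)
        points.append((fp / (tn + fp), tp / (tp + fn)))
    return points

# Made-up labels and scores, for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.9, 0.4, 0.8]
result = metrics(*confusion_counts(labels, scores))
curve = roc_points(labels, scores, [0.25, 0.5, 0.75])
```

Raising the threshold moves along the ROC curve toward fewer false positives at the cost of a lower true-positive rate, which is the trade-off the AUC summarizes.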
Table 3: Model performance on our dataset depending on window size $w$. The best outcome in each column is highlighted. The result shows that the optimal value of $w$ is 44 or 60 (instead of the highest value).

w     Training time (µs/sample)   Train loss   Train accuracy   Test accuracy   Test F1-score
4     ...
48    216                         0.0574       0.9871           0.9931          0.9931
52    228                         0.0513       0.9893           0.9939          0.9939
56    238                         0.0551       0.9884           0.9868          0.9868
60    267                         ...

We performed a five-fold cross-validation on $D_{\text{indoors}}$ to ensure that our 2D-CNN model is sufficient to detect intrusions and that the hyperparameters are properly determined. In each cross-validation, 80% and 20% of the samples are randomly selected as the training and validation sets, respectively. Each input is used once for validation. In each cross-validation, the training set and the validation set have the same proportion of benign/injected labels. After the cross-validation, we obtain five trained 2D-CNN models.

We set the batch size to 64 and the number of epochs to 30 for training our model. To evaluate the final performance of our method, we performed five experiments using each model on $D_{\text{driving}}$.

Table 4: Classification results of five-fold cross-validation using dataset $D_{\text{indoors}}$

Fold    Accuracy   Precision   Recall   F1-score   AUC
1       0.9958     0.9916      0.9947   0.9932     0.9998
2       0.9967     0.9937      0.9958   0.9947     0.9999
3       0.9945     0.9895      0.9927   0.9911     0.9997
4       0.9946     0.9888      0.9937   0.9912     0.9998
5       0.9958     0.9931      0.9932   0.9932     0.9998
Total   0.9955     0.9913      0.9940   0.9927     0.9997

Table 4 shows the classification results obtained for the five-fold cross-validation. As our dataset is slightly imbalanced, we should focus on the F1-score (although the accuracy is higher). The experimental results show that the model exhibited outstanding performance, with F1-scores from 0.9911 to 0.9947. Thus, the model can classify almost all stream AVTPDUs correctly. The high recall implies that the proposed model is very sensitive with regard to the detection of almost all injected stream AVTPDUs, even if the attacker sends AVTP traffic previously generated in the target automotive Ethernet-based network. We listed the validation results obtained for each cross-validation and created the "total" row in Table 4. In conclusion, the total row shows a satisfactory performance of 0.99 or higher for all evaluation indicators on $D_{\text{indoors}}$.

Notably, there is no significant difference in performance metrics among the five cross-validations.
Hence, the training process is stable with the chosen hyperparameters. Moreover, our 2D-CNN model converges after the training phase regardless of input data. In other words, the experimental results obtained using $D_{\text{indoors}}$ demonstrate that our feature generator, 2D-CNN model, and hyperparameters can effectively detect AVTP replay attacks.

Table 5: Test results using dataset $D_{\text{driving}}$

Model   Accuracy   Precision   Recall   F1-score   AUC
1       0.9953     0.9821      0.9949   0.9885     0.9996
2       0.9930     0.9677      0.9990   0.9831     0.9992
3       0.9877     0.9439      0.9984   0.9704     0.9979
4       0.9923     0.9650      0.9984   0.9814     0.9996
5       0.9913     0.9600      0.9988   0.9790     0.9989

Figure 7: ROC curve and AUC scores using test set $D_{\text{driving}}$ (x-axis: false positive rate; y-axis: true positive rate). Model 1: AUC = 0.9996; Model 2: AUC = 0.9992; Model 3: AUC = 0.9979; Model 4: AUC = 0.9996; Model 5: AUC = 0.9989.

Table 5 shows the test results obtained using $D_{\text{driving}}$. We performed five tests using the five models that were trained during the cross-validation. We found that the overall performance was slightly reduced compared to that observed for the cross-validation test. This is mostly due to a decrease in precision, meaning that there are considerable FP cases in $D_{\text{driving}}$. Interestingly, we confirmed that the recall had increased compared with that obtained for the validation result. Thus, we can conclude that our intrusion detection model (1) distinguishes well the replayed stream AVTPDUs that are generally difficult to identify; however, (2) it may entail false-alarm fatigue during the attack-free period.

Figure 7 shows that the AUCs are high, although there was some performance reduction when the test set was employed. The near-ideal ROC curves and the high AUCs indicate that the 2D-CNN model separates the input well into benign and injected classes. This means that we can reduce the FP cases, and thereby improve the overall performance, by adjusting the threshold for evaluating $Y'_i$.

Figure 8: Visualization of four randomly selected samples ($X_i$s): (a) true negatives, (b) false positives, (c) false negatives, (d) true positives. Two vertical lines represent labels (left, $Y_{i-w+1}, \ldots, Y_i$) and predicted labels (right, $Y'_{i-w+1}, \ldots, Y'_i$). Blue and red dots mean that the traffic is (predicted as) "benign" and "injected," respectively.

To illustrate the input data and the classification results, we visualize four randomly selected $X_i$s from our dataset in Figure 8.
Each dot expresses the extent of the state change of a nibble in the same position for adjacent stream AVTPDUs using brightness. The rightmost vertical bar represents $Y_{i-w+1}, \ldots, Y_i$ ... video frames because there are three horizontal patterns in the MPEG2-TS header.

Figure 8(d) shows the attacker dominating the AVTP stream. Except for the first injected packet, all injected packets appear to form a normal stream. We can observe that our 2D-CNN model infers all intrusions correctly.

When a dominant injection continues for a substantially long period, e.g., when more than half of the window size corresponds to injection, the injected packets are sometimes misclassified (see the bottom of Figure 8(c)) because the features appear similar to those in the attack-free state. This is because the replay attack was conducted using the traffic generated in the target automotive Ethernet. Nevertheless, our model correctly identifies most of the continuous packet injections.

One of the weaknesses of our model, found in Figure 8(b), is that a single misclassification causes short-term continuous errors. In the visualization, we can observe five continuous classification errors.

Table 6: Average inference time per sample for various devices

Host                   Processor                        Time (µs/sample)
Google Colab (GPU)     NVIDIA Tesla P100                83
Macintosh (CPU)        Intel Core i7-7700K              787
Jetson TX2 (GPU)       NVIDIA Pascal (256 CUDA cores)   982
Raspberry Pi 3 (CPU)   ARM Cortex-A53                   35,000

Figure 9: Implementation of the real-time IDS designed to identify injected stream AVTPDUs in a BroadR-Reach network. Labeled components: media converter, AVB stream from the BroadR-Reach network, IDS-sensor interface in promiscuous mode, remote access via Wi-Fi (configuration, detection result), and 256-core NVIDIA Pascal GPU.

From a practical perspective, we need to consider not only the classification performance but also the inference time to determine whether the model is suitable for real-time detection in CAVs. To this end, we measured the average inference time per sample using four devices. Table 6 shows the results. For real-time detection, the inference time per sample must be less than or equal to the packet occurrence time that we noted earlier in Section 5.1, i.e., 1,735 µs. We set a stricter threshold of 1,000 µs to account for the additional packets injected by an attacker.

In Google Colaboratory, our model takes only 83 µs to infer one sample with GPU acceleration, which is substantially shorter than the average packet interval. However, this result, obtained using a cloud artificial intelligence platform, does not reflect the situation in actual CAVs, because in-vehicle IDSs may not have such a powerful GPU. Therefore, we measured the inference time again using a PC. Although the inference time increased substantially, real-time detection is still possible using a CPU.

We further conducted the experiment on two embedded computers. We used a common device, Raspberry Pi 3, to show that the proposed model can function in a low-power computing environment. We successfully ran our approach on Raspberry Pi 3. However, the inference required a long time; hence, it was unsuitable for real-time detection. Next, we implemented an IDS for automotive Ethernet using an embedded GPU and a BroadR-Reach media converter. Figure 9 shows the implemented IDS connected to our testbed via BroadR-Reach. We used an NVIDIA Jetson TX2 and an Intrepid RAD-Moon media converter. The proposed model requires approximately 982 µs to process one sample in this case. According to these results, our CNN model is suitable for real-time detection on GPU-enabled embedded computers.

In general, computation complexity and classification accuracy are in a trade-off relationship.
However, computation and inference time must be reduced if the in-vehicle system provides less computational performance or if the IDS runs on an embedded system along with multiple applications, such as perception, decision making, and on-board diagnostics. To reduce the inference time, the model can be optimized by reducing the input size, compressing the model, or quantizing it. In Section 5.4, we confirmed that the smaller the window size, the shorter the model's training time. For instance, changing the window size from 44 to 28 decreases the training time by 25% and slightly increases the misclassification error (about 0.26% of the test F1-score), which is tolerable. The reduction in training time carries over to the inference time. To compress or quantize the proposed model, additional hyperparameters need to be carefully considered. This requires a detailed study and is left for future work.
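The real-time criterion used in this section can be checked mechanically. The sketch below takes the inference times from Table 6 together with the 1,000 µs threshold and the 1,735 µs average packet interval stated in the text; the helper names `meets_budget` and `headroom` are assumptions made for this example.

```python
# Average inference times from Table 6, in microseconds per sample.
INFERENCE_TIME_US = {
    "Google Colab (GPU)": 83,
    "Macintosh (CPU)": 787,
    "Jetson TX2 (GPU)": 982,
    "Raspberry Pi 3 (CPU)": 35_000,
}

def meets_budget(inference_us, budget_us=1_000):
    """True if one inference fits within the real-time threshold."""
    return inference_us <= budget_us

def headroom(inference_us, interval_us=1_735):
    """Number of inferences that fit into one average packet interval."""
    return interval_us / inference_us

realtime_hosts = [
    host for host, t in INFERENCE_TIME_US.items() if meets_budget(t)
]
```

Under these figures, the Jetson TX2 stays just below the threshold, whereas the Raspberry Pi 3 completes far less than one inference per packet interval.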
6. Discussion
During the experiment, we carefully created the automotive Ethernet intrusion datasets instead of conducting online learning. This helps us reproduce the experimental results and also lets other researchers study their own intrusion detection methods for automotive Ethernet. The datasets are recorded in the PCAP file format and, therefore, can be viewed using prevalent programming libraries and packet analyzers (such as Wireshark). As previously discussed, a replay attack was used as the intrusion technique. In this case, analysts without prior knowledge cannot distinguish the states of the packets as benign or injected. To address this, we have labelled each stream AVTPDU so that other researchers can design sophisticated experiments.

As part of our previous report, we released a CAN intrusion dataset [15]. Interestingly, we confirmed that many studies had been conducted using the published dataset. We hope that our intrusion dataset will trigger various studies on automotive Ethernet. Readers can refer to [6] to access our automotive Ethernet intrusion datasets.
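Because the datasets are stored as PCAP files, they can be read with only the Python standard library. The following is a minimal sketch, assuming the classic PCAP layout (24-byte global header, 16-byte record headers) and Ethernet frames that carry AVTP under EtherType 0x22F0, optionally behind an IEEE 802.1Q VLAN tag. The function names and the synthetic in-memory capture are hypothetical, and the label format of the released dataset is not covered here.

```python
import io
import struct

AVTP_ETHERTYPE = 0x22F0  # IEEE 1722 (AVTP)
VLAN_ETHERTYPE = 0x8100  # IEEE 802.1Q tag

def read_pcap(stream):
    """Yield (timestamp_seconds, frame_bytes) from a classic PCAP stream."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated PCAP global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":    # little-endian, microseconds
        endian, frac = "<", 1e-6
    elif magic == b"\xa1\xb2\xc3\xd4":  # big-endian, microseconds
        endian, frac = ">", 1e-6
    elif magic == b"\x4d\x3c\xb2\xa1":  # little-endian, nanoseconds
        endian, frac = "<", 1e-9
    elif magic == b"\xa1\xb2\x3c\x4d":  # big-endian, nanoseconds
        endian, frac = ">", 1e-9
    else:
        raise ValueError("not a classic PCAP file")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return
        sec, sub, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < incl_len:
            return
        yield sec + sub * frac, frame

def is_avtp(frame):
    """True if the Ethernet frame carries AVTP, with or without a VLAN tag."""
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == VLAN_ETHERTYPE and len(frame) >= 18:
        (ethertype,) = struct.unpack("!H", frame[16:18])
    return ethertype == AVTP_ETHERTYPE

def _demo_capture():
    """Build a synthetic three-frame capture in memory (illustration only)."""
    frames = [
        bytes(12) + b"\x22\xf0" + b"payload",                  # untagged AVTP
        bytes(12) + b"\x81\x00\x00\x02\x22\xf0" + b"payload",  # VLAN-tagged AVTP
        bytes(12) + b"\x08\x00" + b"payload",                  # IPv4, ignored
    ]
    out = io.BytesIO()
    out.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for index, frame in enumerate(frames):
        out.write(struct.pack("<IIII", index, 500_000, len(frame), len(frame)))
        out.write(frame)
    out.seek(0)
    return out

avtp_frames = [(ts, f) for ts, f in read_pcap(_demo_capture()) if is_avtp(f)]
```

To process a real capture, the same generator can be applied to a file opened in binary mode, e.g. `open(path, "rb")`, where `path` is a placeholder for one of the dataset files.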
The proposed IDS must continuously receive all stream AVTPDUs for intrusion detection. As our IDS is designed to detect intrusions through continuous stream changes, intermittent traffic reception will cause numerous false positives. To solve this problem, we expect the implemented IDS to act as an AVB listener (see the AVB listener connected to the IDS in Figure 2) because stream AVTPDUs are delivered to all participating AVB listeners.

Through the experiment, we confirmed that our model shows outstanding classification performance. However, the pretrained model exhibits a slight performance degradation when the packet capture environment is changed (from $D_{\text{indoors}}$ to $D_{\text{driving}}$). This is a common problem observed in deep learning models. Future studies can address this in two ways. First, a deep learning model should be designed to be more robust than the proposed model. A common method for achieving this is deploying additional layers in a model; however, this entails a trade-off with the classification speed. Second, a transfer learning technique can be considered to improve classification performance in a new environment. In this way, the pretrained intrusion detection model can be further trained using new inputs (but not trained from scratch).

The proposed intrusion detection method needs additional hardware for real-time inference. The experimental results imply that our intrusion detection method requires GPU acceleration or a high-end CPU to detect intrusions in real time. Real-time detection was not possible on small embedded computers such as Raspberry Pi 3, although our 2D-CNN model consists of relatively few layers. Fortunately, real-time intrusion detection can be performed using a low-power embedded GPU.

In this study, intrusion detection was performed under the assumption that an attacker has access to an automotive Ethernet-based IVN. It is important to detect intrusions to protect IVNs.
However, it is also essential to make it difficult for an attack to occur and to enable a quick response to an identified attack. In brief, we suggest the following remediation strategies:

1. Efforts are needed to reduce the attack surface. A firewall should be installed on known attack surfaces to prevent arbitrary packet injections. In addition, security checks should be performed on nodes that may be subject to external attacks (such as infotainment devices).
2. Encryption and authentication protocols should be employed so that passive attackers do not easily understand the internal communication process. For example, MACSec ensures the confidentiality and integrity of communication over automotive Ethernet [16].
3. It is possible to prevent bandwidth occupancy (i.e., DoS attacks) by using the stream reservation protocol. This protocol is designed to ensure quality-of-service by reserving and increasing the bandwidth for time-sensitive A/V streams on automotive Ethernet. The reserved bandwidth can prevent communication failures due to high traffic.
4. Finally, it is necessary to develop intrusion response systems to block an attack as soon as it is identified.

7. Related work
Corbett et al. [17] presented a pioneering study for Ethernet-based vehicle networks by analyzing Ethernet protocols designed for vehicle diagnostics, communication patterns, state-of-the-art tools, and attack scenarios. They also proposed a testing framework for an automotive IDS involving parameters, metrics, and automotive-specific challenges. However, their work only provides concepts regarding the framework architecture and does not provide results of realistic experiments with an implemented IDS. Consequently, their testing tool needs to be evaluated.

Corbett et al. [18] also conducted an early study that discusses security mechanisms for automotive Ethernet-based IVNs. The authors identified possible manipulation and misuse attacks on automotive Ethernet-based networks, followed by an analysis of network topology and participating devices. The protocols and operating systems designed for vehicles were also introduced.

Readers interested in the current security trends, threats, and intrusion detection methods of IVNs are encouraged to refer to state-of-the-art reviews [3, 5] and the references therein. The reviews also describe the constraints, challenges, and characteristics of IDSs for IVNs. This information will be of great help in further research on vehicle intrusion detection methods. However, these reviews lack citations of Ethernet- and automotive Ethernet-specific studies.

Some research has been conducted on network packet classification using deep learning techniques. Kim and Anpalagan [19] adopted a 1D-CNN model to classify encrypted Tor traffic using only the TCP/IP header. Their data processor extracts the first 54 bytes of each payload and then splits the bytes into nibbles. The nibbles are used as the inputs of the 1D-CNN model. The experimental results show that this model not only performs binary classification well but can also infer specific application types. Wang et al. [20] converted packets to 28 × 28 grayscale images to classify malware traffic using a CNN model. This CNN model has the advantages of early-stage detection, a low false-alarm rate, protocol independence, automatic extraction of features, and raw traffic data input. On a CAN-based IVN system, Song et al. [21] adopted a deep CNN model, Inception-ResNet, to classify four types of injection attacks within the CAN bus. Their frame builder is designed to represent a sequence of CAN IDs at a bit level. Taylor et al. [22] also conducted a study to detect CAN bus attacks using long short-term memory (LSTM) networks.

Changes in the nature of the packets and in the environment degrade performance when a pretrained deep learning model is used for detection. For instance, this degradation is apparent from Table 5. Moreover, new types of attacks are not properly detected by existing models. Tariq et al. [23] applied transfer learning to identify new types of intrusion attacks. They trained a convolutional LSTM network using a DoS attack dataset captured from the CAN bus. Moreover, they tried one-shot learning using only a few fuzzing and replay attacks.
8. Conclusion
In this paper, we proposed an intrusion detection method that detects AVTP stream injection attacks in automotive Ethernet. To the best of our knowledge, this is the first intrusion detection method for protecting automotive Ethernet-based IVNs. Our approach includes feature generation and a CNN-based intrusion detection model. We designed a feature generator that measures state changes of the AVTP stream at the nibble level within a specific time window. The CNN model extracts features from the input and infers whether a stream AVTPDU is benign or injected. The experimental results show that our model achieved a high detection performance and remarkably high recall for real stream AVTPDUs captured from our BroadR-Reach-based testbed. By measuring the time required for inference, we showed that our model is suitable for real-time detection in an actual CAV. Although we limited our experiments to stream AVTPDUs, we will also consider other AVB-related protocols and automotive diagnostics communication protocols in future work. Finally, we have released the dataset collected for this study. We look forward to further studies on automotive Ethernet security based on our AVTP stream dataset.

Acknowledgements
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00312, Developing technologies to predict, detect, respond, and automatically diagnose security threats to automotive Ethernet-based vehicle).
References

[1] P. Hank, S. Muller, O. Vermesan, J. Van Den Keybus, Automotive Ethernet: In-vehicle Networking and Smart Mobility, in: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, IEEE Conference Publications, New Jersey, 2013, pp. 1735–1739. doi:10.7873/DATE.2013.349. URL http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6513795
[2] IEEE P1722 working group, IEEE standard for a transport protocol for time-sensitive applications in bridged local area networks (2016). doi:10.1109/IEEESTD.2016.7782716. URL https://doi.org/10.1109/IEEESTD.2016.7782716
[3] G. Loukas, E. Karapistoli, E. Panaousis, P. Sarigiannidis, A. Bezemskij, T. Vuong, A taxonomy and survey of cyber-physical intrusion detection approaches for vehicles, Ad Hoc Networks 84 (2019) 124–147. doi:10.1016/j.adhoc.2018.10.002. URL https://doi.org/10.1016/j.adhoc.2018.10.002
[4] F. Panarotto, A. Cortesi, P. Ferrara, A. K. Mandal, F. Spoto, Static Analysis of Android Apps Interaction with Automotive CAN, in: International Conference on Smart Computing and Communication (SmartCom), 2018, pp. 114–123. doi:10.1007/978-3-030-05755-8_12. URL http://link.springer.com/10.1007/978-3-030-05755-8_12
[5] ... doi:10.1109/TITS.2019.2908074. URL https://ieeexplore.ieee.org/document/8688625/
[6] S. Jeong, B. Jeon, B. Jung, H. K. Kim, Automotive ethernet intrusion datasets, https://github.com/seonghoony/autoeth-intrusion-dataset (2020).
[7] B. Metcalfe, C. M. Kozierok, C. Correa, R. B. Boatright, J. Quesnelle, M. Holden, K. Irving, Automotive Ethernet, Intrepid Control Systems, 2014.
[8] S. A. J. Alabady, Design and Implementation of a Network Security Model using Static VLAN and AAA Server, in: 2008 3rd International Conference on Information and Communication Technologies: From Theory to Applications, IEEE, 2008, pp. 1–6. doi:10.1109/ICTTA.2008.4530276. URL http://ieeexplore.ieee.org/document/4530276/
[9] E. Vyncke, Layer 2 security, (2006).
[10] D. Rezvani, Hacking automotive ethernet cameras, https://argus-sec.com/hacking-automotive-ethernet-cameras/ (2018).
[11] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: International Conference on Machine Learning, 2015, pp. 448–456. URL http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
[12] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research 15 (2014) 1929–1958. URL http://jmlr.org/papers/v15/srivastava14a.html
[13] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, arXiv Preprint (12 2014). URL http://arxiv.org/abs/1412.6980
[14] F. Chollet, et al., Keras, https://keras.io (2015).
[15] H. Lee, S. H. Jeong, H. K. Kim, OTIDS: A Novel Intrusion Detection System for In-vehicle Network by Using Remote Frame, in: 2017 15th Annual Conference on Privacy, Security and Trust (PST), Vol. 5, IEEE, 2017, pp. 57–5709. doi:10.1109/PST.2017.00017. URL https://ieeexplore.ieee.org/document/8476919/
[16] B. Carnevale, L. Fanucci, S. Bisase, H. Hunjan, MACsec-Based Security for Automotive Ethernet Backbones, Journal of Circuits, Systems and Computers 27 (5) (2018) 1–17. doi:10.1142/S0218126618500822.
[17] C. Corbett, T. Basic, T. Lukaseder, F. Kargl, A testing framework architecture concept for automotive intrusion detection systems, Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft fur Informatik (GI) P-269 (2017) 89–102.
[18] C. Corbett, E. Schoch, F. Kargl, P. Felix, Automotive ethernet: Security opportunity or challenge?, in: Lecture Notes in Informatics (LNI), 2016, pp. 45–54.
[19] M. Kim, A. Anpalagan, Tor Traffic Classification from Raw Packet Header using Convolutional Neural Network, in: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), no. January, IEEE, 2018, pp. 187–190. doi:10.1109/ICKII.2018.8569113. URL https://ieeexplore.ieee.org/document/8569113/
[20] ... doi:10.1109/ICOIN.2017.7899588. URL http://ieeexplore.ieee.org/document/7899588/
[21] H. M. Song, J. Woo, H. K. Kim, In-vehicle network intrusion detection using deep convolutional neural network, Vehicular Communications 21 (2020) 100198. doi:10.1016/j.vehcom.2019.100198. URL https://doi.org/10.1016/j.vehcom.2019.100198
[22] A. Taylor, S. Leblanc, N. Japkowicz, Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks, in: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2016, pp. 130–139. doi:10.1109/DSAA.2016.20. URL http://ieeexplore.ieee.org/document/7796898/
[23] S. Tariq, S. Lee, S. S. Woo, CANTransfer - Transfer Learning based Intrusion Detection on a Controller Area Network using Convolutional LSTM Network, in: Proceedings of the 35th Annual ACM Symposium on Applied Computing, ACM, New York, NY, USA, 2020, pp. 1048–1055. doi:10.1145/3341105.3373868. URL https://dl.acm.org/doi/10.1145/3341105.3373868