An Adaptive Multi-Agent Physical Layer Security Framework for Cognitive Cyber-Physical Systems
Mehmet Özgün Demir, Ozan Alp Topal, Ali Emre Pusane, Guido Dartmann, Gerd Ascheid, Güneş Karabulut Kurt
Abstract—Being capable of sensing and of adapting their behavior to changing environments, cognitive cyber-physical systems (CCPSs) are a new form of application in future wireless networks. With advances in machine learning algorithms, the best-performing transmission scheme can be selected to sustain a reliable network of CCPS agents equipped with self-decision mechanisms, where the interactions between agents are modeled in terms of service quality, security, and cost dimensions. In this work, we first define the network utility as a reliability metric, computed as a weighted sum of the individual utility values of the CCPS agents. The individual utilities are calculated by mixing the quality of service (QoS), security, and cost dimensions in proportions determined by individualized user requirements. By changing these proportions, the CCPS network can be tuned for different applications of next-generation wireless networks. We then propose a secure transmission policy selection (STPS) mechanism that maximizes the network utility by using a Markov decision process (MDP). In STPS, the CCPS network jointly selects the best-performing physical layer security policy and the parameters of the selected secure transmission policy to adapt to changing environmental effects. The proposed STPS is realized by reinforcement learning (RL), owing to its real-time decision mechanism, in which agents can automatically determine the policy providing the best utility in a changing environment.
Index Terms—Cyber-physical systems, physical layer security, quality of service, risk-aware control, utility.
I. INTRODUCTION
Cyber-physical systems (CPSs) can be seen as a special form of distributed system with integrated closed loops for control tasks among multiple agents, as shown in Fig. 1. In this paper, we consider a special type of CPS in which all agents communicate their states and control information over wireless channels. In this figure, an agent is defined as an individual unit that has a control center, multiple sensors, and actuators. In real-life deployments, a system consists of multiple interacting agents, such as traffic signs on the road and units inside vehicles in vehicle-to-everything (V2X) or industrial networks. Typical CCPSs have the ability to analyze the environment locally (at the edge) and make decisions to a limited extent. In this article, we
M. Ö. Demir and A. E. Pusane are with Boğaziçi University, Istanbul, Turkey, {ozgun.demir1, ali.pusane}@boun.edu.tr. O. A. Topal and G. Karabulut Kurt are with Istanbul Technical University, Istanbul, Turkey, {ozan.topal, gkurt}@itu.edu.tr. G. Dartmann is with the University of Applied Sciences Trier, Trier, Germany, [email protected]. G. Ascheid is with RWTH Aachen University, Aachen, Germany, [email protected]. This work is supported by TUBITAK under Grant 115E827.
Fig. 1: System model of a CCPS that consists of multiple agents in a layered structure, with possible attacks.

consider agents that utilize reinforcement learning (RL), so that they can analyze the environment and take action in real time. An essential aspect of the environment here is the physical wireless communication channel, which is also essential for monitoring and controlling the security of the communication. On the one hand, the communication channel is influenced by the interaction of the agents; on the other hand, the physical properties of the communication channel also influence the actions and behavior of the agents. Hence, the operation of each agent and the coordination between agents depend on the wireless communication infrastructure. However, a wireless network cannot guarantee data security, since radio signals can be intercepted or manipulated by malicious devices [1], which may be active or passive attackers, e.g., eavesdroppers, as shown in Fig. 1. In many distributed systems, the environment is also changing continuously (e.g., additional agents, updated attacker models, increasing noise, and interference). Therefore, CPSs also have to adapt to changing environmental conditions to provide continuous security as well as the requirements of the target application. In the near future, it is expected that CPSs will have cognitive abilities to sense dynamically changing environments and make decisions based on the updated conditions [2]. This critical evolution of CPSs will be plausible if the agents of a CCPS can adapt their operational preferences, e.g., transmission technology, transmit power, and security policy, to simultaneously provide the desired system performance and security [3]. Numerous security mechanisms have been proposed in the physical layer security (PLS) literature, where the body of work has unraveled different security opportunities at the physical layer considering various setups and attack scenarios [4].
The modus operandi in this body of work is maximizing the security performance of a single PLS policy by optimizing the power-frequency-antenna resources under a time-invariant channel and attack model. Considering a CCPS framework composed of many agents equipped with various resources, utilizing a single security mechanism with a constant purpose would create a bottleneck for the security and reliability of CCPS networks. Instead of focusing on a single specific security scenario, as in our previous work [2], we extend our perspective and try to answer two big-picture questions for the secure CCPS framework: "What is the best available security policy and configuration for the CCPS agents under the sensed channel conditions?" and "Besides security, how can we measure the physical layer performance of a CCPS network?" As an answer to the first question, we propose a decision mechanism in which the best-performing PLS policy, with the best-performing transmission configuration, is selected from the available set of security policies, as shown in Fig. 2. More specifically, we define a control center that mainly collects channel data and decides the best strategy for the agents. Units designed specifically for sensing, adaptation, and configuration are located in the control center, where they collect information about ambient risks and environmental changes from the sensors and estimate the risks and communication resources. They also configure the physical layer parameters of the entire system based on agent preferences. Sensors and actuators are then informed by the control center to update their transmission parameters. Since the various PLS policies are already available in the literature, the selective nature of the algorithm provides adaptability for the CCPS network. As an answer to the second question, we define a user-centric utility as the main performance metric.
The utility is the weighted average of the security, quality of service (QoS), and cost dimensions of the CCPS performance. These dimensions comprise various system performance metrics, classified into three main dimensions, as shown in Fig. 2. The weights of each dimension are determined by the requirements imposed by users. Since the main applications of CPSs have distinct operational requirements, e.g., drone swarms, ultra-reliable low-latency communication (URLLC), massive machine-type communication (mMTC), smart grid, vehicle-to-everything (V2X) communication, and health networks [2], [5], we can tune the weights to perform best under the
Fig. 2: Proposed secure transmission policy selection and designed utility metric with its sub-dimensions for each agent.

sensed channel conditions and available resources. However, separately selecting the best-performing policy for each agent in the network may not yield the best-performing network results, due to the interference caused by other agents and the required operational costs. Instead of using individual utilities for each agent in the network, in this work, we propose mixing the utilities of the agents in appropriate proportions to obtain a single network utility value that represents the total performance of the CCPS network. To obtain the network utility, we consider two different operational structures of CCPS networks. In the first structure, the agents calculate their utilities and select the best-performing PLS policy individually and locally. In the second structure, agents are connected to a central control unit, where the network utility is calculated by averaging their individual utilities. Our contributions in this work can be listed as follows.
• We propose an adaptive PLS policy selection and transmission configuration mechanism, called secure transmission policy selection (STPS), for a multi-agent physical layer security framework using a Markov decision process (MDP). The proposed mechanism is inherently adaptable, in that it may maximize the individual rewards or a joint reward, respectively considering selfish and cooperating agents.
• The reward of an individual agent is calculated by a utility function, which is a weighted sum of the QoS, security, and cost dimensions. By multiplying the utility function with time-dependent characteristics, such as a degrading battery, we propose a more applicable reward function for CCPS frameworks.
• We simulate the proposed mechanism in an RL-based setup considering different beyond-5G applications and environment characteristics.
Simulation results of the proposed mechanism are compared with security-only and QoS-only policy selection mechanisms to demonstrate the trade-offs between the QoS, security, and cost dimensions.
Overall, we provide an adaptive CCPS framework that is able to make the best STPS for each agent to maximize its utility, defined as the weighted sum of the QoS, security, and cost dimensions, based on a time-dependent changing environment and application-based requirements.
A. Related Works
The PLS policies and their performance criteria are surveyed in [4]. One way to secure physical layer transmission is resource allocation at the transmitter node [6]. Other PLS policies are obtained by different signal processing methods, such as beamforming and precoding [7], [8]. Since the message is precoded in line with the receiver's channel, the performance of any other node at a different position would decrease [9]. Antenna-selection or node-selection based PLS policies are another type of physical layer defense mechanism [10]. The joint utilization of different PLS policies has previously been considered in [10] and [11]. In [10], the authors propose antenna selection with beamforming in MIMO networks. However, these works consider utilizing the same PLS policies in changing conditions. In contrast, we consider that a set of PLS policies is available to the CCPS framework. Similar to our previous work [2], this methodology can be categorized under a novel PLS policy selection umbrella in the PLS literature. Another difference from the aforementioned works is that here, we consider multi-agent security, where the aim is to maximize the total utility. Similar to our study, in [12], the authors propose a game-theoretic physical layer security mechanism, where a model-based predictive control system is utilized to connect the cyber and physical layers of autonomous systems. In [13], the authors consider non-orthogonal multiple access (NOMA) to establish a secure wiretap model and optimize the user power allocation coefficients to maximize the secrecy. In [14], the security of CCPS agents is established by optimal power allocation and power splitting at the secondary transmitter under secrecy constraints. In [15], the medium access probability and transmission power of secondary transmitters are jointly optimized to maximize the security of all network nodes.
In [16], the authors propose a beamforming design for a two-way cognitive radio (CR) Internet of Things (IoT) network aided by simultaneous wireless information and power transfer (SWIPT). While these works consider the performance of a single method under stable channel characteristics, our work provides adaptable joint power and PLS policy selection in changing environmental conditions, addressing the corresponding gap in the literature.
B. Organization
In the following section, we detail the calculation of the utility dimensions from the physical layer parameters. In Section III, we provide the MDP-based STPS mechanism and the proposed utility function. In Section IV, we present the simulation parameters and results of the proposed mechanism. In Section V, the paper is concluded.

II. CALCULATING THE DIMENSIONS OF UTILITY
In our system model, we assume that the CCPS network consists of $I$ agents, where $i$ denotes the index of an agent. Due to different requirements and environmental conditions, each agent may have different sets of available PLS policies and transmission configurations, denoted as $\mathcal{P}_i$ and $\mathcal{R}_i$ for the $i$th agent, respectively. These sets can be stated as $\mathcal{P}_i = \{P_i^1, P_i^2, \ldots, P_i^{N_i}\}$ and $\mathcal{R}_i = \{R_i^1, R_i^2, \ldots, R_i^{M_i}\}$, where $N_i$ is the number of available PLS policies and $M_i$ is the number of available transmission configurations for the $i$th agent. When we consider the transmission time slots of a frame, $t$ is the time slot index, where $t = 1, 2, \ldots, T$. Due to the impact of increasing time slots, we have to distinguish which STPS is decided at time slot $t$ from the sets $\mathcal{P}_i$ and $\mathcal{R}_i$ for each agent, where $i = 1, 2, \ldots, I$. Under these circumstances, $\mathcal{K}$ and $\mathcal{L}$ are the sets of PLS policies and transmission configurations chosen by the agents at $t$, where $\mathcal{K} = \{P_1^{k_1}[t], P_2^{k_2}[t], \ldots, P_I^{k_I}[t]\}$ and $\mathcal{L} = \{R_1^{l_1}[t], R_2^{l_2}[t], \ldots, R_I^{l_I}[t]\}$. In these statements, $k_i$ and $l_i$ are the indices of the PLS policies and transmission configurations chosen by each agent, and these terms should satisfy the following conditions:
$$k_i \leq N_i, \quad l_i \leq M_i, \quad \forall i \in \{1, 2, \ldots, I\}. \tag{1}$$
In Fig. 3, the block diagram of the proposed system model is given. In this paper, we assume that the agents are independent and that each agent consists of a transmitter and a receiver unit. We denote the transmitter and the receiver of the $i$th agent as $a_i$ and $b_i$, respectively. The number of antennas of the transmitter node of the $i$th agent is denoted by $W_i$. We assume that each receiver node is equipped with a single antenna. The eavesdropper is also assumed to be equipped with a single antenna.
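As a concrete illustration of this bookkeeping, the sketch below maintains the sets $\mathcal{P}_i$ and $\mathcal{R}_i$ and the chosen indices $k_i$ and $l_i$ under condition (1). It is written in Python rather than the MATLAB used for the paper's simulations, and the class name and example values are illustrative, not part of the paper.

```python
# Illustrative bookkeeping for per-agent PLS policy and transmission
# configuration sets; the Agent class and example values are assumptions.
from dataclasses import dataclass

@dataclass
class Agent:
    policies: list   # P_i: available PLS policies, |P_i| = N_i
    configs: list    # R_i: available transmission configurations, |R_i| = M_i
    k: int = 0       # index of the PLS policy chosen for the current slot
    l: int = 0       # index of the chosen transmission configuration

    def select(self, k, l):
        # Enforce condition (1): k_i <= N_i and l_i <= M_i (0-based here).
        assert 0 <= k < len(self.policies) and 0 <= l < len(self.configs)
        self.k, self.l = k, l

agents = [Agent(policies=["SC-AN", "FD-AI", "AN", "B"],
                configs=[(5, 5), (5, 10), (10, 5), (10, 10)])
          for _ in range(2)]
agents[0].select(k=2, l=1)
# K and L for this slot collect the per-agent chosen policies/configurations:
K = [a.policies[a.k] for a in agents]
L = [a.configs[a.l] for a in agents]
```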
The channel fading coefficient vector between nodes $c_i$ and $d_j$ is expressed by $\mathbf{h}_{c_i d_j} = [h_{c_i d_j}(1), h_{c_i d_j}(2), \cdots, h_{c_i d_j}(W_i)]^T$, where $h_{c_i d_j}(\omega) \sim \mathcal{CN}(0, D_{c_i d_j})$, $c \in \{a, b\}$, $d \in \{a, b\}$, $i, j \in \{1, 2, \ldots, I\}$, and $i \neq j$. $\mathbf{p}^T$ denotes the transpose of the vector $\mathbf{p}$. We consider three different PLS methodologies: subcarrier-based artificial noise (SC-AN), full-duplex artificial interference (FD-AI), and artificial noise (AN) with supported beamforming. We also consider the case of beamforming only (B); hence, the agents may utilize four different PLS policies. The signal models for each of the PLS policies are detailed in the link below. For fairness, we consider that the message transmission power, $R_s$, is equal for all agents.

The operations to obtain the dimensions and the utility are detailed in https://github.com/ozanalpt/PLS-toolbox-for-CCPS/The_Guidebook_of_the_PLS_toolbox_for_CCPS.pdf
Fig. 3: The block diagram of the utility calculation with the proposed MDP-based STPS.

The signal-to-interference-plus-noise ratio (SINR) at node $b_i$ in the $t$th time slot is expressed by
$$\mathrm{SINR}_{b_i}[t] = \frac{|\mathbf{v}_i^H \mathbf{h}_{a_i b_i}|^2 R_s}{\gamma_i^{k_i} (\Upsilon_i[t] + N_i)}, \tag{2}$$
where $\Upsilon_i$ denotes the interference from other agents, $N_i$ denotes the noise variance at the receiver node of the $i$th agent, and $\gamma_i^{k_i}$ denotes the interference cancellation error coefficient of the selected PLS policy. The transmit beamforming vector for the $i$th agent is denoted by $\mathbf{v}_i = \mathbf{h}_{a_i b_i}$. The interference from other agents can be divided into parts as
$$\Upsilon_i[t] = \sum_{j \neq i} \left(\Upsilon_{ji}^s[t] + \Upsilon_{ji}^c[t]\right), \tag{3}$$
where $\Upsilon_{ji}^s$ denotes the interference resulting from the message signals transmitted by other agents, and $\Upsilon_{ji}^c$ denotes the interference resulting from the security signals of other agents. The interference from the message signal transmission can be expressed by
$$\Upsilon_{ji}^s[t] = |\mathbf{v}_j^H \mathbf{h}_{a_j b_i}|^2 R_s. \tag{4}$$
The security interference results from the security signal transmitted by other agents, where $\Upsilon_{ji}^c[t]$ denotes the interference from the security signal of the $j$th agent. This term differs based on the selected PLS policy. Considering the SC-AN policy, $\Upsilon_{ji}^c[t]$ becomes
$$\Upsilon_{ji}^{c,\mathrm{SC\text{-}AN}}[t] = |\mathbf{h}_{a_j b_i}|^2 R_j^{l_j}[t], \tag{5}$$
where $R_j^{l_j}[t]$ denotes the selected transmission configuration of the $j$th agent. Note that $\gamma_i^{k_i} = 2$ for the SC-AN policy; for the other policies, $\gamma_i^{k_i} = 1$. Considering the FD-AI policy, $\Upsilon_{ji}^c[t]$ becomes
$$\Upsilon_{ji}^{c,\mathrm{FD\text{-}AI}}[t] = |\mathbf{h}_{b_j b_i}|^2 R_j^{l_j}[t]. \tag{6}$$
Considering the AN policy, $\Upsilon_{ji}^c[t]$ becomes
$$\Upsilon_{ji}^{c,\mathrm{AN}}[t] = |\boldsymbol{\alpha}_{a_j b_j} \cdot \mathbf{h}_{a_j b_i}|^2 R_j^{l_j}[t], \tag{7}$$
where $\boldsymbol{\alpha}_{a_j b_j} = \mathcal{NS}(\mathbf{h}_{a_j b_j}) / \|\mathbf{h}_{a_j b_j}\|$ and $\mathcal{NS}(\boldsymbol{\zeta})$ denotes the null space of the vector $\boldsymbol{\zeta}$. Considering the beamforming-only policy,
$$\Upsilon_{ji}^{c,\mathrm{B}}[t] = 0. \tag{8}$$
Similarly, the SINR for the link from the transmitter of the $i$th agent to the eavesdropper can be expressed by
$$\mathrm{SINR}_{e_i}[t] = \frac{|\mathbf{v}_i^H \mathbf{h}_{a_i e}|^2 R_s}{\Upsilon_{ie}[t] + N_e}. \tag{9}$$
In this case, $\Upsilon_{ie}[t]$ becomes
$$\Upsilon_{ie}[t] = \sum_{j \neq i} \Upsilon_{je}^s[t] + \sum_{i=1}^{I} \Upsilon_{ie}^c[t], \tag{10}$$
where
$$\Upsilon_{je}^s[t] = |\mathbf{v}_j^H \mathbf{h}_{a_j e}|^2 R_s. \tag{11}$$
In addition to the interference resulting from the security signals of other agents, the security signal from the considered agent also decreases the performance of the eavesdropper. In the following, we detail how the physical layer parameters are converted into the dimensions of the individual utility function.

A. Security
As a security metric, we utilize the secrecy pressure, as proposed in [17]. The main difference from other secrecy metrics is that the location of the eavesdropper is assumed to be unknown. In this way, the secrecy pressure reflects the security level of the considered medium, instead of focusing on exact security levels. As a first step, the ergodic secrecy capacity of each point $(x, y)$ of the surface $S$ is obtained. This step provides a secrecy map of the surface $S$ for the corresponding security policy. After that, an expectation operation is applied over the surface $S$. The location of the eavesdropper is modeled as a 2D Gaussian random variable, as the surface might contain different physical measures in different areas. For example, considering a factory environment, the mean of the Gaussian distribution may be assumed to be the blind spot of the security cameras. The position of the eavesdropper is unknown; therefore, it is denoted by $(x, y)$. The secrecy capacity for the $i$th agent can be represented by
$$C_{\mathrm{sec}}^i(x, y) = \max\left\{0, C_B^i - C_E^i(x, y)\right\}, \tag{12}$$
where $C_B^i$ and $C_E^i(x, y)$ are respectively the channel capacities of the link between the $i$th agent's transmitter and receiver, and of the link between the $i$th agent's transmitter and the eavesdropper. The capacity of the $i$th agent's channel can be given as
$$C_B^i = \frac{1}{2} \log_2\left(1 + \mathrm{SINR}_{b_i}\right), \tag{13}$$
and the eavesdropper's channel capacity is
$$C_E^i(x, y) = \frac{1}{2} \log_2\left(1 + \mathrm{SINR}_{e_i}(x, y)\right). \tag{14}$$
Note that the capacity at a generic point $(x, y)$ is given, since the location of the eavesdropper is unknown. Since the secrecy capacity depends on the mutually independent and identically distributed (i.i.d.) channel fading coefficients, the ergodic secrecy capacity can be given as
$$\tilde{C}_{\mathrm{sec}}^i(x, y) = \mathbb{E}\left[C_{\mathrm{sec}}^i(x, y)\right], \tag{15}$$
where $\mathbb{E}\{\cdot\}$ denotes the expectation operator. Note that the ergodic secrecy capacity is also defined for a generic $(x, y)$ point.
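The steps in (12)-(15) can be sketched with a small Monte Carlo estimate. This is only a sketch: it assumes Rayleigh fading, so the instantaneous SINRs are drawn as exponential random variables with illustrative mean values rather than computed from the paper's interference model, and the function name is an assumption.

```python
# Monte Carlo sketch of (12)-(15): ergodic secrecy capacity at a generic
# point (x, y). Rayleigh fading and the mean SINR values are illustrative
# assumptions, not the paper's exact channel model.
import numpy as np

rng = np.random.default_rng(0)

def ergodic_secrecy_capacity(snr_b_mean, snr_e_mean, n_trials=20000):
    # i.i.d. exponential instantaneous SINRs (Rayleigh-fading power gains).
    g_b = rng.exponential(snr_b_mean, n_trials)   # legitimate-link samples
    g_e = rng.exponential(snr_e_mean, n_trials)   # eavesdropper-link samples
    c_b = 0.5 * np.log2(1.0 + g_b)                # (13)
    c_e = 0.5 * np.log2(1.0 + g_e)                # (14)
    c_sec = np.maximum(0.0, c_b - c_e)            # (12)
    return c_sec.mean()                           # (15): E[C_sec]

# The weaker the eavesdropper's link at (x, y), the larger the ergodic
# secrecy capacity at that point of the secrecy map:
assert ergodic_secrecy_capacity(10.0, 1.0) > ergodic_secrecy_capacity(10.0, 5.0) > 0.0
```

Repeating this estimate over a grid of $(x, y)$ points yields the secrecy map that (16) then integrates against the eavesdropper-location pdf.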
Calculating the ergodic secrecy capacity for each point on the surface $S$ yields the secrecy map of the surface. Considering this perspective, we can weigh the ergodic secrecy capacity at each generic point $(x, y)$ by the probability of the eavesdropper's presence at that point. Then, the ergodic secrecy pressure of the surface $S$ at time slot $t$ can be given by
$$\Omega_i[t] = \iint_S \gamma(x, y)\, \tilde{C}_{\mathrm{sec}}^i(x, y)\, dx\, dy, \tag{16}$$
where $\gamma(x, y)$ is the probability density function (pdf) of the presence of the eavesdropper at point $(x, y)$ on the surface. In this case, the security dimension of the $i$th agent can be obtained as
$$s_i[t] = \frac{\Omega_i[t]}{\max_{j \in \{1, 2, \ldots, I\}} \Omega_j[t]}. \tag{17}$$

B. Quality of Service
QoS indicates the general performance of the communication link. In the considered setup, we take the SINR levels at the receiver nodes of the agents as their QoS metric. As described in the security calculation, the SINR levels with respect to each policy and transmission configuration are calculated. Then, they are normalized by the maximum SINR level under the determined STPS. Hence, the QoS of the $i$th agent can be described by
$$q_i[t] = \frac{\mathrm{SINR}_{b_i}[t]}{\max_{j \in \{1, 2, \ldots, I\}} \mathrm{SINR}_{b_j}[t]}. \tag{18}$$

C. Cost
Cost is a measure of the limited resources in the CCPS framework. The number of active antennas, the selected transmission power level, and the selected coding scheme are the main elements determining the battery life and memory usage of the CCPS agents. In this work, we consider the cost of the $i$th agent as a weighted sum of the utilized number of antennas and the selected transmit power configuration:
$$c_i[t] = \frac{W_i[t] + R_i^{l_i}[t]}{\max_{j \in \{1, 2, \ldots, I\}} \left[W_j[t] + R_j^{l_j}[t]\right]}. \tag{19}$$
As described in the previous section, the weighted sum of the security, QoS, and cost dimensions determines the utility of the agent. The processes explained in this part are conceptualized in [2]. Even though this methodology effectively illustrates the general performance of a single agent, it becomes inefficient for multi-agent systems, where limited resources, such as frequency bandwidth and battery life, are shared by all agents. Therefore, in the following section, we first obtain a novel performance metric, named the network utility, which models the performance of the CCPS framework considering the joint utilization of the shared resources. Then, we detail the proposed STPS approach for the considered multi-agent framework.

III. ADAPTIVE MULTI-LAYER SECURITY FRAMEWORK FOR MULTI-AGENT SYSTEMS
A. Network Utility Calculation
During the deployment of an adaptive CCPS framework for multi-agent systems, a proper utility calculation should be performed before each transmission, as shown in Fig. 3. Defining the utility is not a straightforward problem; however, it can be calculated as a weighted sum of the security, QoS, and cost terms. The calculation of these terms is highly related to the interactions of the agents inside the network, the PLS policies, and a proper performance metric. These calculations are explained in detail in Section II. In short, the normalized values of security, QoS, and cost are denoted as $s_i[t]$, $q_i[t]$, and $c_i[t]$. The importance of these dimensions fundamentally depends on the chosen CCPS application. However, the roles of the agents may also be distinct from one another; therefore, the weights of the dimensions for each agent may vary significantly. Another factor is the impact of the active communication time on the weights of each dimension. Due to the rise in power consumption and the reduction of the QoS levels of the agents, we try to avoid retransmissions as the number of successfully transmitted packets in a frame increases. To deploy this scheme, we slightly adjust the weights by increasing the weights of the QoS and cost dimensions while decreasing the weight of the security dimension after each successful communication time slot. Under these circumstances, we can state the vector of weights, including the impact of the increasing number of time slots, as
$$\mathbf{w}_i[t] = \begin{bmatrix} \gamma_s[t] & \gamma_q[t] & \gamma_c[t] \end{bmatrix}^T \circ \begin{bmatrix} w_{i,s}[t] & w_{i,q}[t] & w_{i,c}[t] \end{bmatrix}^T, \tag{20}$$
Fig. 4: MDP-based PLS policy selection and transmission configuration schemes: (a) individual decision, (b) joint decision.

Here, $w_{i,s}[t]$, $w_{i,q}[t]$, and $w_{i,c}[t]$ are the security-, QoS-, and cost-related weights at $t$ for the $i$th agent, and $\circ$ is the Hadamard product operator. These weights should satisfy $w_{i,s}[t] + w_{i,q}[t] + w_{i,c}[t] = 1$ for $\forall i \leq I$ and $\forall t \leq T$. In (20), $\gamma_s[t]$, $\gamma_q[t]$, and $\gamma_c[t]$ are the impacts of the increasing number of time slots during the transmission of a frame. We also assume that there are negative effects on system performance when a different PLS policy is chosen in consecutive time slots. If different policies are selected sequentially, the agents may have to turn some of their hardware on or off. This procedure may result in additional delays and cost; therefore, we also consider these impacts by defining transition values of the PLS policies from policy $k_i$ to $k_i'$, which denotes the next possible value of $k_i$. The transition value is chosen as $\delta_i[t]_{k_i \to k_i'} = 1$ where $k_i = k_i'$; on the other hand, the condition $\delta_i[t]_{k_i \to k_i'} < 1$ is satisfied where $k_i \neq k_i'$. A similar formulation could be generated for the transmission configuration adjustments of the agents, but we omit these impacts to simplify the model. When we combine the dimensions of the utility, the weights of each dimension, the impacts of increasing time slots, and the transitions to different PLS policies, we can express the utility as
$$U_i^{k_i, l_i}[t] = \delta_i[t]_{k_i \to k_i'} \cdot \left( \mathbf{w}_i[t]^T \times \begin{bmatrix} s_i[t] \\ q_i[t] \\ (1 - c_i[t]) \end{bmatrix} \right). \tag{21}$$
In this utility expression, the cost-related term enters as $(1 - c_i[t])$, since an increased cost must lead to a smaller utility. The expression (21) stands at the center of the CCPS framework and is used throughout the rest of the paper.
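Equations (20)-(21) amount to a few lines of arithmetic. The sketch below uses illustrative weights, dimension values, and a transition penalty, none of which are the paper's simulation values.

```python
# Sketch of (20)-(21): per-agent utility as a weighted mix of security,
# QoS, and cost, with a transition value delta penalizing a policy switch.
# All numeric values are illustrative assumptions.
import numpy as np

def utility(s, q, c, base_w, slot_impact, delta):
    # (20): Hadamard product of the time-slot impacts and the base weights.
    w = np.asarray(slot_impact) * np.asarray(base_w)
    # (21): the cost dimension enters as (1 - c), so higher cost lowers utility.
    return delta * float(w @ np.array([s, q, 1.0 - c]))

base_w = [0.5, 0.3, 0.2]        # w_s + w_q + w_c = 1
slot_impact = [0.9, 1.1, 1.1]   # security de-emphasized as the frame progresses
u_keep   = utility(s=0.8, q=0.6, c=0.4, base_w=base_w, slot_impact=slot_impact, delta=1.0)
u_switch = utility(s=0.8, q=0.6, c=0.4, base_w=base_w, slot_impact=slot_impact, delta=0.8)
assert u_switch < u_keep        # switching policies (k_i != k_i') is penalized
```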
It should also be noted that the utility values lie between 0 and 1, due to the normalization of the weights and the utility dimensions.

B. MDP-based Secure Transmission Policy Selection
In order to deploy a flexible decision mechanism, an MDP-based Q-learning algorithm is developed, as shown in Fig. 3. This algorithm runs inside the control center of each agent at each time slot $t$. In the designed Q-learning algorithm, the states are the available PLS policies and transmission configurations, as shown inside the STPS block in Fig. 3. In the same figure, the current state indicates the last active PLS policy from time slot $t$. The actions between states correspond to changing or keeping the current configuration for each transmission time slot, based on the reward values calculated with the aid of the utility values. It should be noted that this model is valid for the individual selections of the agents, and it should be updated by combining agent-based procedures for the joint policy and transmission configuration selection of the agents. As discussed earlier, the number of available PLS policies is $N_i$ and the number of available transmission configurations is $M_i$ for the $i$th agent. Therefore, the number of states is $N_i + M_i + 1$ after adding the current operational state during individual operation. This results in $N_i$ available actions for the first stage of the algorithm and $M_i$ available actions for the second stage for individual decisions. In the case of joint decisions by the agents, the number of available actions in the first stage is $N_1 \times N_2 \times \cdots \times N_I$, while the number of available actions in the second stage is $M_1 \times M_2 \times \cdots \times M_I$. With this procedure, an agent decides an STPS based on the utility values calculated by (21) for each possible policy and transmission configuration. It also indicates the initial state of the operation at the beginning. In this procedure, there is a reward for each transition, which is calculated with the help of the utility values.
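The state and action counts above can be tallied in a few lines; the sketch uses the four-policy, four-configuration sets of the later simulations as example values.

```python
# Counting states and stage-wise actions for the two-stage STPS,
# per the formulas above; N and M values are the example 4/4 sets.
import math

N = [4, 4]   # available PLS policies per agent (N_1, N_2)
M = [4, 4]   # available transmission configurations per agent (M_1, M_2)

# Individual decisions (agent i): N_i + M_i + 1 states
# (policies + configurations + the current operational state).
states_individual = [n + m + 1 for n, m in zip(N, M)]

# Joint decisions: stage-wise action counts are products over the agents.
joint_policy_actions = math.prod(N)   # N_1 x N_2
joint_config_actions = math.prod(M)   # M_1 x M_2
```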
At the policy selection stage, the rewards are the specific values of the calculated utilities with a determined transmission configuration $l_i$ at time slot $t$, i.e., $U_i^{1, l_i}[t], U_i^{2, l_i}[t], \ldots, U_i^{N_i, l_i}[t]$. The rewards in the second selection stage are calculated using the utility differences between the possible transmission configurations and the transmission configuration chosen at the policy stage, $l_i$. The rewards from the state of the first PLS policy to the possible transmission configurations of this policy can be written as $U_i^{1,1}[t] - U_i^{1,l_i}[t],\ U_i^{1,2}[t] - U_i^{1,l_i}[t],\ \ldots,\ U_i^{1,M_i}[t] - U_i^{1,l_i}[t]$. These values can be positive or negative, depending on the utility values and the calculation of their sub-dimensions, as given in Section II. If we update the MDP for a joint selection of PLS policies,

TABLE I: RL-based simulation parameters and the decision methodologies for actualizing a STPS
Decision methodology — time slot of the agents: for individual decisions, each agent decides based on time slot $t - 1$; for joint decisions, the agents decide simultaneously.
Simulation parameters (reported separately for individual and joint decisions): the number of agents $I$; the learning rate; the exploration parameter $\epsilon$; the number of training episodes; the number of time slots $T$; and the transition value $\delta_i^{k_i \to k_i'}$, which is smaller than 1 for $k_i \neq k_i'$ and equal to 1 for $k_i = k_i'$.

the available number of policies and transmission configurations drastically increases. Cooperating agents do not have to operate with the same STPS. In some cases, they should not run with the same STPS, since some of the policies are very disadvantageous and costly if multiple agents operate with them. As a result, we have to extend the MDP procedure for joint operation. In this case, the number of available policies at the policy stage is $N_1 \times N_2 \times \cdots \times N_I$. In this setup, each state represents the PLS policies of the agents; e.g., a state indicates two possible PLS policies if we have a joint decision model for two agents. Similarly, the number of available configurations at the transmission configuration selection stage is $M_1 \times M_2 \times \cdots \times M_I$ in the case where $I$ agents jointly make their STPS. In the following section, we utilize RL-based simulations to implement the MDP architecture with the network utility.

IV. REINFORCEMENT LEARNING SIMULATIONS
In order to analyze the performance of the CCPS security framework for multi-agent systems, an RL-based operation is studied and simulated in the MATLAB environment. RL can easily be applied to cyclic system models, including CCPS, thanks to their similar designs.

In the given simulations, we study I = 2 agents, which are able to operate individually or jointly. In the scope of this paper, the considered PLS policies are P_1 = P_2 = {SC-AN, FD-AI, AN, B} for N_1 = N_2 = 4. The determined tuples of transmission configurations for each agent are R_1 = R_2 = {(5,5), (5,10), (10,5), (10,10)} dB for M_1 = M_2 = 4, where (·,·) indicates the transmission power and artificial noise level of an agent, respectively. A rectangular surface is assumed as the CCPS environment. The transmitter and receiver nodes of the first agent are placed in the upper half of the surface, while those of the second agent are placed symmetrically in the lower half. The eavesdropper is located at the center, (x_e, y_e) = (0, 0). The pdf of the eavesdropper's location is a 2D Gaussian distribution with a fixed variance. The noise variances at the eavesdropper and the agents are assumed to be equal.
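For the joint decision methodology, the sizes of the joint policy and configuration spaces follow directly from the per-agent sets listed above; a small sketch:

```python
from itertools import product

policies = ["SC-AN", "FD-AI", "AN", "B"]        # P_1 = P_2, so N_1 = N_2 = 4
configs = [(5, 5), (5, 10), (10, 5), (10, 10)]  # (P_S, P_N) in dB, M_1 = M_2 = 4

# Joint states hold one PLS policy / configuration per agent (I = 2).
joint_policies = list(product(policies, policies))  # N_1 x N_2 joint policies
joint_configs = list(product(configs, configs))     # M_1 x M_2 joint configurations

print(len(joint_policies), len(joint_configs))      # prints: 16 16
```

This is why the joint Q-tables grow multiplicatively with the number of agents, as noted in the MDP extension above.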
[Fig. 5 appears here. (a) STPS of the agents: the selected PLS policies (SC-AN, FD-AI, B, AN) and transmission configurations (P_S, P_N) in {(5,5), (5,10), (10,5), (10,10)} dB of Agent 1 and Agent 2 over 50 time slots, under the individual and joint decision methodologies. (b) Agent-based utility values and average weighted sum values over the same time slots.]
Fig. 5: Simulation results observed with individual and joint decision methodologies for chosen CCPS applications.

In Table I, we present the RL-based training and simulation parameters. Future values are taken into account by choosing a discount rate of one while maximizing the utility during the training of the RL environment. T, the number of time slots over which the changing weights of the utility dimensions affect the behavior of the agents, is chosen as 50. There are two possible epsilon values for the designed STPS methodologies; these values are chosen to maximize the utilities during the operations.

In this paper, we consider four main applications of CCPS (drone swarms, URLLC, mMTC, and health networks), whose weights are given in Table II and are also presented in [2]. As detailed in the previous section, the utility calculation consists of several parameters, including the vector of weights for each dimension w_i[t], the transition vector d_i^{k_i}, and the impacts of an increasing number of time slots on each dimension, denoted as gamma_s[t], gamma_q[t], and gamma_c[t], respectively. These weights are arbitrarily chosen and can be altered based on usage differences. As the second parameter, each transition from one PLS policy to another creates a utility loss with a fixed ratio; therefore, delta_1^{k_i -> k'_i} = delta_2^{k_i -> k'_i} is nonzero when k_i != k'_i, as given in Table I. This value can also be updated for various operational environments, which may impose additional penalties, e.g., delays and distortion, for changing PLS policies.

In the simulations, the considered decision methodologies are based on the individual and joint operations of the agents, with different Q-learning tables and MDP-based schemes, as shown in Fig. 4. When the agents individually determine their operational PLS policy and transmission configuration, they make this decision based on the other agent's STPS from the previous time slot, as shown in Table I. On the other hand, the agents decide their configurations simultaneously if they choose the joint decision methodology.

TABLE II: Utility weights for chosen CCPS applications for making an STPS. Each application is defined by the weights w_{i,s}[1], w_{i,q}[1], and w_{i,c}[1] of the security, QoS, and cost dimensions for each agent.

According to the first decision methodology, each agent makes the decisions that maximize its own utility without considering the performance of the other agents. The individual decision methodology is given in Fig. 4a for both of the agents, where i = 1, 2. In this figure, the rewards of each path are indicated in terms of utility values based on the available PLS policies and transmission configurations. At the policy selection stage, (5,5) dB is chosen as the reference transmission configuration, where l_i = 1. Therefore, the rewards of the transmission configuration selection stage are calculated as the difference between the utility of the (5,5) dB transmission configuration and the utility of each available transmission configuration for the selected PLS policy. In this methodology, the rewards, which are calculated with the utilities expressed in (21), are logged in Q-tables that store the action-based rewards, and these tables are utilized in the RL-based simulations. An exemplary Q-table is given in Table III for the weights of scenario C at the end of the assigned frame time, t = 50. These values closely match the rewards calculated from the MDP-based schemes given in Fig. 4.

TABLE III: Q-table logged for the weights of the scenario C for the second agent when t = 50. The actions are indicated based on the transitions in Fig. 4(a), where the condition N_i = M_i = 4 results in four available actions (Left, Mid-Left, Mid-Right, Right).

  Policy selection stage (states to: PLS policies)
               SC-AN     FD-AI     B         AN
  Last State   0.3783    0.4145    0.5373    0.4336

  Transmission configuration selection stage (states to: (P_S, P_N) in dB)
               (5,5)     (5,10)    (10,5)    (10,10)
  SC-AN        0         -0.1648    0.0011   -0.2011
  FD-AI        0         -0.2121    0.0133   -0.1780
  B            0          0.0410    0.0608    0.0523
  AN           0         -0.1626    0.0586   -0.0997

In the second scenario, we assume that the agents are able to make joint decisions that maximize the overall system utility. Since the agents do not have to operate with the same STPS, the available numbers of PLS policies and transmission configurations are N_1 x N_2 and M_1 x M_2, respectively. Under these circumstances, the MDP-based joint decision methodology is presented in Fig. 4b with the available PLS policies and transmission configurations.

V. RESULTS AND DISCUSSION
A. Simulation Results
In this section, we present the results of simulations considering the different CCPS applications that determine the agent behaviors and capabilities detailed in the previous section. The results are illustrated in Fig. 5, Table IV, and Table V, where each presentation is given for the chosen weights in Table II, since multiple utility weight settings exist for the different applications.

In Fig. 5a, the behavior of the two agents is shown for an increasing number of time slots under the two decision methodologies, with the operating STPS given separately. In these subfigures, readers can follow the changes of each agent as they make dynamic decisions for the changing environments. These results show that, with the individual decision methodology, the agents tend to choose beamforming most of the time to satisfy the operational requirements. On the other hand, there is cooperation between the agents with the joint decision methodology, since at least one of the agents chooses (5,5) dB as the transmission configuration in most cases. The impacts of these selections on the performance, in terms of joint utility values, are shown in Fig. 5b for each agent, together with the average weighted sum of the agent-based utility values. The weighted-sum results show that the joint decision methodology provides higher utility values than the individual decision methodology most of the time. This observation is expected, since the agents operate more adequate PLS policies and transmission configurations with the joint decision methodology than with individual decisions. Moreover, the amount of fluctuation in the individual decision-based results is considerably higher than in the joint decision-based outcomes. This situation is observed due to the lack of cooperation between agents: without cooperation, each agent individually tries to obtain its maximum utility value, but the selection of one agent may not satisfy the utility maximization conditions of the other agent.
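The per-agent utility mixing and the weighted network sum discussed above can be sketched as follows. The weight values and dimension inputs are illustrative placeholders, not the paper's actual parameters, and the cost term is treated here as a normalized indicator in [0, 1]:

```python
def agent_utility(security, qos, cost, w_s, w_q, w_c):
    """Mix the security, QoS, and cost dimensions of one agent with
    application-specific weights (assumed to sum to one)."""
    return w_s * security + w_q * qos + w_c * cost

def network_utility(utilities, agent_weights):
    """Weighted sum of the individual agent utilities."""
    return sum(w * u for w, u in zip(agent_weights, utilities))

# Illustrative two-agent example with placeholder values.
u1 = agent_utility(0.6, 0.5, 0.4, w_s=0.3, w_q=0.5, w_c=0.2)
u2 = agent_utility(0.4, 0.7, 0.5, w_s=0.2, w_q=0.3, w_c=0.5)
u_net = network_utility([u1, u2], agent_weights=[0.5, 0.5])  # ~0.525
```

Tuning w_s, w_q, and w_c per application is exactly what distinguishes the scenarios of Table II from one another.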
This situation is especially valid for the health network and mMTC applications, as shown in the same figure. On the other hand, if there is cooperation, the average weighted sum values are more stable.

To observe the secure transmission performance in terms of the security, QoS, and cost dimensions individually, we focus on Table IV and Table V. In these tables, the average relative indicator results are provided for the individual and joint decision methodologies, for each application and PLS policy. The average relative indicator is defined as the average percentage-based relative difference, over T time slots, between the STPS and each PLS policy operating with the maximum-power transmission configuration (10,10) dB. In other words, these tables show the relative differences in the utility and its dimensions when the proposed methodologies are applied instead of operating with a single PLS policy at maximum transmit power levels. Since these tables present dimension-based relative indicators, the weights of each dimension given in Table II should also be considered. In Table IV, readers may observe that the average relative indicator values are consistent with the assigned weights for both of the agents, due to the independent and selfish decisions of the agents to satisfy their own requirements.

TABLE IV: Average relative indicator for the individual decision-based policy and transmission configuration selection mechanism, with epsilon as given in Table I. Each entry lists Utility / Sec. / QoS / Cost.

  AGENT 1
  Policy   Drone Swarms (a)       URLLC (b)              mMTC (c)                Health Networks (d)
  SC-AN
  FD-AI
  B        0% / 1% / 0% / 99%     3% / 9% / 1% / 122%    7% / 22% / 12% / 69%    13% / 35% / 42% / 46%
  AN

  AGENT 2
  Policy   Drone Swarms (e)       URLLC (f)              mMTC (g)                Health Networks (h)
  SC-AN
  FD-AI    78% / 16% / 80% / 49%  49% / 14% / 49% / 62%  34% / 0% / 49% / 37%    167% / 11% / 47% / 30%
  B        4% / 1% / 5% / 99%     19% / 3% / 21% / 124%  8% / 15% / 21% / 74%    9% / 25% / 22% / 59%
  AN       45% / 4% / 46% / 49%   22% / 5% / 20% / 62%   134% / 18% / 20% / 37%  119% / 27% / 18% / 30%
TABLE V: Average relative indicator for the joint decision-based policy and transmission configuration selection mechanism, with epsilon as given in Table I. Each entry lists Utility / Sec. / QoS / Cost.

  AGENT 1
  Policy   Drone Swarms (a)       URLLC (b)              mMTC (c)                Health Networks (d)
  SC-AN
  FD-AI
  B        6% / 12% / 6% / 82%    14% / 2% / 21% / 98%   7% / 41% / 55% / 32%    14% / 41% / 55% / 32%
  AN

  AGENT 2
  Policy   Drone Swarms (e)       URLLC (f)              mMTC (g)                Health Networks (h)
  SC-AN
  FD-AI
  B        9% / 27% / 30% / 60%   25% / 28% / 42% / 59%  8% / 42% / 57% / 32%    15% / 41% / 57% / 32%
  AN       92% / 29% / 7% / 30%   27% / 30% / 10% / 29%  213% / 43% / 35% / 16%  317% / 44% / 35% / 16%
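The average relative indicator reported in Tables IV and V can be sketched as follows; the exact normalization is an assumption based on its description as an average percentage-based relative difference against a fixed (10,10) dB baseline policy over T time slots:

```python
def average_relative_indicator(stps_trace, baseline_trace):
    """Average percentage-based relative difference between the values
    obtained by the STPS over T time slots and those of a single PLS
    policy operated at the fixed (10, 10) dB configuration.
    Assumed form; the paper gives the definition only in prose."""
    assert len(stps_trace) == len(baseline_trace)
    diffs = [(s - b) / b for s, b in zip(stps_trace, baseline_trace)]
    return 100.0 * sum(diffs) / len(diffs)

# Illustrative example: the STPS value is 10% above the baseline in
# every slot, so the indicator is about 10%.
indicator = average_relative_indicator([0.55] * 50, [0.50] * 50)
```

The same function can be applied per dimension (utility, security, QoS, or cost) to reproduce the structure of the table entries.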
On the other hand, with the joint decision methodology, the decisions are delivered to maximize the overall system utility. Therefore, the average relative indicator values of the utility in Table V are significantly higher than the corresponding outcomes in Table IV. These observations are expected due to the lack of dimension-based constraints while maximizing the utility value. This fact may significantly degrade system performance, and its importance is discussed in the next section.
B. Discussion and Open Issues
The lack of dimension-based constraints is the reason behind the inconsistency between the observed high utility values and the insufficient dimension-based gains. As a result, the maximized utility may not be sufficient to satisfy the specific requirements of the agents. As an example, consider the scenario in Table II in which the cost dimension carries the dominant weight for the first agent. The results show that the joint decision methodology provides a much higher utility than the individual decision methodology for this agent; however, the individual decision methodology satisfies the cost requirement far better than the joint decision methodology. Naturally, this agent should operate individually, while the second agent, which has no dominant weight for any dimension, tries to cooperate with the first agent due to its high utility. Here, a discussion arises on when the average weighted sum of the utility values and when the average relative indicator should be used for deciding on real system deployments. In general, we can state that the agents should decide jointly if their weights are close and there is no dominant weight in Table II, e.g., drone swarms, or if the QoS weight is dominant for both of the agents, e.g., URLLC and V2X applications. On the other hand, the agents should operate individually to maximize their dimension-based requirements, e.g., the agents in the mMTC and health network applications. Similar observations can also be made for other applications of CCPS.

One predictable open issue is the implementation of the proposed system in a real-time testbed. Since the provided algorithm can be implemented over software-defined radios, real-time problems such as hardware impairments and channel estimation errors need to be considered for a real-time application.

Another important open issue is that a CCPS may consist of several nodes with various priorities.
For example, in a health network, the security and availability of an implant sensor would be more important than those of the data access point. Therefore, there should be a hierarchical order among the agents. We believe that our formulation is sufficiently flexible to support a priority-based definition, which can be obtained by adding a multiplication vector to (21). After this update, we can also model the relationship between the agents based on their importance. For instance, some nodes may sacrifice their security to enable a higher QoS for a more important agent.

VI. CONCLUSION
In this work, we have proposed a secure transmission policy selection scheme at the physical layer, based on the Markov decision process, for multiple agents in a CCPS environment. The reward of the proposed scheme is termed the utility, which consists of security, QoS, and cost dimensions. The individual utilities of the agents are combined to form the joint utility, illustrating the overall performance of the CCPS framework. Reinforcement learning-based simulations are conducted considering two different agent behaviors: the individual policy and transmission configuration selection scheme, and the joint policy and transmission configuration selection scheme. By tuning the weights of the utility dimensions, we analyze the performance of the proposed schemes in different beyond-5G applications.

The superior utility performance of the proposed policy selection schemes indicates that a single physical layer security method cannot cover all requirements imposed by different applications. As future work, prioritization-based hierarchical orders of the agents can be studied. To increase the adaptivity of the framework, dimension-based constraints should be added to the physical layer security policy and transmission configuration selections.

REFERENCES

[1] R. Khan, P. Kumar, D. N. K. Jayakody, and M. Liyanage, “A survey on security and privacy of 5G technologies: Potential solutions, recent advancements, and future directions,” IEEE Commun. Surveys Tuts., vol. 22, no. 1, pp. 196–248, 2020.
[2] O. A. Topal, M. O. Demir, Z. Liang, A. E. Pusane, G. Dartmann, G. Ascheid, and G. K. Kurt, “A physical layer security framework for cognitive cyber physical systems,” IEEE Wireless Commun., vol. 27, no. 4, pp. 32–39, 2020.
[3] N. Wang, P. Wang, A. Alipour-Fanid, L. Jiao, and K. Zeng, “Physical-layer security of 5G wireless networks for IoT: Challenges and opportunities,” IEEE Internet Things J., vol. 6, no. 5, pp. 8169–8181, 2019.
[4] D. Wang, B. Bai, W. Zhao, and Z. Han, “A survey of optimization approaches for wireless physical layer security,” IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1878–1911, 2019.
[5] M. O. Demir, A. E. Pusane, G. Dartmann, G. Ascheid, and G. K. Kurt, “A garden of cyber physical systems: Requirements, challenges and implementation aspects,” IEEE Internet Things Mag., early access.
[6] D. W. K. Ng, E. S. Lo, and R. Schober, “Secure resource allocation and scheduling for OFDMA decode-and-forward relay networks,” IEEE Trans. Wireless Commun., vol. 10, no. 10, pp. 3528–3540, 2011.
[7] Y. P. Hong, P. Lan, and C. J. Kuo, “Enhancing physical-layer secrecy in multiantenna wireless systems: An overview of signal processing approaches,” IEEE Signal Process. Mag., vol. 30, no. 5, pp. 29–40, 2013.
[8] H. Wang and X. Xia, “Enhancing wireless secrecy via cooperation: Signal design and optimization,” IEEE Commun. Mag., vol. 53, no. 12, pp. 47–53, 2015.
[9] P. Huang, Y. Hao, T. Lv, J. Xing, J. Yang, and P. T. Mathiopoulos, “Secure beamforming design in relay-assisted Internet of Things,” IEEE Internet Things J., vol. 6, no. 4, pp. 6453–6464, 2019.
[10] N. Yang, P. Yeoh, M. Elkashlan, R. Schober, and I. B. Collings, “Transmit antenna selection for security enhancement in MIMO wiretap channels,” IEEE Trans. Commun., vol. 61, no. 1, pp. 144–154, 2013.
[11] R. Zi, J. Liu, L. Gu, and X. Ge, “Enabling security and high energy efficiency in the Internet of Things with massive MIMO hybrid precoding,” IEEE Internet Things J., vol. 6, no. 5, pp. 8615–8625, 2019.
[12] Z. Xu and Q. Zhu, “A cyber-physical game framework for secure and resilient multi-agent autonomous systems,” in IEEE Conf. on Decision and Control, 2015, pp. 5156–5161.
[13] T. Liu, S. Han, W. Meng, C. Li, and M. Peng, “Dynamic power allocation scheme with clustering based on physical layer security,” IET Commun., vol. 12, no. 20, pp. 2546–2551, 2018.
[14] F. Gabry, A. Zappone, R. Thobaben, E. A. Jorswieck, and M. Skoglund, “Energy efficiency analysis of cooperative jamming in cognitive radio networks with secrecy constraints,” IEEE Wireless Commun. Lett., vol. 4, no. 4, pp. 437–440, 2015.
[15] X. Xu, Y. Cai, W. Yang, W. Yang, J. Hu, and T. Yin, “Energy-efficient optimization for physical layer security in large-scale random CRNs,” in Int. Conf. on Wireless Commun. and Signal Process. (WCSP), 2015, pp. 1–6.
[16] Z. Deng, Q. Li, Q. Zhang, L. Yang, and J. Qin, “Beamforming design for physical layer security in a two-way cognitive radio IoT network with SWIPT,” IEEE Internet Things J., vol. 6, no. 6, pp. 10786–10798, 2019.
[17] L. Mucchi, L. Ronga, X. Zhou, K. Huang, Y. Chen, and R. Wang, “A new metric for measuring the security of an environment: The secrecy pressure,”