Data-Injection Attacks
Iñaki Esnaola, Samir M. Perlaza, and Ke Sun
1. Dept. of Automatic Control and Systems Eng., University of Sheffield, UK; 2. INRIA, Centre de Recherche de Sophia Antipolis-Méditerranée, France; 3. Department of Electrical Engineering, Princeton University, USA. Email: esnaola@sheffield.ac.uk, [email protected], ke.sun@sheffield.ac.uk
Chapter 9 of: Advanced Data Analytics for Power Systems, A. Tajer, S. M. Perlaza and H. V. Poor, Eds., Cambridge University Press, Cambridge, UK, 2021, pp. 197-229.
February 02, 2021

Chapter 9 Data-Injection Attacks
The pervasive deployment of sensing, monitoring, and data acquisition techniques in modern power systems enables the definition of functionalities and services that leverage accurate and real-time information about the system. This wealth of data supports network operators in the design of advanced control and management techniques that will inevitably change the operation of future power systems. An interesting side-effect of the data collection exercise that is starting to take place in power systems is that the unprecedented data analysis effort is shedding some light on the turbulent dynamics of power systems. While the underlying physical laws governing power systems are well understood, the large scale, distributed structure, and stochastic nature of the generation and consumption processes result in a complex system. The large volumes of data about the state of the system are opening the door to modelling aspirations that were not feasible prior to the arrival of the smart grid paradigm.

The refinement of the models describing the power system operation will undoubtedly provide valuable insight to the network operator. However, that knowledge and the explanatory principles that it uncovers are also subject to be used in a malicious fashion. Access to statistics describing the state of the grid can inform malicious attackers by allowing them to pose the data-injection attack problem [1] within a probabilistic framework [2, 3]. By describing the processes taking place in the grid as a stochastic process, the network operator can incorporate the statistical description of the state variables in the state estimation procedure and pose it within a Bayesian estimation setting. Similarly, the attacker can exploit the stochastic description of the state variables by incorporating it to the attack construction in the form of prior knowledge about the state variables.
Interestingly, whether the network operator or the attacker benefits more from adding a stochastic description to the state variables does not have a simple answer; it depends greatly on the parameters describing the power system.

In this chapter we review some of the basic attack constructions that exploit a stochastic description of the state variables. We pose the state estimation problem in a Bayesian setting and cast the bad data detection procedure as a Bayesian hypothesis testing problem. This revised detection framework provides the benchmark for the attack detection problem that limits the achievable attack disruption. Indeed, the trade-off between the impact of the attack, in terms of disruption to the state estimator, and the probability of attack detection is analytically characterized within this Bayesian attack setting. We then generalize the attack construction by considering information-theoretic measures that place fundamental limits to a broad class of detection, estimation, and learning techniques. Because the attack constructions proposed in this chapter rely on the attacker having access to the statistical structure of the random process describing the state variables, we conclude by studying the impact of imperfect statistics on the attack performance. Specifically, we study the attack performance as a function of the size of the training data set that is available to the attacker to estimate the second-order statistics of the state variables.
We model the state of the system as the vector of n random variables X^n taking values in R^n with distribution P_{X^n}. The random variable X_i, with i ∈ {1, 2, . . . , n}, denotes the state variable i of the power system, and therefore, each entry represents a different physical magnitude of the system that the network operator wishes to monitor. The prior knowledge that is available to the network operator is described by the probability distribution P_{X^n}. The knowledge of the distribution is a consequence of the modelling based on historical data acquired by the network operator. Assuming linearized system dynamics with m measurements corrupted by additive white Gaussian noise (AWGN), the measurements are modelled as the vector of random variables Y^m ∈ R^m with distribution P_{Y^m} given by

Y^m = H X^n + Z^m,   (9.1)

where H ∈ R^{m×n} is the Jacobian of the linearized system dynamics around a given operating point and Z^m ∼ N(0, σ² I) is thermal white noise with power spectral density σ². While the operation point of the system induces a dynamic on the Jacobian matrix H, in the following we assume that the time-scale over which the operation point changes is large compared to the time-scale at which the state estimator operates to produce the estimates. For that reason, in the following we assume that the Jacobian matrix is fixed and the only sources of uncertainty in the observation process originate from the stochasticity of the state variables and the additive noise corrupting the measurements.

The aim of the state estimator is to obtain an estimate X̂^n of the state vector X^n from the system observations Y^m. In this chapter we adopt a linear estimation framework resulting in an estimate given by X̂^n = L Y^m, where L ∈ R^{n×m} is the linear estimation matrix determining the estimation procedure.
In the case in which the operator knows the distribution P_{X^n} of the underlying random process governing the state of the network, the estimation is performed by selecting the estimate that minimizes a given error cost function. A common approach is to use the mean square error (MSE) as the error cost function. In this case, the network operator uses an estimator M that is the unique solution to the following optimization problem:

M = arg min_{L ∈ R^{n×m}} E[ (1/n) ‖X^n − L Y^m‖₂² ],   (9.2)

where the expectation is taken with respect to X^n and Z^m. Under the assumption that the network state vector X^n follows an n-dimensional real Gaussian distribution with zero mean and covariance matrix Σ_{XX} ∈ S₊^n, i.e. X^n ∼ N(0, Σ_{XX}), the minimum MSE (MMSE) estimate is given by

X̂^n ≜ E[X^n | Y^m] = M Y^m,   (9.3)

where

M = Σ_{XX} H^T (H Σ_{XX} H^T + σ² I)^{-1}.   (9.4)

The aim of the attacker is to corrupt the estimate by altering the measurements. Data-injection attacks alter the measurements available to the operator by adding an attack vector to the measurements. The resulting observation model with the additive attack vector is given by

Y_a^m = H X^n + Z^m + a,   (9.5)

where a ∈ R^m is the attack vector and Y_a^m ∈ R^m is the vector containing the compromised measurements [1]. Note that in this formulation, the attack vector does not have a probabilistic structure, i.e. the attack vector is deterministic. The random attack construction is considered later in the chapter. The intention of the attacker can respond to diverse motivations, and therefore, the attack construction strategy changes depending on the aim of the attacker. In this chapter, we study attacks that aim to maximize the monitoring disruption, i.e. attacks that obstruct the state estimation procedure with the aim of deviating the estimate as much as possible from the true state.
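The estimation step in (9.2)-(9.4) can be sketched numerically. In the sketch below, the dimensions, covariances, and noise level are illustrative assumptions, not values from the chapter. It builds the MMSE matrix M of (9.4) and compares its analytic error with that of a prior-agnostic least-squares estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and statistics (assumptions, not chapter values).
n, m, sigma2 = 4, 6, 0.1
H = rng.normal(size=(m, n))                    # linearized measurement Jacobian
A = rng.normal(size=(n, n))
Sigma_XX = A @ A.T + np.eye(n)                 # positive-definite state covariance

# (9.10) and (9.4): covariance of the measurements and MMSE estimation matrix.
Sigma_YY = H @ Sigma_XX @ H.T + sigma2 * np.eye(m)
M = Sigma_XX @ H.T @ np.linalg.inv(Sigma_YY)

# One realization of the observation model (9.1) and its MMSE estimate (9.3).
x = rng.multivariate_normal(np.zeros(n), Sigma_XX)
z = rng.normal(scale=np.sqrt(sigma2), size=m)
y = H @ x + z
x_hat = M @ y

# Analytic MSE of the MMSE estimator vs. the least-squares estimator H^+ y,
# which ignores the prior P_{X^n}.
mse_mmse = np.trace(Sigma_XX - M @ H @ Sigma_XX)
mse_ls = sigma2 * np.trace(np.linalg.inv(H.T @ H))
```

The gap between `mse_mmse` and `mse_ls` illustrates the value of the prior knowledge P_{X^n} to the operator, which is precisely the knowledge the attacker will later exploit.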
In that sense, the attack problem is bound to the cost function used by the state estimator to obtain the estimate, as the attacker aims to maximize it while the estimator aims to minimize it. In the MMSE setting described in the preceding text, the impact of the attack vector is obtained by noticing that the estimate when the attack vector is present is given by

X̂_a^n = M (H X^n + Z^m) + M a.   (9.6)

The term M a is referred to as the Bayesian injection vector introduced by the attack vector a and is denoted by

c ≜ M a = Σ_{XX} H^T (H Σ_{XX} H^T + σ² I)^{-1} a.   (9.7)

The Bayesian injection vector is a deterministic vector that corrupts the MMSE estimate of the operator, resulting in

X̂_a^n = X̂^n + c,   (9.8)

where X̂^n is given in (9.3).

As a part of the grid management, a network operator systematically attempts to identify measurements that are not deemed of sufficient quality for the state estimator. In practice, this operation can be cast as a hypothesis testing problem with hypotheses

H₀: There is no attack,  versus  H₁: Measurements are compromised.   (9.9)

Assuming the operator knows the distribution of the state variables, P_{X^n}, and the observation model (9.5), then it can obtain the joint distribution of the measurements and the state variables both under normal operation conditions and when an attack is present, i.e. P_{X^n Y^m} and P_{X^n Y_a^m}, respectively. Under the assumption that the state variables follow a multivariate Gaussian distribution X^n ∼ N(0, Σ_{XX}), it follows that the vector of measurements Y^m follows an m-dimensional real Gaussian distribution with covariance matrix

Σ_{YY} = H Σ_{XX} H^T + σ² I,   (9.10)

and mean a when there is an attack, or zero mean when there is no attack. Within this setting, the hypothesis testing problem described before is adapted to the attack detection problem by comparing the following hypotheses:

H₀: Y^m ∼ N(0, Σ_{YY}),  versus  H₁: Y^m ∼ N(a, Σ_{YY}).   (9.11)

A worst case scenario approach is assumed for the attackers, namely, the operator knows the attack vector, a, used in the attack. However, the operator does not know a priori whether the grid is under attack or not, which accounts for the need of an attack detection strategy. That being the case, the optimal detection strategy for the operator is to perform a likelihood ratio test (LRT) L(y, a) with respect to the observations y. Under the assumption that the state variables follow a multivariate Gaussian distribution, the likelihood ratio can be calculated as

L(y, a) = f_{N(0, Σ_{YY})}(y) / f_{N(a, Σ_{YY})}(y) = exp( (1/2) a^T Σ_{YY}^{-1} a − a^T Σ_{YY}^{-1} y ),   (9.12)

where f_{N(µ, Σ)} is the probability density function of a multivariate Gaussian random vector with mean µ and covariance matrix Σ. Therefore, either hypothesis is accepted by evaluating the inequality

L(y, a) ≷ τ,   (9.13)

accepting H₀ when L(y, a) > τ and H₁ otherwise, where τ ∈ [0, ∞) is tuned to set the trade-off between the probability of detection and the probability of false alarm.

This section describes the construction of data-injection attacks in the case in which there is a unique attacker with access to all the measurements on the power system. This scenario is referred to as centralized attacks in order to highlight that there exists a unique entity deciding the data-injection vector a ∈ R^m in (9.5). The difference between the scenario in which there exists a unique attacker and that with several (competing or cooperating) attackers is subtle and is treated in Section 9.4. Let M = {1, . . . , m} denote the set of all m sensors available to the network operator. A sensor is said to be compromised if the attacker is able to arbitrarily modify its output. Given a total energy budget E > 0, the set of feasible attack vectors is

A = { a ∈ R^m : a^T a ≤ E }.   (9.14)

The attacker chooses a vector a ∈ A taking into account the trade-off between the probability of being detected and the distortion induced by the Bayesian injection vector given by (9.7).
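The closed form of the likelihood ratio in (9.12) can be checked against a direct evaluation of the two Gaussian densities. The dimension, covariance, and threshold below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# Illustrative measurement covariance and attack vector (assumptions).
m = 5
B = rng.normal(size=(m, m))
Sigma_YY = B @ B.T + np.eye(m)
a = rng.normal(size=m)                       # attack known to the operator (worst case)
y = rng.multivariate_normal(a, Sigma_YY)     # observed measurements under attack

Sigma_inv = np.linalg.inv(Sigma_YY)
# Closed form (9.12): L(y, a) = exp( 0.5 a^T S^-1 a - a^T S^-1 y )
L_closed = np.exp(0.5 * a @ Sigma_inv @ a - a @ Sigma_inv @ y)

# Direct ratio of the two Gaussian densities in (9.11).
f0 = multivariate_normal(np.zeros(m), Sigma_YY).pdf(y)
f1 = multivariate_normal(a, Sigma_YY).pdf(y)
L_ratio = f0 / f1

tau = 1.0
decide_H0 = L_closed > tau   # accept H0 (no attack) when the ratio exceeds tau
```

The exponential form is preferred in practice because the two densities underflow quickly as m grows, while the quadratic forms in the exponent remain well conditioned.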
However, the choice of a particular data-injection vector is not trivial, as the attacker does not have any information about the exact realizations of the vector of state variables x and the noise vector z. A reasonable assumption on the knowledge of the attacker is to consider that it knows the structure of the power system and thus the matrix H. It is also reasonable to assume that it knows the first and second moments of the state variables X^n and the noise Z^m, as these can be computed from historical data. Under these knowledge assumptions, the probability that the network operator is unable to detect the attack vector a is

P_ND(a) = E[ 1{L(y, a) > τ} ],   (9.15)

where the expectation is taken over the joint probability distribution of the state variables X^n and the AWGN vector Z^m, and 1{·} denotes the indicator function. Note that under these assumptions, Y^m is a Gaussian random vector with mean a and covariance matrix Σ_{YY}. Thus, the probability P_ND(a) of a vector a being a successful attack, i.e. a non-detected attack, is given by [4]
P_ND(a) = (1/2) erfc( ( (1/2) a^T Σ_{YY}^{-1} a + log τ ) / √(2 a^T Σ_{YY}^{-1} a) ).   (9.16)

Often, the knowledge of the threshold τ in (9.13) is not available to the attacker, and thus it cannot determine the exact probability of not being detected for a given attack vector a. However, knowing whether τ > 1 or τ ≤ 1 is enough to characterize the structure of the least detectable attacks, as shown by the following propositions.

Proposition 9.1 (Case τ ≤ 1). Let τ ≤ 1. Then, for all a ∈ A with a ≠ 0, P_ND(a) < P_ND((0, . . . , 0)), and the probability P_ND(a) is monotonically decreasing with a^T Σ_{YY}^{-1} a.

Proposition 9.2 (Case τ > 1). Let τ > 1 and let Σ_{YY} = U_{YY} Λ_{YY} U_{YY}^T be the singular value decomposition of Σ_{YY}, with U_{YY} = (u_{YY,1}, . . . , u_{YY,m}) and Λ_{YY} = diag(λ_{YY,1}, . . . , λ_{YY,m}) with λ_{YY,1} ≥ λ_{YY,2} ≥ . . . ≥ λ_{YY,m}. Then, any vector of the form

a = ± √(2 λ_{YY,k} log τ) u_{YY,k},   (9.17)

with k ∈ {1, . . . , m}, is a data-injection attack that satisfies, for all a′ ∈ R^m, P_ND(a′) ≤ P_ND(a).

The proof of Proposition 9.1 and Proposition 9.2 follows.
Proof.
Let x = a^T Σ_{YY}^{-1} a and note that x > 0 for all a ≠ 0 due to the positive definiteness of Σ_{YY}. Let also the function g : R₊ → R be

g(x) = ( x/2 + log τ ) / √(2x).   (9.18)

The first derivative of g(x) is

g′(x) = ( x/2 − log τ ) / (2x)^{3/2}.   (9.19)

Note that in the case in which log τ ≤ 0, i.e. τ ≤ 1, it holds for all x ∈ R₊ that g′(x) > 0, and therefore g is monotonically increasing with x. Since the complementary error function erfc is monotonically decreasing with its argument, the statement of Proposition 9.1 follows, and this completes its proof. In the case in which log τ > 0, i.e. τ > 1, the unique solution of g′(x) = 0 is x = 2 log τ, and it corresponds to a minimum of the function g. The maximum of erfc(g(x)) occurs at the minimum of g(x), given that erfc is monotonically decreasing with its argument. Hence, the maximum of P_ND(a) occurs for the attack vectors satisfying

a^T Σ_{YY}^{-1} a = 2 log τ.   (9.20)

Solving for a in (9.20) yields (9.17), and this completes the proof of Proposition 9.2.
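The non-detection probability (9.16) can be validated with a quick Monte Carlo experiment under illustrative parameters (the threshold, dimension, and covariance below are arbitrary assumptions): the empirical frequency of the event L(y, a) > τ under the compromised distribution N(a, Σ_{YY}) should match the erfc expression:

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions, not chapter values).
m, tau = 4, 1.5
B = rng.normal(size=(m, m))
Sigma_YY = B @ B.T + np.eye(m)
Sigma_inv = np.linalg.inv(Sigma_YY)
a = rng.normal(size=m)

x = a @ Sigma_inv @ a                        # a^T Sigma_YY^{-1} a
# Closed form (9.16).
p_nd = 0.5 * erfc((0.5 * x + np.log(tau)) / np.sqrt(2 * x))

# Monte Carlo: fraction of compromised observations with L(y, a) > tau,
# using the log of the likelihood ratio (9.12).
Y = rng.multivariate_normal(a, Sigma_YY, size=200000)
logL = 0.5 * x - Y @ (Sigma_inv @ a)
p_mc = np.mean(logL > np.log(tau))
```

With 2·10⁵ samples the empirical estimate should agree with the closed form to roughly two decimal places.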
The relevance of Proposition 9.1 is that it states that when τ ≤ 1, any non-zero data-injection attack vector possesses a non-zero probability of being detected. Indeed, the highest probability P_ND(a) of not being detected is attained by the null vector a = (0, . . . , 0), i.e. there is no attack. Alternatively, when τ > 1, the problem of interest is to find a data-injection vector a that maximizes the probability P_ND(a) of not being detected at the same time that it induces a distortion ‖c‖₂² ≥ D₀ into the estimate.
In the case in which τ ≤ 1, it follows from Proposition 9.1 and (9.7) that this problem can be formulated as the following optimization problem:

min_{a ∈ A} a^T Σ_{YY}^{-1} a  s.t.  a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a ≥ D₀.   (9.21)

The solution to the optimization problem in (9.21) is given by the following theorem.

Theorem 9.1.
Let G = Σ_{YY}^{-1/2} H Σ_{XX}^2 H^T Σ_{YY}^{-1/2} have a singular value decomposition G = U_G Λ_G U_G^T, with U_G = (u_{G,1}, . . . , u_{G,m}) a unitary matrix and Λ_G = diag(λ_{G,1}, . . . , λ_{G,m}) a diagonal matrix with λ_{G,1} ≥ . . . ≥ λ_{G,m}. Then, if τ ≤ 1, the attack vector a that maximizes the probability of not being detected P_ND(a) while inducing an excess distortion not less than D₀ is

a = ± √(D₀/λ_{G,1}) Σ_{YY}^{1/2} u_{G,1}.   (9.22)

Moreover,

P_ND(a) = (1/2) erfc( ( D₀/(2 λ_{G,1}) + log τ ) / √(2 D₀/λ_{G,1}) ).

Proof. Consider the Lagrangian

L(a) = a^T Σ_{YY}^{-1} a − γ ( a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a − D₀ ),   (9.23)

with γ > 0 a Lagrange multiplier. The necessary conditions for a to be a solution to the optimization problem (9.21) are:

∇_a L(a) = 2 ( Σ_{YY}^{-1} − γ Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} ) a = 0,   (9.24)

(d/dγ) L(a) = a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a − D₀ = 0.   (9.25)

Note that any

a_i = ± √(D₀/λ_{G,i}) Σ_{YY}^{1/2} u_{G,i}  and   (9.26)

γ_i = λ_{G,i}^{-1}, with 1 ≤ i ≤ rank(G),   (9.27)

satisfy γ_i > 0 and the necessary conditions, so the set of candidate solutions is

{ a_i = ± √(D₀/λ_{G,i}) Σ_{YY}^{1/2} u_{G,i} : 1 ≤ i ≤ rank(G) }.   (9.28)

More importantly, any vector a ≠ a_i, with 1 ≤ i ≤ rank(G), does not satisfy the necessary conditions. Moreover,

a_i^T Σ_{YY}^{-1} a_i = D₀/λ_{G,i} ≥ D₀/λ_{G,1}.   (9.29)

Therefore, a = ± √(D₀/λ_{G,1}) Σ_{YY}^{1/2} u_{G,1} are the unique solutions to (9.21). This completes the proof.

Interestingly, the construction of the data-injection attack a in (9.22) does not require the exact knowledge of τ. That is, only knowing that τ ≤ 1 is enough to construct an attack that induces a distortion of at least D₀.
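The construction in Theorem 9.1 can be sketched numerically; the system dimensions and statistics below are illustrative assumptions. The leading eigenvector of G yields an attack whose induced distortion ‖Ma‖² meets the target D₀ exactly, while the detectability term a^T Σ_YY^{-1} a equals D₀/λ_{G,1}:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative system (assumptions, not chapter values).
n, m, sigma2, D0 = 3, 5, 0.5, 2.0
H = rng.normal(size=(m, n))
A = rng.normal(size=(n, n))
Sigma_XX = A @ A.T + np.eye(n)
Sigma_YY = H @ Sigma_XX @ H.T + sigma2 * np.eye(m)
M = Sigma_XX @ H.T @ np.linalg.inv(Sigma_YY)

# Symmetric square roots of Sigma_YY via its eigendecomposition.
w, V = np.linalg.eigh(Sigma_YY)
S_half = V @ np.diag(np.sqrt(w)) @ V.T
S_mhalf = V @ np.diag(1 / np.sqrt(w)) @ V.T

# G = Sigma_YY^{-1/2} H Sigma_XX^2 H^T Sigma_YY^{-1/2}; leading eigenpair.
G = S_mhalf @ H @ Sigma_XX @ Sigma_XX @ H.T @ S_mhalf
lam, U = np.linalg.eigh(G)                 # eigh sorts eigenvalues ascending
lam1, u1 = lam[-1], U[:, -1]

# Optimal attack of Theorem 9.1 (tau <= 1), equation (9.22).
a_star = np.sqrt(D0 / lam1) * (S_half @ u1)

distortion = float(np.linalg.norm(M @ a_star) ** 2)      # ||c||^2 = a^T M^T M a
cost = float(a_star @ np.linalg.inv(Sigma_YY) @ a_star)  # detectability term
```

Any other eigendirection u_{G,i} would meet the distortion target with a strictly larger detectability term D₀/λ_{G,i}, which is why the leading eigenpair is optimal.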
In the case in which τ > 1, it is also possible to find the data-injection attack vector that induces a distortion not less than D₀ with the maximum probability of not being detected. Such a vector is the solution to the following optimization problem:

min_{a ∈ A} ( (1/2) a^T Σ_{YY}^{-1} a + log τ ) / √(2 a^T Σ_{YY}^{-1} a)  s.t.  a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a ≥ D₀.   (9.30)

The solution to the optimization problem in (9.30) is given by the following theorem.

Theorem 9.2.
Let G = Σ_{YY}^{-1/2} H Σ_{XX}^2 H^T Σ_{YY}^{-1/2} have a singular value decomposition G = U_G Λ_G U_G^T, with U_G = (u_{G,1}, . . . , u_{G,m}) a unitary matrix and Λ_G = diag(λ_{G,1}, . . . , λ_{G,m}) a diagonal matrix with λ_{G,1} ≥ . . . ≥ λ_{G,m}. Then, when τ > 1, the attack vector a that maximizes the probability of not being detected P_ND(a) while producing an excess distortion not less than D₀ is

a = ± √(D₀/λ_{G,k*}) Σ_{YY}^{1/2} u_{G,k*}  if  D₀/(2 λ_{G,rank(G)}) ≥ log τ,
a = ± √(2 log τ) Σ_{YY}^{1/2} u_{G,1}  if  D₀/(2 λ_{G,rank(G)}) < log τ,

with

k* = arg min_{k ∈ {1, . . . , rank(G)} : D₀/λ_{G,k} > 2 log τ} D₀/λ_{G,k}.   (9.31)

Proof.
The structure of the proof of Theorem 9.2 is similar to that of the proof of Theorem 9.1 and is omitted in this chapter. A complete proof can be found in [5].

9.3.2 Attacks with Maximum Distortion
In the previous subsection, the attacker constructs its data-injection vector a aiming to maximize the probability of non-detection P_ND(a) while guaranteeing a minimum distortion. However, this problem has a dual in which the objective is to maximize the distortion a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a while guaranteeing that the probability of not being detected remains always larger than a given threshold L′ ∈ [0, 1/2]. This problem can be formulated as the following optimization problem:

max_{a ∈ A} a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a  s.t.  ( (1/2) a^T Σ_{YY}^{-1} a + log τ ) / √(2 a^T Σ_{YY}^{-1} a) ≤ L,   (9.32)

with L = erfc^{-1}(2L′) ∈ [0, ∞). The solution to the optimization problem in (9.32) is given by the following theorem.

Theorem 9.3.
Let the matrix G = Σ_{YY}^{-1/2} H Σ_{XX}^2 H^T Σ_{YY}^{-1/2} have a singular value decomposition G = U_G Λ_G U_G^T, with U_G = (u_{G,1}, . . . , u_{G,m}) a unitary matrix and Λ_G = diag(λ_{G,1}, . . . , λ_{G,m}) a diagonal matrix with λ_{G,1} ≥ . . . ≥ λ_{G,m}. Then, the attack vector a that maximizes the excess distortion with a probability of not being detected that does not go below L′ ∈ [0, 1/2] is

a = ± √2 ( L + √(L² − log τ) ) Σ_{YY}^{1/2} u_{G,1},   (9.33)

with L = erfc^{-1}(2L′), when a solution exists.

Proof. The structure of the proof of Theorem 9.3 is similar to that of the proof of Theorem 9.1 and is omitted in this chapter. A complete proof can be found in [5].
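The maximum-distortion construction (9.33) can be sketched numerically under illustrative parameters (dimensions, threshold τ, and target floor L′ are assumptions). Since the detection constraint in (9.32) is active at the optimum, the resulting attack should attain P_ND(a) = L′ exactly:

```python
import numpy as np
from scipy.special import erfc, erfcinv

rng = np.random.default_rng(4)

# Illustrative system and attacker parameters (assumptions).
n, m, sigma2 = 3, 5, 0.5
tau, Lp = 1.2, 0.2                 # detection threshold and P_ND floor L'
H = rng.normal(size=(m, n))
A = rng.normal(size=(n, n))
Sigma_XX = A @ A.T + np.eye(n)
Sigma_YY = H @ Sigma_XX @ H.T + sigma2 * np.eye(m)

# Square roots of Sigma_YY and leading eigenvector of G.
w, V = np.linalg.eigh(Sigma_YY)
S_half = V @ np.diag(np.sqrt(w)) @ V.T
S_mhalf = V @ np.diag(1 / np.sqrt(w)) @ V.T
G = S_mhalf @ H @ Sigma_XX @ Sigma_XX @ H.T @ S_mhalf
lam, U = np.linalg.eigh(G)
u1 = U[:, -1]

# (9.33): a = +/- sqrt(2) (L + sqrt(L^2 - log tau)) Sigma_YY^{1/2} u_{G,1}.
L = erfcinv(2 * Lp)
a = np.sqrt(2) * (L + np.sqrt(L**2 - np.log(tau))) * (S_half @ u1)

# Verify the detection constraint is met with equality via (9.16).
x = a @ np.linalg.inv(Sigma_YY) @ a
p_nd = 0.5 * erfc((0.5 * x + np.log(tau)) / np.sqrt(2 * x))
```

The solvability condition "when a solution exists" corresponds to L² ≥ log τ, i.e. the requested non-detection floor must be compatible with the operator's threshold.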
Let K = {1, . . . , K} be the set of attackers that can potentially perform a data-injection attack on the network, e.g., a decentralized vector attack. Let also C_k ⊆ {1, 2, . . . , m} be the set of sensors that attacker k ∈ K can control. Assume that C_1, . . . , C_K are proper sets and form a partition of the set M of all sensors. The set A_k of data attack vectors a_k = (a_{k,1}, a_{k,2}, . . . , a_{k,m}) that can be injected into the network by attacker k ∈ K is of the form

A_k = { a_k ∈ R^m : a_{k,j} = 0 for all j ∉ C_k, a_k^T a_k ≤ E_k }.   (9.34)

The constant E_k < ∞ represents the energy budget of attacker k. Let the set of all possible sums of the elements of A_i and A_j be denoted by A_i ⊕ A_j. That is, for all a ∈ A_i ⊕ A_j, there exists a pair of vectors (a_i, a_j) ∈ A_i × A_j such that a = a_i + a_j. Using this notation, let the set of all possible data-injection attacks be denoted by

A = A_1 ⊕ A_2 ⊕ . . . ⊕ A_K,   (9.35)

and the set of complementary data-injection attacks with respect to attacker k be denoted by

A_{−k} = A_1 ⊕ . . . ⊕ A_{k−1} ⊕ A_{k+1} ⊕ . . . ⊕ A_K.   (9.36)

Given the individual data-injection vectors a_i ∈ A_i, with i ∈ {1, . . . , K}, the global attack vector a is

a = Σ_{i=1}^K a_i ∈ A.   (9.37)

The aim of attacker k is to corrupt the measurements obtained by the set of meters C_k by injecting an error vector a_k ∈ A_k that maximizes the damage to the network, e.g., the excess distortion, while avoiding the detection of the global data-injection vector a. Clearly, all attackers have the same interest, but they control different sets of measurements, i.e., C_i ≠ C_k for any pair (i, k) ∈ K². To model this behavior, attackers use the utility function φ : R^m → R to determine whether a data-injection vector a_k ∈ A_k is more beneficial than another a′_k ∈ A_k given the complementary attack vector

a_{−k} = Σ_{i ∈ {1,...,K}\{k}} a_i ∈ A_{−k}   (9.38)

adopted by all the other attackers.
The function φ is chosen considering the fact that an attack is said to be successful if it induces a non-zero distortion and it is not detected. Alternatively, if the attack is detected, no damage is induced into the network, as the operator discards the measurements and no estimation is performed. Hence, given a global attack a, the distortion induced into the estimate is 1{L(Y_a^m, a) > τ} c^T c. However, attackers are not able to know the exact state of the network x and the realization of the noise z before launching the attack. Thus, it appears natural to exploit the knowledge of the first and second moments of both the state variables x and the noise z and consider as a metric the expected distortion φ(a) that can be induced by the attack vector a:

φ(a) = E[ 1{L(Y_a^m, a) > τ} c^T c ]   (9.39)
     = P_ND(a) a^T Σ_{YY}^{-1} H Σ_{XX}^2 H^T Σ_{YY}^{-1} a,   (9.40)

where c is defined in (9.7) and the expectation is taken over the distribution of the state variables X^n and the noise Z^m. Note that under this assumption of global knowledge, this model considers the worst case scenario for the network operator. Indeed, the results presented in this section correspond to the case in which the attackers inflict the most harm onto the state estimator.

The benefit φ(a) obtained by attacker k does not only depend on its own data-injection vector a_k, but also on the data-injection vectors a_{−k} of all the other attackers. This becomes clear from the construction of the global data-injection vector a in (9.37), the excess distortion induced by the Bayesian injection vector c in (9.7), and the probability of not being detected P_ND(a) in (9.16). Therefore, the interaction of all attackers in the network can be described by a game in normal form

G = ( K, {A_k}_{k∈K}, φ ).   (9.41)

Each attacker is a player in the game G and is identified by an index from the set K. The actions player k might adopt are data-injection vectors a_k in the set A_k in (9.34).
The underlying assumption in the remainder of this section is that, given a vector of data-injection attacks a_{−k}, player k aims to adopt a data-injection vector a_k such that the expected excess distortion φ(a_k + a_{−k}) is maximized. That is,

a_k ∈ BR_k(a_{−k}),   (9.42)

where the correspondence BR_k : A_{−k} → 2^{A_k} is the best-response correspondence, i.e.,

BR_k(a_{−k}) = arg max_{a_k ∈ A_k} φ(a_k + a_{−k}).   (9.43)

The notation 2^{A_k} represents the set of all possible subsets of A_k. Note that BR_k(a_{−k}) ⊆ A_k is the set of data-injection attack vectors that are optimal given that the other attackers have adopted the data-injection vector a_{−k}. In this setting, each attacker tampers with a subset C_k of all the sensors M = {1, 2, . . . , m}, as opposed to the centralized case in which there exists a single attacker that is able to tamper with all the sensors in M. A game solution that is particularly relevant for this analysis is the Nash equilibrium (NE) [6].

Definition 9.1 (Nash Equilibrium). The data-injection vector a is an NE of the game G if and only if it is a solution of the fixed-point equation

a = BR(a),   (9.44)

with BR : A → 2^A being the global best-response correspondence, i.e.,

BR(a) = BR_1(a_{−1}) + . . . + BR_K(a_{−K}).   (9.45)

Essentially, at an NE, attackers obtain the maximum benefit given the data-injection vector adopted by all the other attackers. This implies that an NE is an operating point at which attackers achieve the highest expected distortion induced over the measurements. More importantly, any unilateral deviation from an equilibrium data-injection vector a does not lead to an improvement of the average excess distortion. Note that this formulation does not say anything about the exact distortion induced by an attack, but rather about the average distortion. This is mainly because the attack is chosen under the uncertainty of the state vector X^n and the noise term Z^m. The following proposition highlights an important property of the game G in (9.41).

Proposition 9.3.
The game G in (9.41) is a potential game.

Proof. The proof follows immediately from the observation that all the players have the same utility function φ [7]. Thus, the function φ is a potential of the game G in (9.41), and any maximum of the potential function is an NE of the game G.

In general, potential games [7] possess numerous properties that are inherited by the game G in (9.41). These properties are detailed by the following propositions.

Proposition 9.4.
The game G possesses at least one NE.

Proof. Note that φ is continuous in A and A is a convex and closed set; therefore, there always exists a maximum of the potential function φ in A. Finally, it follows from the properties of potential games [7] that such a maximum corresponds to an NE of the game G.

The attackers are said to play a sequential best response dynamic (BRD) if the attackers can sequentially decide their own data-injection vector a_k from their sets of best responses following a round-robin (increasing) order. Denote by a_k^{(t)} ∈ A_k the choice of attacker k during round t ∈ N and assume that attackers are able to observe all the other attackers' data-injection vectors. Under these assumptions, the BRD can be defined as follows.

Definition 9.2 (Best Response Dynamics). The players of the game G are said to play best response dynamics if there exists a round-robin order of the elements of K in which at each round t ∈ N, the following holds:

a_k^{(t)} ∈ BR_k( a_1^{(t)} + . . . + a_{k−1}^{(t)} + a_{k+1}^{(t−1)} + . . . + a_K^{(t−1)} ).   (9.46)

From the properties of potential games [7], the following lemma follows.

Lemma 9.1 (Achievability of NE attacks). Any BRD in the game G converges to a data-injection attack vector that is an NE.

The relevance of Lemma 9.1 is that it establishes that if attackers can communicate in at least a round-robin fashion, they are always able to attack the network with a data-injection vector that maximizes the average excess distortion. Note that there might exist several NEs (local maxima of φ) and there is no guarantee that attackers will converge to the best NE, i.e., a global maximum of φ. It is important to note that under the assumption that there exists a unique maximum, which is not the case for the game G (see Theorem 9.4), all attackers are able to calculate such a global maximum and no communication is required among the attackers.
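The sequential BRD of Definition 9.2 can be sketched with a generic numerical optimizer standing in for the exact best response; the dimensions, budgets, sensor partition, and the use of SLSQP below are all illustrative assumptions. Because φ is a potential of the game, each accepted best response weakly increases the common utility, so the recorded potential is non-decreasing:

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import minimize

rng = np.random.default_rng(5)

# Illustrative system (assumptions, not chapter values).
n, m, sigma2, tau = 3, 6, 0.5, 2.0
H = rng.normal(size=(m, n))
Aux = rng.normal(size=(n, n))
Sigma_XX = Aux @ Aux.T + np.eye(n)
Sigma_YY = H @ Sigma_XX @ H.T + sigma2 * np.eye(m)
S_inv = np.linalg.inv(Sigma_YY)
M = Sigma_XX @ H.T @ S_inv

def phi(a):
    """Expected excess distortion (9.39)-(9.40): P_ND(a) * ||M a||^2."""
    x = a @ S_inv @ a
    if x < 1e-12:
        return 0.0
    p_nd = 0.5 * erfc((0.5 * x + np.log(tau)) / np.sqrt(2.0 * x))
    return p_nd * float(np.linalg.norm(M @ a) ** 2)

# Two attackers partitioning the sensors, with energy budgets E_k.
C = [np.arange(0, 3), np.arange(3, 6)]
E = [4.0, 4.0]
a_k = [0.1 * rng.normal(size=len(c)) for c in C]

def assemble(parts):
    a = np.zeros(m)
    for idx, p in zip(C, parts):
        a[idx] = p
    return a

history = [phi(assemble(a_k))]
for _ in range(5):                          # round-robin best-response rounds
    for k in range(len(C)):
        def neg_util(p, k=k):
            parts = [p if j == k else a_k[j] for j in range(len(C))]
            return -phi(assemble(parts))
        res = minimize(neg_util, a_k[k], method="SLSQP",
                       constraints=[{"type": "ineq",
                                     "fun": lambda p, Ek=E[k]: Ek - p @ p}])
        cur = phi(assemble(a_k))
        if -res.fun >= cur:                 # accept only improving responses
            a_k[k] = res.x
    history.append(phi(assemble(a_k)))
```

A local optimizer only approximates the best-response correspondence, so this sketch may stop at any local maximum of φ; that is consistent with Lemma 9.1, which guarantees convergence to some NE, not to the best one.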
Nonetheless, the game G always possesses at least two NEs, which enforces the use of a sequential BRD to converge to an NE.

9.4.3 Cardinality of the Set of NEs

Let A_NE be the set of all data-injection attacks that form NEs. The following theorem bounds the number of NEs in the game.

Theorem 9.4.
The cardinality of the set A_NE of NEs of the game G satisfies

2 ≤ |A_NE| ≤ C · rank(H),   (9.47)

where C < ∞ is a constant that depends on τ.

Proof. The lower bound follows from the symmetry of the utility function given in (9.39), i.e. φ(a) = φ(−a), and the existence of at least one NE claimed in Proposition 9.4. To prove the upper bound, the number of stationary points of the utility function is evaluated. This is equivalent to the cardinality of the set

S = { a ∈ R^m : ∇_a φ(a) = 0 },   (9.48)

which satisfies A_NE ⊆ S. Calculating the gradient with respect to the attack vector yields

∇_a φ(a) = ( α(a) M^T M − β(a) Σ_{YY}^{-1} ) a,   (9.49)

where

α(a) ≜ erfc( ( (1/2) a^T Σ_{YY}^{-1} a + log τ ) / √(2 a^T Σ_{YY}^{-1} a) )   (9.50)

and

β(a) ≜ ( a^T M^T M a / (2 √(2π a^T Σ_{YY}^{-1} a)) ) ( 1 − (2 log τ)/(a^T Σ_{YY}^{-1} a) ) exp( −( ( (1/2) a^T Σ_{YY}^{-1} a + log τ ) / √(2 a^T Σ_{YY}^{-1} a) )² ).   (9.51)

Define δ(a) ≜ β(a)/α(a) and note that combining (9.4) with (9.49) gives the following condition for the stationary points:

( H Σ_{XX}^2 H^T Σ_{YY}^{-1} − δ(a) I ) a = 0.   (9.52)

Note that the number of linearly independent attack vectors that are a solution of the linear system in (9.52) is given by

R ≜ rank( H Σ_{XX}^2 H^T Σ_{YY}^{-1} )   (9.53)
  = rank(H),   (9.54)

where (9.54) follows from the fact that Σ_{XX} and Σ_{YY} are positive definite. Define the eigenvalue decomposition

Σ_{YY}^{-1/2} H Σ_{XX}^2 H^T Σ_{YY}^{-1/2} = U Λ U^T,   (9.55)

where Λ is a diagonal matrix containing the ordered eigenvalues {λ_i}_{i=1}^m matching the order of the eigenvectors in U. As a result of (9.53), there are R eigenvalues, λ_k, which are different from zero and m − R diagonal elements of Λ which are zero. Combining this decomposition with some algebraic manipulation, the condition for stationary points in (9.52) can be recast as

Σ_{YY}^{-1/2} U ( Λ − δ(a) I ) U^T Σ_{YY}^{-1/2} a = 0.   (9.56)

Let w ∈ R be a scaling parameter and observe that the attack vectors that satisfy a = w Σ_{YY}^{1/2} U e_k and δ(a) = λ_k, for k = 1, . . . , R, are solutions of (9.56). Note that the critical points associated with zero eigenvalues are not NEs. Indeed, the eigenvectors associated with zero eigenvalues yield zero utility. Since the utility function is nonnegative, these critical points are minima of the utility function and can be discarded when counting the number of NEs. Therefore, the set in (9.48) can be rewritten based on the condition in (9.56) as

S = ∪_{k=1}^{R} S_k,   (9.57)

where

S_k = { a ∈ R^m : a = w Σ_{YY}^{1/2} U e_k and δ(a) = λ_k }.   (9.58)

There are R linearly independent solutions of (9.56), but for each linearly independent solution there can be several scaling parameters, w, which satisfy δ(a) = λ_k. For that reason, |S_k| is determined by the number of scaling parameters that satisfy δ(a) = λ_k. To that end, define δ′ : R → R as δ′(w) ≜ δ(w Σ_{YY}^{1/2} U e_k). It is easy to check that δ′(w) = λ_k has a finite number of solutions for k = 1, . . . , R. Hence, for all k there exists a constant C_k such that |S_k| ≤ C_k, which yields the upper bound

|S| ≤ Σ_{k=1}^{R} |S_k| ≤ Σ_{k=1}^{R} C_k ≤ max_k C_k · R.   (9.59)

Noticing that there is a finite number of solutions of δ′(w) = λ_k and that they depend only on τ yields the upper bound.

Modern sensing infrastructure is moving toward increasing the number of measurements that the operator acquires, e.g. phasor measurement units exhibit temporal resolutions in the order of milliseconds, while supervisory control and data acquisition (SCADA) systems traditionally operate with a temporal resolution in the order of seconds. As a result, attack constructions that do not change within the same temporal scale at which measurements are reported do not exploit all the degrees of freedom that are available to the attacker. Indeed, an attacker can choose to change the attack vector with every measurement vector that is reported to the network operator.
However, the deterministic attack construction changes when the Jacobian measurement matrix changes, i.e. with the operation point of the system. Thus, in the deterministic attack case, the attack construction changes at the same rate as the Jacobian measurement matrix and, therefore, the dynamics of the state variables define the update cadency of the attack vector. In this section, we study the case in which the attacker constructs the attack vector as a random process that corrupts the measurements. By endowing the attack vector with a probabilistic structure, we provide the attacker with an attack construction strategy that generates attack vector realizations over time and that achieves a determined objective on average. In view of this, the task of the attacker in this case is to devise the optimal distribution for the attack vectors. In the following, we pose the attack construction problem within an information-theoretic framework and characterize the attacks that simultaneously minimize the mutual information and the probability of detection.
We consider an additive attack model as in (9.5), but with the distinction that the attack is a random process. The resulting vector of compromised measurements is given by

Y_A^m = H X^n + Z^m + A^m,   (9.60)

where A^m \in \mathbb{R}^m is the vector of random variables introduced by the attacker and Y_A^m \in \mathbb{R}^m is the vector containing the compromised measurements. The attack vector of random variables is described by the distribution P_{A^m}, which is determined by the attacker. We assume that the attacker has no access to the realizations of the state variables, and therefore, it holds that P_{A^m X^n} = P_{A^m} P_{X^n}, where P_{A^m X^n} denotes the joint distribution of A^m and X^n.
Similarly to the deterministic attack case, we adopt a multivariate Gaussian framework for the state variables, such that X^n \sim \mathcal{N}(0, \Sigma_{XX}). Moreover, we limit the attack vector distribution to the set of zero-mean multivariate Gaussian distributions, i.e. A^m \sim \mathcal{N}(0, \Sigma_{AA}), where \Sigma_{AA} \in \mathcal{S}_+^m is the covariance matrix of the attack distribution. The rationale for choosing a Gaussian distribution for the attack vector follows from the fact that for the measurement model in (9.60) the additive attack distribution that minimizes the mutual information between the vector of state variables and the compromised measurements is Gaussian [8]. As we will see later, minimizing this mutual information is central to the proposed information-theoretic attack construction and indeed one of the objectives of the attacker. Because of the Gaussianity of the attack distribution, the vector of compromised measurements is distributed as

Y_A^m \sim \mathcal{N}(0, \Sigma_{Y_A Y_A}),   (9.61)

where \Sigma_{Y_A Y_A} = H \Sigma_{XX} H^T + \sigma^2 I + \Sigma_{AA} is the covariance matrix of the distribution of the compromised measurements. Note that while in the case of deterministic attacks the effect of the attack vector was captured by shifting the mean of the measurement vector, in the random attack case the attack changes the structure of the second order moments of the measurements.
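As an illustration of the model in (9.60) and the induced second order moments in (9.61), the following sketch draws one realization of the compromised measurements. The dimensions, Jacobian, and covariances are placeholder values for illustration, not quantities from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions: n state variables, m measurements (hypothetical values).
n, m, sigma2 = 4, 6, 0.1
H = rng.normal(size=(m, n))           # stand-in for the Jacobian measurement matrix
Sigma_XX = np.eye(n)                  # stand-in for the state covariance matrix

# Attack covariance chosen by the attacker; any PSD matrix is admissible at this
# point. The optimal choice is characterized later in the chapter.
Sigma_AA = 0.5 * H @ Sigma_XX @ H.T

x = rng.multivariate_normal(np.zeros(n), Sigma_XX)   # state realization X^n
z = rng.normal(scale=np.sqrt(sigma2), size=m)        # AWGN realization Z^m
a = rng.multivariate_normal(np.zeros(m), Sigma_AA)   # random attack realization A^m

y_attacked = H @ x + z + a                           # compromised measurements, cf. (9.60)

# Second order moments of the compromised measurements, cf. (9.61).
Sigma_YAYA = H @ Sigma_XX @ H.T + sigma2 * np.eye(m) + Sigma_AA
```

Note that the attack only alters the covariance of the measurements, not their mean, which is the structural difference with respect to the deterministic case.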
Interestingly, the Gaussian attack construction implies that knowledge of the second order moments of the state variables and the variance of the AWGN introduced by the measurement process suffices to construct the attack. This assumption significantly reduces the difficulty of the attack construction.
The operator of the power system makes use of the acquired measurements to detect the attack. The detection problem is cast as a hypothesis testing problem with hypotheses

\mathcal{H}_0 : Y^m \sim \mathcal{N}(0, \Sigma_{YY}), \text{ versus } \mathcal{H}_1 : Y^m \sim \mathcal{N}(0, \Sigma_{Y_A Y_A}).   (9.62)

The null hypothesis \mathcal{H}_0 describes the case in which the power system is not compromised, while the alternative hypothesis \mathcal{H}_1 describes the case in which the power system is under attack.
Two types of error are considered in hypothesis testing problems: Type I error is the probability of a "false alarm" event, i.e. deciding that an attack is present when the system is not compromised; Type II error is the probability of a "missed detection" event, i.e. failing to detect an attack that is present. The Neyman-Pearson lemma [9] states that for a fixed probability of Type I error, the likelihood ratio test (LRT) achieves the minimum Type II error when compared with any other test with an equal or smaller Type I error. Consequently, the LRT is chosen to decide between \mathcal{H}_0 and \mathcal{H}_1 based on the available measurements. The LRT between \mathcal{H}_0 and \mathcal{H}_1 takes the following form:

L(y) \triangleq \frac{f_{Y_A^m}(y)}{f_{Y^m}(y)} \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \tau,   (9.63)

where y \in \mathbb{R}^m is a realization of the vector of random variables modelling the measurements, f_{Y_A^m} and f_{Y^m} denote the probability density functions (p.d.f.'s) of Y_A^m and Y^m, respectively, and \tau is the decision threshold set by the operator to meet the false alarm constraint.
The aim of the attacker is twofold. Firstly, it aims to disrupt the state estimation process by corrupting the measurements in such a way that the network operator acquires the least amount of knowledge about the state of the system.
Secondly, the attacker aspires to remain stealthy and corrupt the measurements without being detected by the network operator. In the following we propose two information-theoretic measures that provide quantitative metrics for the objectives of the attacker.
The data-integrity of the measurements is measured in terms of the mutual information between the state variables and the measurements. The mutual information between two random variables is a measure of the amount of information that each random variable contains about the other random variable. By adding the attack vector to the measurements, the attacker aims to reduce the mutual information, which ultimately results in a loss of information about the state by the network operator. Specifically, the attacker aims to minimize I(X^n; Y_A^m). In view of this, it seems reasonable to consider a Gaussian distribution for the attack vector, as the minimum mutual information for the observation model in (9.5) is achieved by additive Gaussian noise.
The probability of attack detection is determined by the detection threshold \tau set by the operator for the LRT and the distribution induced by the attack on the vector of compromised measurements. An analytical expression of the probability of attack detection can be described in closed form as a function of the distributions describing the measurements under both hypotheses. However, the expression is involved in general and it is not straightforward to incorporate it into an analytical formulation of the attack construction. For that reason, we instead consider the asymptotic performance of the LRT to evaluate the detection performance of the operator. The Chernoff-Stein lemma [10] characterizes the asymptotic exponent of the probability of detection when the number of observations of measurement vectors grows to infinity.
In our setting, the Chernoff-Stein lemma states that for any LRT and any \epsilon \in (0, 1/2), it holds that

\lim_{T \to \infty} \frac{1}{T} \log \beta_T^{\epsilon} = -D(P_{Y_A^m} \| P_{Y^m}),   (9.64)

where D(\cdot \| \cdot) is the Kullback-Leibler (KL) divergence, \beta_T^{\epsilon} is the minimum Type II error such that the Type I error \alpha satisfies \alpha < \epsilon, and T is the number of m-dimensional measurement vectors that are available for the LRT detection procedure. As a result, for a fixed upper bound on the probability of false alarm, slowing the asymptotic decay of the probability of missed detection is equivalent to minimizing D(P_{Y_A^m} \| P_{Y^m}), where P_{Y_A^m} and P_{Y^m} denote the probability distributions of Y_A^m and Y^m, respectively.
The purpose of the attacker is to disrupt the normal state estimation procedure by minimizing the information that the operator acquires about the state variables, while guaranteeing that the probability of attack detection is sufficiently small, and therefore, remaining stealthy.
When the two information-theoretic objectives are considered by the attacker, a stealthy attack construction is proposed in [11] by combining the two objectives in one cost function, i.e.,

I(X^n; Y_A^m) + D(P_{Y_A^m} \| P_{Y^m}) = D(P_{X^n Y_A^m} \| P_{X^n} P_{Y^m}),   (9.65)

where P_{X^n Y_A^m} is the joint distribution of X^n and Y_A^m. The resulting optimization problem to construct the attack is given by

\min_{A^m} D(P_{X^n Y_A^m} \| P_{X^n} P_{Y^m}).   (9.66)

Therein, it is shown that (9.66) is a convex optimization problem and that the covariance matrix of the optimal Gaussian attack is \Sigma_{AA} = H \Sigma_{XX} H^T. However, numerical simulations on IEEE test systems show that the attack construction proposed in the preceding text yields large values of the probability of detection in practical settings. To control the probability of detection of the attack, the preceding construction is generalized in [12] by introducing a parameter that weights the detection term in the cost function.
The resulting optimization problem is given by

\min_{A^m} \; I(X^n; Y_A^m) + \lambda D(P_{Y_A^m} \| P_{Y^m}),   (9.67)

where \lambda \geq 1 governs the weight given to each objective in the cost function. For \lambda = 1 the proposed cost function boils down to the effective secrecy proposed in [13], and the attack construction in (9.67) coincides with that in [11]. For \lambda > 1, the attacker adopts a conservative approach and prioritizes remaining undetected over minimizing the amount of information acquired by the operator. By increasing the value of \lambda, the attacker decreases the probability of detection at the expense of increasing the amount of information acquired by the operator using the measurements.
The attack construction in (9.67) is formulated in a general setting. The following propositions particularize the KL divergence and the mutual information to our multivariate Gaussian setting.

Proposition 9.5. [10]
The KL divergence between m-dimensional multivariate Gaussian distributions \mathcal{N}(0, \Sigma_{Y_A Y_A}) and \mathcal{N}(0, \Sigma_{YY}) is given by

D(P_{Y_A^m} \| P_{Y^m}) = \frac{1}{2} \left( \log \frac{|\Sigma_{YY}|}{|\Sigma_{Y_A Y_A}|} - m + \mathrm{tr}\left( \Sigma_{YY}^{-1} \Sigma_{Y_A Y_A} \right) \right).   (9.68)

Proposition 9.6. [10]
The mutual information between the vectors of random variables X^n \sim \mathcal{N}(0, \Sigma_{XX}) and Y_A^m \sim \mathcal{N}(0, \Sigma_{Y_A Y_A}) is given by

I(X^n; Y_A^m) = \frac{1}{2} \log \frac{|\Sigma_{XX}| |\Sigma_{Y_A Y_A}|}{|\Sigma|},   (9.69)

where \Sigma is the covariance matrix of the joint distribution of (X^n, Y_A^m).

Substituting (9.68) and (9.69) in (9.67), we can now pose the Gaussian attack construction as the following optimization problem:

\min_{\Sigma_{AA} \in \mathcal{S}_+^m} \; -(\lambda - 1) \log |\Sigma_{YY} + \Sigma_{AA}| - \log |\Sigma_{AA} + \sigma^2 I| + \lambda \, \mathrm{tr}(\Sigma_{YY}^{-1} \Sigma_{AA}).   (9.70)

We now proceed to solve the optimization problem in the preceding text. First, note that the optimization domain \mathcal{S}_+^m is a convex set. The following proposition characterizes the convexity of the cost function.

Proposition 9.7.
Let \lambda \geq 1. Then the cost function in the optimization problem in (9.70) is convex.

Proof. Note that the term -\log|\Sigma_{AA} + \sigma^2 I| is a convex function on \Sigma_{AA} \in \mathcal{S}_+^m [14]. Additionally, -(\lambda - 1) \log|\Sigma_{YY} + \Sigma_{AA}| is a convex function on \Sigma_{AA} \in \mathcal{S}_+^m when \lambda \geq 1. Since the trace operator is a linear operator and the sum of convex functions is convex, it follows that the cost function in (9.70) is convex on \Sigma_{AA} \in \mathcal{S}_+^m.

Theorem 9.5. Let \lambda \geq 1. Then the solution to the optimization problem in (9.70) is

\Sigma_{AA}^{\star} = \frac{1}{\lambda} H \Sigma_{XX} H^T.   (9.71)

Proof.
Denote the cost function in (9.70) by f(\Sigma_{AA}). Taking the derivative of the cost function with respect to \Sigma_{AA} yields

\frac{\partial f(\Sigma_{AA})}{\partial \Sigma_{AA}} = -2(\lambda - 1)(\Sigma_{YY} + \Sigma_{AA})^{-1} - 2(\Sigma_{AA} + \sigma^2 I)^{-1} + 2\lambda \Sigma_{YY}^{-1} - \lambda \, \mathrm{diag}(\Sigma_{YY}^{-1}) + (\lambda - 1)\, \mathrm{diag}\left( (\Sigma_{YY} + \Sigma_{AA})^{-1} \right) + \mathrm{diag}\left( (\Sigma_{AA} + \sigma^2 I)^{-1} \right).   (9.72)

Note that the only critical point is \Sigma_{AA}^{\star} = \frac{1}{\lambda} H \Sigma_{XX} H^T. Theorem 9.5 follows immediately from combining this result with Proposition 9.7.

Corollary 9.1.
The mutual information between the vector of state variables and the vector of compromised measurements induced by the optimal attack construction is given by

I(X^n; Y_A^m) = \frac{1}{2} \log \left| H \Sigma_{XX} H^T \left( \sigma^2 I + \frac{1}{\lambda} H \Sigma_{XX} H^T \right)^{-1} + I \right|.   (9.73)

Theorem 9.5 shows that the generalized stealth attacks share the same structure as the stealth attacks in [11] up to a scaling factor determined by \lambda. The solution in Theorem 9.5 holds for the case in which \lambda \geq 1, and therefore, lacks full generality. However, the case in which \lambda < 1 is of limited practical interest, as it corresponds to an attacker that prioritizes the information loss over remaining undetected.
The attack construction requires knowledge of the Jacobian measurement matrix H and the second order statistics of the state variables, but the variance of the noise introduced by the sensors is not necessary. To obtain the Jacobian, a malicious attacker needs to know the topology of the grid, the admittances of the branches, and the operation point of the system. The second order statistics of the state variables, on the other hand, can be estimated using historical data. In [11] it is shown that the attack construction with a sample covariance matrix of the state variables obtained with historical data is asymptotically optimal when the size of the training data grows to infinity.
It is interesting to note that the mutual information in (9.73) increases monotonically with \lambda and that it asymptotically converges to I(X^n; Y^m), i.e. the case in which there is no attack. While the evaluation of the mutual information as shown in Corollary 9.1 is straightforward, the computation of the associated probability of detection yields involved expressions that do not provide much insight. For that reason, the probability of detection of optimal attacks is treated in the following section.

9.5.4 Probability of Detection of Generalized Stealth Attacks

The asymptotic probability of detection of the generalized stealth attacks is governed by the KL divergence as described in (9.64). However, in the non-asymptotic case, determining the probability of detection is difficult, and therefore, choosing a value of \lambda that provides the desired probability of detection is a challenging task. In this section we first provide a closed-form expression of the probability of detection by direct evaluation and show that the expression does not provide any practical insight over the choice of \lambda that achieves the desired detection performance.
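The construction in Theorem 9.5 and the monotonicity of the mutual information in (9.73) can be checked numerically. The following minimal sketch uses a randomly generated placeholder Jacobian and an identity state covariance, not an IEEE test system, and the function name is ours:

```python
import numpy as np

def stealth_attack_mi(H, Sigma_XX, sigma2, lam):
    """Mutual information (9.73) induced by the generalized stealth attack
    with covariance Sigma_AA = (1/lam) * H Sigma_XX H^T (Theorem 9.5)."""
    m = H.shape[0]
    G = H @ Sigma_XX @ H.T
    M = G @ np.linalg.inv(sigma2 * np.eye(m) + G / lam) + np.eye(m)
    return 0.5 * np.linalg.slogdet(M)[1]

rng = np.random.default_rng(1)
n, m, sigma2 = 3, 5, 0.5                 # hypothetical dimensions and noise variance
H = rng.normal(size=(m, n))              # stand-in for the Jacobian measurement matrix
Sigma_XX = np.eye(n)

mi = [stealth_attack_mi(H, Sigma_XX, sigma2, lam) for lam in (1.0, 2.0, 10.0, 100.0)]
# mi increases with lam and approaches the attack-free value I(X^n; Y^m) from below.
```

The sequence illustrates the trade-off discussed in the text: larger \lambda makes the attack stealthier but lets the operator recover more information.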
That being the case, we then provide an upper bound on the probability of detection, which, in turn, provides a lower bound on the value of \lambda that achieves the desired probability of detection.

Direct Evaluation of the Probability of Detection
Detection based on the LRT with threshold \tau yields a probability of detection given by

P_D \triangleq \mathbb{E}\left[ \mathbb{1}\{ L(Y_A^m) \geq \tau \} \right].   (9.74)

The following lemma particularizes the above expression to the optimal attack construction described in Section 9.5.3.

Lemma 9.2.
The probability of detection of the LRT in (9.63) for the attack construction in (9.71) is given by

P_D(\lambda) = \mathbb{P}\left[ (U^p)^T \Delta U^p \geq \lambda \left( 2 \log \tau + \log \left| I + \lambda^{-1} \Delta \right| \right) \right],   (9.75)

where p = \mathrm{rank}(H \Sigma_{XX} H^T), U^p \in \mathbb{R}^p is a vector of random variables with distribution \mathcal{N}(0, I), and \Delta \in \mathbb{R}^{p \times p} is a diagonal matrix with entries given by (\Delta)_{i,i} = \lambda_i(H \Sigma_{XX} H^T) / \lambda_i(\Sigma_{YY}), where \lambda_i(A) with i = 1, \ldots, p denotes the i-th eigenvalue of matrix A in descending order.

Proof. The probability of detection of the stealth attack is

P_D(\lambda) = \int_{\mathcal{S}} dP_{Y_A^m}   (9.76)
= \frac{1}{\sqrt{(2\pi)^m |\Sigma_{Y_A Y_A}|}} \int_{\mathcal{S}} \exp\left\{ -\tfrac{1}{2} y^T \Sigma_{Y_A Y_A}^{-1} y \right\} dy,   (9.77)

where

\mathcal{S} = \{ y \in \mathbb{R}^m : L(y) \geq \tau \}.   (9.78)

Algebraic manipulation yields the following equivalent description of the integration domain:

\mathcal{S} = \left\{ y \in \mathbb{R}^m : y^T \tilde{\Delta} y \geq 2 \log \tau + \log \left| I + \Sigma_{AA} \Sigma_{YY}^{-1} \right| \right\},   (9.79)

with \tilde{\Delta} = \Sigma_{YY}^{-1} - \Sigma_{Y_A Y_A}^{-1}. Let \Sigma_{YY} = U_{YY} \Lambda_{YY} U_{YY}^T, where \Lambda_{YY} \in \mathbb{R}^{m \times m} is a diagonal matrix containing the eigenvalues of \Sigma_{YY} in descending order and U_{YY} \in \mathbb{R}^{m \times m} is a unitary matrix whose columns are the eigenvectors of \Sigma_{YY} ordered matching the order of the eigenvalues. Applying the change of variable y_1 = U_{YY}^T y in (9.77) results in

P_D(\lambda) = \frac{1}{\sqrt{(2\pi)^m |\Sigma_{Y_A Y_A}|}} \int_{\mathcal{S}_1} \exp\left\{ -\tfrac{1}{2} y_1^T \Lambda_{Y_A Y_A}^{-1} y_1 \right\} dy_1,   (9.80)

where \Lambda_{Y_A Y_A} \in \mathbb{R}^{m \times m} denotes the diagonal matrix containing the eigenvalues of \Sigma_{Y_A Y_A} in descending order. Noticing that \Sigma_{YY}, \Sigma_{AA}, and \Sigma_{Y_A Y_A} are also diagonalized by U_{YY}, the integration domain \mathcal{S}_1 is given by

\mathcal{S}_1 = \left\{ y_1 \in \mathbb{R}^m : y_1^T \tilde{\Delta} y_1 \geq 2 \log \tau + \log \left| I + \Lambda_{AA} \Lambda_{YY}^{-1} \right| \right\},   (9.81)

where \tilde{\Delta} = \Lambda_{YY}^{-1} - \Lambda_{Y_A Y_A}^{-1}, with \Lambda_{AA} denoting the diagonal matrix containing the eigenvalues of \Sigma_{AA} in descending order. Further applying the change of variable y_2 = \Lambda_{Y_A Y_A}^{-1/2} y_1 in (9.80) results in

P_D(\lambda) = \frac{1}{\sqrt{(2\pi)^m}} \int_{\mathcal{S}_2} \exp\left\{ -\tfrac{1}{2} y_2^T y_2 \right\} dy_2,   (9.82)

with the transformed integration domain given by

\mathcal{S}_2 = \left\{ y_2 \in \mathbb{R}^m : y_2^T \bar{\Delta} y_2 \geq 2 \log \tau + \log \left| I + \bar{\Delta} \right| \right\},   (9.83)

with

\bar{\Delta} = \Lambda_{AA} \Lambda_{YY}^{-1}.   (9.84)

Setting \Delta \triangleq \lambda \bar{\Delta} and noticing that \mathrm{rank}(\Delta) = \mathrm{rank}(H \Sigma_{XX} H^T) concludes the proof.

Notice that the left-hand term (U^p)^T \Delta U^p in (9.75) is a weighted sum of independent \chi^2 distributed random variables with one degree of freedom, where the weights are determined by the diagonal entries of \Delta, which depend on the second order statistics of the state variables, the Jacobian measurement matrix, and the variance of the noise; i.e. the attacker has no control over this term. The right-hand side contains in addition \lambda and \tau, and therefore, the probability of attack detection is described as a function of the parameter \lambda. However, characterizing the distribution of the resulting random variable is not practical, since there is no closed-form expression for the distribution of a positively weighted sum of independent \chi^2 random variables with one degree of freedom [15]. Usually, moment matching approximation approaches such as the Lindsay-Pilla-Basak method [16] are utilized to solve this problem, but the resulting expressions are complex and the relation of the probability of detection with \lambda is difficult to describe analytically following this course of action. In the following, an upper bound on the probability of attack detection is derived. The upper bound is then used to provide a simple lower bound on the value of \lambda that achieves the desired probability of detection.

Upper Bound on the Probability of Detection

The following theorem provides a sufficient condition for \lambda to achieve a desired probability of attack detection.

Theorem 9.6.
Let \tau > 1 be the decision threshold of the LRT. For any t > 0 and \lambda \geq \max(\lambda^{\star}(t), 1), the probability of attack detection satisfies

P_D(\lambda) \leq e^{-t},   (9.85)

where \lambda^{\star}(t) is the only positive solution of \lambda satisfying

2 \lambda \log \tau - \frac{1}{2\lambda} \mathrm{tr}(\Delta^2) - 2 \sqrt{\mathrm{tr}(\Delta^2)\, t} - 2 \|\Delta\|_{\infty}\, t = 0,   (9.86)

and \|\cdot\|_{\infty} is the infinity norm.

Proof. We start with the result of Lemma 9.2, which gives

P_D(\lambda) = \mathbb{P}\left[ (U^p)^T \Delta U^p \geq \lambda \left( 2 \log \tau + \log \left| I + \lambda^{-1} \Delta \right| \right) \right].   (9.87)

We now proceed to expand the term \log|I + \lambda^{-1}\Delta| using a Taylor series expansion, resulting in

\log \left| I + \lambda^{-1} \Delta \right| = \sum_{i=1}^{p} \log \left( 1 + \lambda^{-1} (\Delta)_{i,i} \right)   (9.88)
= \sum_{i=1}^{p} \sum_{j=1}^{\infty} \left( \frac{\left( \lambda^{-1} (\Delta)_{i,i} \right)^{2j-1}}{2j-1} - \frac{\left( \lambda^{-1} (\Delta)_{i,i} \right)^{2j}}{2j} \right).   (9.89)

Because (\Delta)_{i,i} \leq 1, for i = 1, \ldots, p, and \lambda \geq 1, it holds that

\frac{\left( \lambda^{-1} (\Delta)_{i,i} \right)^{2j-1}}{2j-1} - \frac{\left( \lambda^{-1} (\Delta)_{i,i} \right)^{2j}}{2j} \geq 0, \text{ for } j \in \mathbb{Z}^+.   (9.90)

Thus, (9.89) is lower bounded by the second order Taylor expansion, i.e.,

\log \left| I + \lambda^{-1} \Delta \right| \geq \sum_{i=1}^{p} \left( \lambda^{-1} (\Delta)_{i,i} - \tfrac{1}{2} \left( \lambda^{-1} (\Delta)_{i,i} \right)^2 \right)   (9.91)
= \frac{1}{\lambda} \mathrm{tr}(\Delta) - \frac{1}{2\lambda^2} \mathrm{tr}(\Delta^2).   (9.92)

Substituting (9.92) in (9.87) yields

P_D(\lambda) \leq \mathbb{P}\left[ (U^p)^T \Delta U^p \geq \mathrm{tr}(\Delta) + 2\lambda \log \tau - \frac{1}{2\lambda} \mathrm{tr}(\Delta^2) \right].   (9.93)

Note that \mathbb{E}\left[ (U^p)^T \Delta U^p \right] = \mathrm{tr}(\Delta), and therefore, evaluating the probability in (9.93) is equivalent to evaluating the probability of (U^p)^T \Delta U^p deviating 2\lambda \log \tau - \frac{1}{2\lambda}\mathrm{tr}(\Delta^2) from the mean. In view of this, the right-hand side in (9.93) is upper bounded by [17, 18]

P_D(\lambda) \leq \mathbb{P}\left[ (U^p)^T \Delta U^p \geq \mathrm{tr}(\Delta) + 2\sqrt{\mathrm{tr}(\Delta^2)\, t} + 2\|\Delta\|_{\infty}\, t \right] \leq e^{-t},   (9.94)

for t > 0 satisfying

2\lambda \log \tau - \frac{1}{2\lambda} \mathrm{tr}(\Delta^2) \geq 2\sqrt{\mathrm{tr}(\Delta^2)\, t} + 2\|\Delta\|_{\infty}\, t.   (9.95)

The expression in (9.95) is satisfied with equality for two values of \lambda, one strictly negative and one strictly positive, denoted by \lambda^{\star}(t), when \tau > 1. The result follows by noticing that the left-hand term of (9.95) increases monotonically in \lambda for \lambda > 0, and therefore, (9.95) holds for all \lambda \geq \max(\lambda^{\star}(t), 1).

Theorem 9.6 shows that for \lambda \geq \max(\lambda^{\star}(t), 1) the probability of detection decreases exponentially fast with \lambda. We will later show in the numerical results that the regime in which the exponentially fast decrease kicks in does not align with the saturation of the mutual information loss induced by the attack.

We evaluate the performance of stealth attacks in practical state estimation settings. In particular, the IEEE 14-Bus, 30-Bus, and 118-Bus test systems are considered in the simulation. In state estimation with linearized dynamics, the Jacobian measurement matrix is determined by the operation point. We assume a DC state estimation scenario [19, 20], and thus, we set the resistances of the branches to 0 and the bus voltage magnitudes to 1.0. As discussed in the preceding text, there is no closed-form expression for the distribution of a positively weighted sum of independent \chi^2 random variables, which is required to calculate the probability of detection of the generalized stealth attacks as shown in Lemma 9.2. For that reason, we use the Lindsay-Pilla-Basak method and the MOMENTCHI2 package [22] to numerically evaluate the probability of attack detection.
The covariance matrix of the state variables is modelled as a Toeplitz matrix with exponential decay parameter \rho, where the exponential decay parameter \rho determines the correlation strength between different entries of the state variable vector. The performance of the generalized stealth attack is a function of the weight given to the detection term in the attack construction cost function, i.e. \lambda, the correlation strength between state variables,

Figure 9.1: Performance of the generalized stealth attack in terms of mutual information and probability of detection for different values of \lambda and system size when \rho = 0.1 and \rho = 0.9, SNR = 10 dB, and \tau = 2.
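As an alternative to moment-matching methods, the weighted sum of \chi^2 variables in Lemma 9.2 can also be sampled directly. A minimal Monte Carlo sketch, using a random placeholder Jacobian instead of an IEEE test system (all dimensions and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder system; the chapter uses IEEE test system Jacobians instead.
n, m, sigma2, lam, tau = 3, 5, 1.0, 5.0, 2.0
H = rng.normal(size=(m, n))
Sigma_XX = np.eye(n)

G = H @ Sigma_XX @ H.T
g = np.linalg.eigvalsh(G)[::-1]          # eigenvalues of H Sigma_XX H^T, descending
p = np.linalg.matrix_rank(G)
delta = g[:p] / (g[:p] + sigma2)         # diagonal entries of Delta in Lemma 9.2

# Monte Carlo estimate of P_D(lam) from the weighted chi-squared form (9.75).
T = 200_000
U = rng.normal(size=(T, p))
lhs = (U**2 * delta).sum(axis=1)                       # (U^p)^T Delta U^p samples
rhs = lam * (2 * np.log(tau) + np.log1p(delta / lam).sum())
P_D = (lhs >= rhs).mean()
```

This direct sampling approach avoids closed-form approximations entirely, at the cost of Monte Carlo error that decays with the number of samples.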
Figure 9.2: Performance of the generalized stealth attack in terms of mutual information and probability of detection for different values of \lambda and system size when \rho = 0.1 and \rho = 0.9, SNR = 20 dB, and \tau = 2.

i.e. \rho, and the Signal-to-Noise Ratio (SNR) of the power system, which is defined as

\mathrm{SNR} \triangleq 10 \log_{10} \left( \frac{\mathrm{tr}(H \Sigma_{XX} H^T)}{m \sigma^2} \right).   (9.96)

Fig. 9.1 and Fig. 9.2 depict the performance of the optimal attack construction for different values of \lambda and \rho with SNR = 10 dB and SNR = 20 dB, respectively, when \tau = 2.
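The SNR definition in (9.96) translates directly into code; a small sketch in which the matrices passed in are placeholders:

```python
import numpy as np

def snr_db(H, Sigma_XX, sigma2):
    """Signal-to-noise ratio of the measurement model, as defined in (9.96)."""
    m = H.shape[0]
    return 10 * np.log10(np.trace(H @ Sigma_XX @ H.T) / (m * sigma2))

# With an identity Jacobian and unit state covariance, the SNR reduces to
# 10*log10(1/sigma2); e.g. sigma2 = 0.1 gives 10 dB.
snr = snr_db(np.eye(3), np.eye(3), 0.1)
```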
Figure 9.3: Upper bound on the probability of detection given in Theorem 9.6 for different values of \lambda when \rho = 0.1 and \rho = 0.9, SNR = 10 dB, and \tau = 2.

Figure 9.4: Upper bound on the probability of detection given in Theorem 9.6 for different values of \lambda when \rho = 0.1 and \rho = 0.9, SNR = 20 dB, and \tau = 2.

As expected, larger values of the parameter \lambda yield smaller values of the probability of attack detection while increasing the mutual information between the state variables vector and the compromised measurement vector. We observe that the probability of detection decreases approximately linearly for moderate values of \lambda. On the other hand, Theorem 9.6 states that for large values of \lambda the probability of detection decreases exponentially fast to zero. However, for the range of values of \lambda in which the decrease of the probability of detection is approximately linear, there is no significant reduction on the rate of growth of the mutual information. In view of this, the attacker needs to choose the value of \lambda carefully, as the convergence of the mutual information to the asymptote I(X^n; Y^m) is slower than that of the probability of detection to zero.
The comparison between the 30-Bus and 118-Bus systems shows that for the smaller size system the probability of detection decreases faster to zero, while the rate of growth of the mutual information is smaller than that on the larger system. This suggests that the choice of \lambda is particularly critical in large size systems, as smaller size systems exhibit a more robust attack performance for different values of \lambda. The effect of the correlation between the state variables is significantly more noticeable for the 118-bus system. While there is a performance gain for the 30-bus system in terms of both mutual information and probability of detection due to the high correlation between the state variables, the improvement is more noteworthy for the 118-bus case. Remarkably, the difference in terms of mutual information between the case in which \rho = 0.1 and the case in which \rho = 0.9 grows as \lambda increases, which indicates that the cost in terms of mutual information of reducing the probability of detection is large for small values of correlation.
The performance of the upper bound given by Theorem 9.6 on the probability of detection for different values of \lambda and \rho when \tau = 2 and SNR = 10 dB is shown in Fig. 9.3. Similarly, Fig. 9.4 depicts the upper bound with the same parameters but with SNR = 20 dB. As shown by Theorem 9.6, the bound decreases exponentially fast for large values of \lambda. Still, there is a significant gap to the probability of attack detection evaluated numerically. This is partially due to the fact that our bound is based on the concentration inequality in [17], which introduces a gap of more than an order of magnitude. Interestingly, the gap decreases when the value of \rho increases, although the change is not significant. More importantly, the bound is tighter for lower values of SNR for both the 30-bus and 118-bus systems.

The stealth attack construction proposed in the preceding text requires perfect knowledge of the covariance matrix of the state variables and the linearized Jacobian measurement matrix. In [23], the performance of the attack is studied for the case in which the second-order statistics are not perfectly known by the attacker but the linearized Jacobian measurement matrix is known. Therein, the partial knowledge is modelled by assuming that the attacker has access to a sample covariance matrix of the state variables. Specifically, training data consisting of k state variable realizations \{x_i^n\}_{i=1}^{k} is available to the attacker. That being the case, the attacker computes the unbiased estimate of the covariance matrix of the state variables given by

S_{XX} = \frac{1}{k-1} \sum_{i=1}^{k} x_i^n (x_i^n)^T.   (9.97)

The stealth attack constructed using the sample covariance matrix follows a multivariate Gaussian distribution given by

\tilde{A}^m \sim \mathcal{N}(0, \Sigma_{\tilde{A}\tilde{A}}),   (9.98)

where \Sigma_{\tilde{A}\tilde{A}} = H S_{XX} H^T. Because the sample covariance matrix in (9.97) is a random matrix with central Wishart distribution given by

S_{XX} \sim \frac{1}{k-1} W_n(k-1, \Sigma_{XX}),   (9.99)

the ergodic counterpart of the cost function in (9.65) is defined in terms of the conditional KL divergence given by

\mathbb{E}_{S_{XX}}\left[ D\left( P_{X^n Y_A^m | S_{XX}} \,\|\, P_{X^n} P_{Y^m} \right) \right].   (9.100)

The ergodic cost function characterizes the expected performance of the attack averaged over the realizations of the training data. Note that the performance using the sample covariance matrix is suboptimal [11] and that the ergodic performance converges asymptotically to that of the optimal attack construction when the size of the training data set increases.

In this section, we analytically characterize the ergodic attack performance defined in (9.100) by providing an upper bound using random matrix theory tools. Before introducing the upper bound, some auxiliary results on the expected value of the extreme eigenvalues of Wishart random matrices are presented below.
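A minimal sketch of the sample-covariance-based construction in (9.97) and (9.98), with hypothetical dimensions and a random placeholder Jacobian:

```python
import numpy as np

rng = np.random.default_rng(3)

n, m, k = 4, 6, 200                      # placeholder dimensions and training set size
H = rng.normal(size=(m, n))              # stand-in for the Jacobian measurement matrix
Sigma_XX = np.eye(n)                     # true state covariance, unknown to the attacker

# Training data: k zero-mean state realizations available to the attacker.
X = rng.multivariate_normal(np.zeros(n), Sigma_XX, size=k)

# Sample covariance estimate as in (9.97); note the 1/(k-1) normalization.
S_XX = X.T @ X / (k - 1)

# Attack covariance built from the sample covariance, cf. (9.98).
Sigma_AtAt = H @ S_XX @ H.T
```

As the text notes, the resulting attack is suboptimal for finite k and approaches the optimal construction as the training data set grows.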
Auxiliary Results in Random Matrix Theory

Lemma 9.3. Let Z_l be a (k-1) \times l matrix whose entries are independent standard normal random variables. Then

\mathrm{var}(s_{\max}(Z_l)) \leq 1,   (9.101)

where \mathrm{var}(\cdot) denotes the variance and s_{\max}(Z_l) is the maximum singular value of Z_l.

Proof. Note that s_{\max}(Z_l) is a 1-Lipschitz function of the matrix Z_l, and therefore, the maximum singular value of Z_l is concentrated around its mean \mathbb{E}[s_{\max}(Z_l)] [24, Proposition 5.34]. Then for t \geq 0, it holds that

\mathbb{P}\left[ \left| s_{\max}(Z_l) - \mathbb{E}[s_{\max}(Z_l)] \right| > t \right] \leq 2 \exp\{-t^2/2\}.   (9.102)-(9.103)

Therefore, s_{\max}(Z_l) is a sub-Gaussian random variable with variance proxy \sigma_p^2 \leq 1. The lemma follows from the fact that \mathrm{var}(s_{\max}(Z_l)) \leq \sigma_p^2.

Lemma 9.4. Let W_l denote a central Wishart matrix distributed as \frac{1}{k-1} W_l(k-1, I). Then the non-asymptotic expected values of the extreme eigenvalues of W_l are bounded by

\left( 1 - \sqrt{l/(k-1)} \right)^2 \leq \mathbb{E}[\lambda_{\min}(W_l)]   (9.104)

and

\mathbb{E}[\lambda_{\max}(W_l)] \leq \left( 1 + \sqrt{l/(k-1)} \right)^2 + 1/(k-1),   (9.105)

where \lambda_{\min}(W_l) and \lambda_{\max}(W_l) denote the minimum eigenvalue and maximum eigenvalue of W_l, respectively.

Proof. Note that [24, Theorem 5.32]

\sqrt{k-1} - \sqrt{l} \leq \mathbb{E}[s_{\min}(Z_l)]   (9.106)

and

\sqrt{k-1} + \sqrt{l} \geq \mathbb{E}[s_{\max}(Z_l)],   (9.107)

where s_{\min}(Z_l) is the minimum singular value of Z_l. Given the fact that W_l = \frac{1}{k-1} Z_l^T Z_l, it holds that

\mathbb{E}[\lambda_{\min}(W_l)] = \frac{\mathbb{E}\left[ s_{\min}(Z_l)^2 \right]}{k-1} \geq \frac{\mathbb{E}[s_{\min}(Z_l)]^2}{k-1}   (9.108)

and

\mathbb{E}[\lambda_{\max}(W_l)] = \frac{\mathbb{E}\left[ s_{\max}(Z_l)^2 \right]}{k-1} \leq \frac{\mathbb{E}[s_{\max}(Z_l)]^2 + 1}{k-1},   (9.109)

where (9.109) follows from Lemma 9.3. Combining (9.106) with (9.108), and (9.107) with (9.109), respectively, yields the lemma.

Recall that the cost function describing the attack performance given in (9.100) can be written in terms of the covariance matrix \Sigma_{\tilde{A}\tilde{A}} in the multivariate Gaussian case with imperfect second-order statistics. The ergodic cost function that results from averaging the cost over the training data yields

\mathbb{E}_{S_{XX}}\left[ D\left( P_{X^n Y_A^m | S_{XX}} \,\|\, P_{X^n} P_{Y^m} \right) \right] = \frac{1}{2} \mathbb{E}\left[ \mathrm{tr}\left( \Sigma_{YY}^{-1} \Sigma_{\tilde{A}\tilde{A}} \right) - \log \left| \Sigma_{\tilde{A}\tilde{A}} + \sigma^2 I \right| - \log \left| \Sigma_{YY}^{-1} \right| \right]   (9.110)
= \frac{1}{2} \left( \mathrm{tr}\left( \Sigma_{YY}^{-1} \Sigma_{AA}^{\star} \right) - \log \left| \Sigma_{YY}^{-1} \right| - \mathbb{E}\left[ \log \left| \Sigma_{\tilde{A}\tilde{A}} + \sigma^2 I \right| \right] \right).   (9.111)

The assessment of the ergodic attack performance boils down to evaluating the last term in (9.110). Closed-form expressions for this term are provided in [25] for the same case considered in this chapter. However, the resulting expressions are involved and are only computable for small dimensional settings.
For systems with a large number of dimensions the expressions are computationally prohibitive. To circumvent this challenge, we propose a lower bound on the term that yields an upper bound on the ergodic attack performance. Before presenting the main result, we provide the following auxiliary convex optimization result.

Lemma 9.5. Let W_p denote a central Wishart matrix distributed as \frac{1}{k-1} W_p(k-1, I) and let B = \mathrm{diag}(b_1, \ldots, b_p) denote a positive definite diagonal matrix. Then

\mathbb{E}\left[ \log \left| B + W_p^{-1} \right| \right] \geq \sum_{i=1}^{p} \log\left( b_i + 1/x_i^{\star} \right),   (9.112)

where \{x_i^{\star}\} is the solution to the convex optimization problem given by

\min_{\{x_i\}_{i=1}^{p}} \; \sum_{i=1}^{p} \log\left( b_i + 1/x_i \right)   (9.113)
\text{s.t.} \quad \sum_{i=1}^{p} x_i = p,   (9.114)
\max(x_i) \leq \left( 1 + \sqrt{p/(k-1)} \right)^2 + 1/(k-1),   (9.115)
\min(x_i) \geq \left( 1 - \sqrt{p/(k-1)} \right)^2.   (9.116)

Proof. Note that

\mathbb{E}\left[ \log \left| B + W_p^{-1} \right| \right] = \sum_{i=1}^{p} \mathbb{E}\left[ \log\left( b_i + \frac{1}{\lambda_i(W_p)} \right) \right]   (9.117)
\geq \sum_{i=1}^{p} \log\left( b_i + \frac{1}{\mathbb{E}[\lambda_i(W_p)]} \right),   (9.118)

where in (9.117), \lambda_i(W_p) is the i-th eigenvalue of W_p in decreasing order, and (9.118) follows from Jensen's inequality due to the convexity of \log(b_i + x^{-1}) for x > 0. Constraint (9.114) follows from the fact that \mathbb{E}[\mathrm{trace}(W_p)] = p, and constraints (9.115) and (9.116) follow from Lemma 9.4. This completes the proof.

Upper Bound on the Ergodic Stealth Attack Performance
The following theorem provides a lower bound for the last term in (9.110), and therefore, it enables us to upper bound the ergodic stealth attack performance.
Theorem 9.7.
Let \Sigma_{\tilde{A}\tilde{A}} = H S_{XX} H^T with S_{XX} distributed as \frac{1}{k-1} W_n(k-1, \Sigma_{XX}), and denote by \Lambda_p = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) the diagonal matrix containing the nonzero eigenvalues of H \Sigma_{XX} H^T in decreasing order. Then

\mathbb{E}\left[ \log \left| \Sigma_{\tilde{A}\tilde{A}} + \sigma^2 I \right| \right] \geq \left( \sum_{i=0}^{p-1} \psi(k-1-i) \right) - p \log(k-1) + \sum_{i=1}^{p} \log\left( \frac{\lambda_i}{\sigma^2} + \frac{1}{\lambda_i^{\star}} \right) + 2m \log \sigma,   (9.119)

where \psi(\cdot) is the Euler digamma function, p = \mathrm{rank}(H \Sigma_{XX} H^T), and \{\lambda_i^{\star}\}_{i=1}^{p} is the solution to the optimization problem given by (9.113)-(9.116) with b_i = \lambda_i / \sigma^2, for i = 1, \ldots, p.

Proof. We proceed by noticing that

\mathbb{E}\left[ \log \left| \Sigma_{\tilde{A}\tilde{A}} + \sigma^2 I \right| \right] = \mathbb{E}\left[ \log \left| \frac{1}{(k-1)\sigma^2} Z_m^T \Lambda Z_m + I \right| \right] + 2m \log \sigma   (9.120)
= \mathbb{E}\left[ \log \left| \frac{\Lambda_p}{\sigma^2} \frac{Z_p^T Z_p}{k-1} + I \right| \right] + 2m \log \sigma   (9.121)
= \mathbb{E}\left[ \log \left| \frac{Z_p^T Z_p}{k-1} \right| + \log \left| \frac{\Lambda_p}{\sigma^2} + \left( \frac{Z_p^T Z_p}{k-1} \right)^{-1} \right| \right] + 2m \log \sigma   (9.122)
\geq \left( \sum_{i=0}^{p-1} \psi(k-1-i) \right) - p \log(k-1) + \sum_{i=1}^{p} \log\left( \frac{\lambda_i}{\sigma^2} + \frac{1}{\lambda_i^{\star}} \right) + 2m \log \sigma,   (9.123)

where in (9.120), \Lambda is a diagonal matrix containing the eigenvalues of H \Sigma_{XX} H^T in decreasing order; (9.121) follows from the fact that p = \mathrm{rank}(H \Sigma_{XX} H^T); and (9.123) follows from [26, Theorem 2.11] and Lemma 9.5. This completes the proof.

Theorem 9.8.
The ergodic attack performance given in (9.110) is upper bounded by
\[
\begin{aligned}
\mathbb{E}\big[f(\Sigma_{\tilde{A}\tilde{A}})\big] \leq \Bigg(\, &\mathrm{trace}\big(\Sigma_{YY}^{-1}\Sigma_{AA}^{\star}\big) - \log\big|\Sigma_{YY}^{-1}\big| - m\log\sigma^2 && (9.124)\\
&- \Bigg(\sum_{i=0}^{p-1}\psi(k-1-i)\Bigg) + p\log(k-1) && (9.125)\\
&- \sum_{i=1}^{p}\log\Big(\frac{\lambda_i}{\sigma^2} + \frac{1}{\lambda_i^{\star}}\Big)\Bigg). && (9.126)
\end{aligned}
\]

Proof.
The proof follows immediately from combining Theorem 9.7 with (9.110).

Fig. 9.5 depicts the upper bound in Theorem 9.8 as a function of the number of samples for ρ = 0.1 and ρ = 0.8.

We have cast the state estimation problem in a Bayesian setting and shown that the attacker can construct data-injection attacks that exploit prior knowledge about the state variables. In particular, we have focused on multivariate Gaussian random processes to describe the state variables and proposed two attack construction strategies: deterministic attacks and random attacks.

Figure 9.5: Performance of the upper bound in Theorem 9.8 as a function of the number of samples for ρ = 0.1 and ρ = 0.8 (optimal, Monte Carlo, and upper-bound values of the utility function shown for each ρ).

Bibliography

[1] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” in Proc. ACM Conf. on Computer and Communications Security, Chicago, IL, USA, Nov. 2009, pp. 21–32.
[2] O. Kosut, L. Jia, R. J. Thomas, and L. Tong, “Malicious data attacks on the smart grid,”
IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 645–658, Dec. 2011.
[3] I. Esnaola, S. M. Perlaza, H. V. Poor, and O. Kosut, “Maximum distortion attacks in electricity grids,” IEEE Trans. Smart Grid, vol. 7, no. 4, pp. 2007–2015, Jul. 2016.
[4] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York: Springer-Verlag, 1994.
[5] I. Esnaola, S. M. Perlaza, H. V. Poor, and O. Kosut, “Decentralized maximum distortion MMSE attacks in electricity grids,” INRIA, Lyon, Tech. Rep. 466, Sep. 2015.
[6] J. F. Nash, “Equilibrium points in n-person games,” Proc. National Academy of Sciences of the United States of America, vol. 36, no. 1, pp. 48–49, Jan. 1950.
[7] D. Monderer and L. S. Shapley, “Potential games,” Games and Economic Behavior, vol. 14, no. 1, pp. 124–143, May 1996.
[8] I. Shomorony and A. S. Avestimehr, “Worst-case additive noise in wireless networks,” IEEE Trans. Inf. Theory, vol. 59, no. 6, pp. 3833–3847, Jun. 2013.
[9] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” in Breakthroughs in Statistics, Springer Series in Statistics, pp. 73–108. Springer New York, 1992.
[10] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Nov. 2012.
[11] K. Sun, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Information-theoretic attacks in the smart grid,” in Proc. IEEE Int. Conf. on Smart Grid Comm., Dresden, Germany, Oct. 2017, pp. 455–460.
[12] K. Sun, I. Esnaola, S. M. Perlaza, and H. V. Poor, “Stealth attacks on the smart grid,” IEEE Trans. Smart Grid, vol. 11, no. 2, pp. 1276–1285, Mar. 2020.
[13] J. Hou and G. Kramer, “Effective secrecy: Reliability, confusion and stealth,” in Proc. IEEE Int. Symp. on Information Theory, Honolulu, HI, USA, Jun. 2014, pp. 601–605.
[14] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Mar. 2004.
[15] D. A. Bodenham and N. M. Adams, “A comparison of efficient approximations for a weighted sum of chi-squared random variables,” Stat. Comput., vol. 26, no. 4, pp. 917–928, Jul. 2016.
[16] B. G. Lindsay, R. S. Pilla, and P. Basak, “Moment-based approximations of distributions using mixtures: Theory and applications,” Ann. Inst. Stat. Math., vol. 52, no. 2, pp. 215–230, Jun. 2000.
[17] B. Laurent and P. Massart, “Adaptive estimation of a quadratic functional by model selection,” Ann. Statist., vol. 28, no. 5, pp. 1302–1338, 2000.
[18] D. Hsu, S. M. Kakade, and T. Zhang, “A tail inequality for quadratic forms of subgaussian random vectors,” Electron. Commun. Probab., vol. 17, no. 52, pp. 1–6, 2012.
[19] A. Abur and A. G. Expósito, Power System State Estimation: Theory and Implementation, CRC Press, Mar. 2004.
[20] J. J. Grainger and W. D. Stevenson, Power System Analysis, McGraw-Hill, 1994.
[21] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, “MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Trans. Power Syst., vol. 26, no. 1, pp. 12–19, Feb. 2011.
[22] D. Bodenham, Momentchi2: Moment-Matching Methods for Weighted Sums of Chi-Squared Random Variables. (2016) [Online]. Available: https://cran.r-project.org/web/packages/momentchi2/index.html.
[23] K. Sun, I. Esnaola, A. M. Tulino, and H. V. Poor, “Learning requirements for stealth attacks,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process., Brighton, United Kingdom, 2019, pp. 8102–8106.
[24] R. Vershynin, “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing: Theory and Applications, Y. Eldar and G. Kutyniok, Eds., chapter 5, pp. 210–268. Cambridge University Press, Cambridge, UK, 2012.
[25] G. Alfano, A. M. Tulino, A. Lozano, and S. Verdú, “Capacity of MIMO channels with one-sided correlation,” in Proc. of IEEE 8th Int. Symp. on Spread Spectrum Techniques and Applications, Sydney, Australia, Aug. 2004.
[26] A. M. Tulino and S. Verdú,