10
IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 1 Markov Game Analysis for Attack-Defense of Power Networks under Possible Misinformation Chris Y. T. Ma, David K. Y. Yau, Xin Lou, and Nageswara S. V. Rao Abstract—Electricity grids are critical infrastructures. They are credible targets of active (e.g., terrorist) attacks since their disruption may lead to sizable losses economically and in human lives. It is thus crucial to develop decision support that can guide administrators in deploying defense resources for system security and reliability. Prior work on the defense of critical infrastructures has typically used static or Stackelberg games. These approaches view network interdictions as one-time events. However, infrastructure protection is also a continual process in which the defender and attacker interact to produce dynamic states affecting their best actions, as witnessed in the continual attack and defense of transmission networks in Colombia and Yemen. In this paper, we use zero-sum Markov games to model these interactions subject to underlying uncertainties of real- world events and actions. We solve equilibrium mixed strategies of the players that maximize their respective minimum payoffs with a time-decayed metric. We also show how the defender can use deception as a defense mechanism. Using results for a 5-bus system [1], a WECC 9-bus system [2], and an IEEE standard 14-bus system [3], we illustrate that our game model can provide useful insights. We also contrast our results with those of static games, and quantify the gain in defender payoff due to misinformation of the attacker. Index Terms—Power system security, smart grid communica- tion networks, Markov games. I. I NTRODUCTION Electricity networks are among the most critical infrastruc- tures, and their disruptions can have severe economic, social, and security consequences. Disruption of trading systems may lead to turmoils in financial markets. Lives may be endangered if power needed for heating is lost during cold weather, or it is lost during life saving procedures in hospitals. Loss of power may also disrupt communications, stall work, cripple transportation, and/or lead to other major failures that can potentially bring entire nations to a standstill. Because of their importance, power networks are credible targets for active (e.g., terrorist) attacks. On the other hand, protecting these networks is extremely challenging, due to their expansive geographical spans and complex interdependencies between system components. For example, transmission lines may run This research was supported in part by Singapore’s Agency for Science, Technology and Research (A*STAR) under the Human Sixth Sense Pro- gramme, in part by U.S. National Science Foundation under grant numbers CNS-0963715 and CNS-0964086, in part by U.S. Department of Energy, Office of Advanced Computing Research under the Mathematics of Complex, Distributed, Interconnected Systems program, and in part by National Natural Science Foundation of China (NSFC) under grant number 61028007. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for U.S. Depart- ment of Energy under Contract No. DE-AC05-00OR22725. C. Y. T. Ma is with Advanced Digital Sciences Center, Singapore. D. K. Y. Yau is with Purdue University, West Lafayette, IN, USA and Advanced Digital Sciences Center, Singapore. X. Lou is with City University of Hong Kong, Hong Kong and Advanced Digital Sciences Center, Singapore. N. S. V. Rao is with Oak Ridge National Laboratory, TN, USA Manuscript received xxx, 20xx; revised xxx, 20xx. for miles in the open, and the system must maintain stable and prespecified power quality (e.g., frequency, voltage, and phase synchronization) for performance and safety of equipment. To protect critical infrastructures from attacks, adminis- trators need tools that support prudent decision making. In particular, administrators need to make informed decisions about where to deploy finite resources to harden the system for maximum resiliency against adversaries. Such guidance for infrastructure protection has been obtained using Markov decision processes (MDP) or game theory. In MDP [4], the system is modeled as a set of states with Markov transitions between them. The problem is to optimize the actions of a “player” (e.g., the defender) under the probabilistic outcomes of these actions. The solution optimizes the actions of a single player only. It is suitable for a defender to maximize system reliability against passive disruptors of known probabilistic behavior. Collectively, these disruptors may represent “nature,” which may disrupt components through indeliberate events such as bad weather or normal wear-and-tear. Game theoretic approaches for infrastructure protection, on the other hand, postulate a strong and informed attacker – one capable of devising its own best counter strategies against the defender. In a static game [5], both players choose their moves simultaneously. In another form of leader-follower Stackelberg games [6], the optimization of the players’ strategies is a bilevel problem. At the inner level, the follower maximizes its payoff given a leader’s strategy. At the outer level, the leader chooses a strategy S to maximize its own payoff subject to the follower’s solution of the inner problem defined by S. Notice that a strong attacker model may also be applicable for play against nature, in which case the analysis gives worst-case estimates of required protection for reliability. The above kinds of games view network interdictions as one-time events. However, infrastructure protection is also a continual process in which the players continue to inter- act to produce dynamic states that reflect their respective best actions. For instance, serial attacks by rebel separatists targeting the power transmission networks in Colombia in the past two decades (including 11 attacks on the national transmission system on a single night and the cumulative loss of roughly 2740 towers since 1999) have resulted in a number of countermeasures by system administrators, including the setup of backup control centers, better personnel training, improved communication, and deployment of a sensor net- work for protection [7]. Recently, the transmission network in Yemen has also experienced repeated attacks by rival political forces (power lines retargeted only minutes after engineers managed to fix a prior sabotage attack), which led to a lengthy blackout and severe disruptions within Yemen’s industries [8]. Markov games model these interactions subject to inherent uncertainty in the underlying physical system. They can be

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 1

Markov Game Analysis for Attack-Defense ofPower Networks under Possible Misinformation

Chris Y. T. Ma, David K. Y. Yau, Xin Lou, and Nageswara S. V. Rao

Abstract—Electricity grids are critical infrastructures. Theyare credible targets of active (e.g., terrorist) attacks since theirdisruption may lead to sizable losses economically and in humanlives. It is thus crucial to develop decision support that canguide administrators in deploying defense resources for systemsecurity and reliability. Prior work on the defense of criticalinfrastructures has typically used static or Stackelberg games.These approaches view network interdictions as one-time events.However, infrastructure protection is also a continual process inwhich the defender and attacker interact to produce dynamicstates affecting their best actions, as witnessed in the continualattack and defense of transmission networks in Colombia andYemen. In this paper, we use zero-sum Markov games to modelthese interactions subject to underlying uncertainties of real-world events and actions. We solve equilibrium mixed strategiesof the players that maximize their respective minimum payoffswith a time-decayed metric. We also show how the defendercan use deception as a defense mechanism. Using results fora 5-bus system [1], a WECC 9-bus system [2], and an IEEEstandard 14-bus system [3], we illustrate that our game modelcan provide useful insights. We also contrast our results withthose of static games, and quantify the gain in defender payoffdue to misinformation of the attacker.

Index Terms—Power system security, smart grid communica-tion networks, Markov games.

I. I NTRODUCTION

Electricity networks are among the most critical infrastruc-tures, and their disruptions can have severe economic, social,and security consequences. Disruption of trading systems maylead to turmoils in financial markets. Lives may be endangeredif power needed for heating is lost during cold weather, orit is lost during life saving procedures in hospitals. Loss ofpower may also disrupt communications, stall work, crippletransportation, and/or lead to other major failures that canpotentially bring entire nations to a standstill. Because of theirimportance, power networks are credible targets for active(e.g., terrorist) attacks. On the other hand, protecting thesenetworks is extremely challenging, due to their expansivegeographical spans and complex interdependencies betweensystem components. For example, transmission lines may run

This research was supported in part by Singapore’s Agency for Science,Technology and Research (A*STAR) under the Human Sixth SensePro-gramme, in part by U.S. National Science Foundation under grant numbersCNS-0963715 and CNS-0964086, in part by U.S. Department of Energy,Office of Advanced Computing Research under the Mathematics ofComplex,Distributed, Interconnected Systems program, and in part by National NaturalScience Foundation of China (NSFC) under grant number 61028007. OakRidge National Laboratory is managed by UT-Battelle, LLC forU.S. Depart-ment of Energy under Contract No. DE-AC05-00OR22725.

C. Y. T. Ma is with Advanced Digital Sciences Center, Singapore.D. K. Y. Yau is with Purdue University, West Lafayette, IN, USA and

Advanced Digital Sciences Center, Singapore.X. Lou is with City University of Hong Kong, Hong Kong and Advanced

Digital Sciences Center, Singapore.N. S. V. Rao is with Oak Ridge National Laboratory, TN, USAManuscript received xxx, 20xx; revised xxx, 20xx.

for miles in the open, and the system must maintain stable andprespecified power quality (e.g., frequency, voltage, and phasesynchronization) for performance and safety of equipment.

To protect critical infrastructures from attacks, adminis-trators need tools that support prudent decision making. Inparticular, administrators need to make informed decisionsabout where to deploy finite resources to harden the systemfor maximum resiliency against adversaries. Such guidancefor infrastructure protection has been obtained using Markovdecision processes (MDP) or game theory. In MDP [4], thesystem is modeled as a set of states with Markov transitionsbetween them. The problem is to optimize the actions of a“player” (e.g., the defender) under the probabilistic outcomesof these actions. The solution optimizes the actions of a singleplayer only. It is suitable for a defender to maximize systemreliability against passive disruptors of known probabilisticbehavior. Collectively, these disruptors may represent “nature,”which may disrupt components through indeliberate eventssuch as bad weather or normal wear-and-tear.

Game theoretic approaches for infrastructure protection,onthe other hand, postulate a strong and informed attacker – onecapable of devising its own best counter strategies againstthedefender. In astatic game [5], both players choose their movessimultaneously. In another form of leader-follower Stackelberggames [6], the optimization of the players’ strategies is abilevel problem. At the inner level, the follower maximizesitspayoff given a leader’s strategy. At the outer level, the leaderchooses a strategyS to maximize its own payoff subject to thefollower’s solution of the inner problem defined byS. Noticethat a strong attacker model may also be applicable for playagainst nature, in which case the analysis gives worst-caseestimates of required protection for reliability.

The above kinds of games view network interdictions asone-time events. However, infrastructure protection is alsoa continual process in which the players continue to inter-act to produce dynamic states that reflect their respectivebest actions. For instance, serial attacks by rebel separatiststargeting the power transmission networks in Colombia inthe past two decades (including 11 attacks on the nationaltransmission system on a single night and the cumulative lossof roughly 2740 towers since 1999) have resulted in a numberof countermeasures by system administrators, including thesetup of backup control centers, better personnel training,improved communication, and deployment of a sensor net-work for protection [7]. Recently, the transmission network inYemen has also experienced repeated attacks by rival politicalforces (power lines retargeted only minutes after engineersmanaged to fix a prior sabotage attack), which led to a lengthyblackout and severe disruptions within Yemen’s industries[8].Markov games model these interactions subject to inherentuncertainty in the underlying physical system. They can be

Page 2: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2

viewed as generalizations of MDP to an adversarial setting.For the protection of power networks, we assume that theattacker deploys resources to disrupt transmission lines in apower grid,1 and its goal is to maximize the amount of loadshedding. The defender’s goal, on the other hand, is to deploydefense resources to minimize the amount of load sheddingin the face of such attacks. The directly opposing goals of theattacker and defender lead to a zero-sum game formulationnaturally.

Measuring the damage as the amount of shed load is anatural baseline. In addition, load in practical systems maybe of varying importance. For example, part of a power net-work may serve critical government functions predominantly,whereas another part may serve charging stations for plugged-in hybrid electric vehicles (PHEVs) predominantly. Protectionof the former is arguably more important than the latter.Hence, we will also consider a more general model in whichshedding load in different parts of the grid does differentdamage quantified bycost functions. In this case, the attacker’sand defender’s goals are to maximize and minimize the totalcost of shed load, respectively.

Game theoretic analysis has typically assumed a full infor-mation setting. From the defender’s point of view, the assump-tion rules out “security by obscurity” and leads to conservativedefense measures against a worst possible attacker. Such worstcase analysis is valuable, but at the same time, informationcan be considered an asset that carries cost and may havesignificant effects on achieved payoffs in practical situations.For example, control and sensing information communicatedin future smart grids may be a valuable source of informationfor would-be attackers. Advanced meter infrastructures (AMIs)may give a comprehensive view of the distribution of load andresulting power flows. More sophisticated attackers may eveninfer the types of load based on their power signatures [10],leading to knowledge about the cost functions in our model.Explicit consideration for the role of on-line informationgives a cyber dimension of smart grid protection as acyber-physical system problem. In this paper, we will evaluate howthe defender may employ deception as a defense strategy toimprove its payoff.

Our contributions are as follows. (i) We model the attack-defense of power networks as a Markov game. We solveequilibrium mixed strategies of the players that maximize theirrespective minimum payoffs by a time-decayed metric underuncertainty. (ii) We show that after our algorithm converges,the solution in each state is equivalent to that of a static gamewith a composite payoff matrix. Analysis of this compositematrix simplifies the interpretation of results obtained byouralgorithm. (iii) We apply our solution to three realistic powersystems. We contrast our numerical results with those of staticgames, and show that their analysis leads to useful insightsinsometimes subtle situations. (iv) We discuss how deceptioncan be used as a defense strategy, and quantify the gain indefender payoff due to such deception.

The rest of this paper is organized as follows. Section II dis-

1Transmission lines can be considered particularly vulnerable targets dueto the impossibility of fully protecting their vast, open spans; for example,they are the targets in the continual attacks in Colombia and Yemen.Alternatively, the attacker could target power system equipment which can bequite expensive to repair or replace, although we do not study these attacksin this paper.

cusses related work. We define our system model in Section IIIand solve optimal load shedding as a component problemin Section IV. We solve the Markov game in Section V,under possible misinformation of the attacker. We presentand analyze numerical results for various power systems inSection VI. Section VII concludes.

II. RELATED WORK

MDP has been used to analyze the security and vulnerabilityof urban infrastructures. Joneset al. [11] use it to analyzethe actions of an intruder into transportation facilities.Jhaetal. [4] use MDP to interpret attack graphs in communicationnetworks, so that a minimal set of security measures can bedetermined that will guarantee the safety of a system. Theirwork optimizes the actions of the defender against a passiveattacker, whose strategy is fixed and given.

Game theory has been widely used to analyze the securityof critical systems. The competition between a defender andan attacker in this context has been modeled as leader-followerStackelberg games [6], [12], [13], and static games [5], [14].These games analyze one move of each player only, and sothey treat network interdictions as one-time events. In practice,the defender may interact with the attacker in repeated playsthat evolve the system state dynamically. Alpcanet al. [15]model these repeated plays under uncertainty as a Markovgame. They use the game model to design an intrusion detec-tion system for a communication network [15], and comparetheir results with those obtained using static games.

Maximum load shedding in power networks by interdictingtransmission lines has been solved using bilevel mixed integernonlinear programming [16]. We solve optimal load sheddingas a component step in our analysis (Section IV). But sinceour focus is not on the load shedding problem itself, we usea more simple solution based on linearization of power phaseangle constraints.

The value of deception in critical infrastructure protectionhas been noted by [12], although they do not present ananalysis of this technique. In general, games with incompleteinformation has been analyzed [17], and it is possible forinformation to be learned through repeated plays [17]. Inparticular, recent work studying the use of secrecy and decep-tion to protect critical infrastructures includes [18], [19], [20].A Stackelberg game model is used in [18], [19]. Zhuanget al. use a multiple-period game to model the updates ofinformation after each period [20]. These results all treatthe critical infrastructures as abstract objects with genericvaluation and cost functions. Our work makes the protectionproblem concrete by considering maximum impact load shed-ding in power networks under attack-defense formulations.In our particular application, the defender has to choose thebest subset of “objects” (i.e., transmission lines) to protectunder given budget constraints. We do not yet model howa misguided attacker can learn the true information throughrepeated plays or other channels, but note that such learningcan be incorporated with definition of the appropriate learningchannels.

III. PROBLEM FORMULATION

In a power grid, generators supply electricity and loadsconsume it. They are attached to a set of buses – which we call

Page 3: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 3

generation and load buses, respectively – interconnected by anetwork of transmission lines of given capacities. Henceforth,we refer to transmission lines aslinks.

An attacker aims to disrupt the power network by bring-ing down one or more links, in order to cause maximum“disruption” of the load. A defender aims to minimize thisdisruption. It does so by reinforcing links that are up, andrepairing links that are down. Strictly speaking, the status of acompromised link can be a fine-grained concept – e.g., a linkcould be outaged without being physically damaged, in whichcase the link may be automatically or manually restored ratherquickly, or the link could be damaged to different degrees ofseverity. Our model takes a simplified binary up/down view ofthe links, in order not to detract from the main game-theoreticfeatures of the problem.

In a baseline case, disruption is measured simply as theamount of load (in power unit) that must be shed due to thelink failures. More generally, shedding different loads mayhave different adverse impacts which we call thecost. A costfunction for a load bus, sayl, is given by ul(x, y), whichspecifies the cost of reducing the load fromx to y (in powerunits) on l. In this case, disruption is measured as the totalcost of shed load due to the link failures.

We define a Markov game as follows. The state of thegame (S) refers to the set of links that are currently up (linksthat are not up are down) in the power network. The gameproceeds in discrete time steps. In each time step, the playerschoose a pair of actions which, together with underlyingprobabilistic physical events, may cause state transitions ina Markov manner.

We use the following notations throughout the paper.• Ap: Action set of playerp, where p = att, def , cor-

responding to the attacker and defender, respectively.For the attacker, the action (a ∈ Aatt) is the link thatit chooses to attack. We assume that the attacks arephysical actions targeting the chosen transmission link,such as damaging an associated transmission tower orsevering the link itself [7], [8]. For the defender, theaction (d ∈ Adef ) is the (down) link that it choosesto repair or the (up) link that it chooses to reinforce.Applicable reinforcements include heightened securityguarding the link in question, or pre-mobilizing personneland materials for repair in the event of an attack. Theplayers have limited budgets in that in each time step,the attacker (respectively defender) can choose a limitednumber of links to attack (respectively repair/reinforce)only.

• S: Set of game states, where each state is an enumerationof the status of the links in order. We use “u” and “d” todenote the up or down status, respectively.

• PD(A): Set of mixed strategies over the action setA,where each actiona ∈ A is assigned a probability withwhich it is executed.

• ppf : Probability for an up link to fail in a time step uponattack, when reinforced by the defender during that timestep.

• pupf : Probability for an up link to fail upon attack, whenit is not reinforced. We have0 ≤ ppf ≤ pupf ≤ 1.

• ppr: Probability for a down link to recover (i.e., becomeup) in a time step, when repaired by the defender and notfurther attacked during that time step.

• pupr: Probability for a down link to recover when it isnot repaired by the defender and not attacked. We have0 ≤ pupr ≤ ppr ≤ 1.

We assume that the attacker can attack a link that is alreadydown. Such an action will reduce the probability that thelink recovers. For example, if a down link is repaired by thedefender and further attacked by the attacker in a time step,then its probability of recovery isppr × (1 − pupf ). If thedown link is not repaired by the defender, then its probabilityof recovery under attack ispupr × (1 − pupf ).

Since the load shedding goals of the attacker and defenderare directly opposing, we have a zero-sum game. A pair ofplayer actions in a state will bring animmediate reward forthe players. For the attacker, this reward is the expected costof shed load due to the resulting probabilistic transitionstothe possible next states. The defender’s immediate reward isthe negative of this number. Further to the immediate reward,each possible state transition if realized will bring the game toa new state, where the game will carry on. A further immediatereward will be obtained in the new state with further new statetransitions, and so on. Hence, a pair of actions taken in a statewill accrue along-term reward in general.

Formally, defineR(s, a, d) as the expected immediate re-ward, i.e., the expected cost of immediate shed load, for theattacker when it takes actiona and the defender takes actiond in states. (Reward for the defender is the negative of thisnumber.) Further defineQ(s, a, d) as the expected long-termreward, i.e., the expected long-term cost of shed load, for theattacker when it takes actiona and the defender takes actiond in states. (Expected long-term reward for the defender isthe negative of this number.) For the attacker in the Markovgame, thevalue, interpreted as the expected optimal long-termcost of shed load, of states ∈ S is

Vatt(s) = maxπ∈PD(Aatt)

mind∈Adef

a∈Aatt

Q(s, a, d)πa, (1)

whereπa is the probability of actiona in the optimal mixedstrategyπ of the attacker. The expected long-term reward,quality, of actiona against actiond in states is

Q(s, a, d) = R(s, a, d) + γ∑

s′

T (s, a, d, s′)Vatt(s′), (2)

whereT (s, a, d, s′), T : S ×Aatt ×Adef × S → [0, 1], is theprobability of the state transition froms to s′ under the actionsa andd of the attacker and defender, respectively, andγ is adiscount factor satisfying0 ≤ γ < 1. The transition probabilityT (s, a, d, s′) is derived from the corresponding probabilitiesppf , pupf , ppr and pupr based on statess and s′ as wellas the actionsa and d. For example, at state{u, u, u, u, u},link 4 is both defended and attacked, then the probability totransit to {u, u, u, d, u} is given by ppf , i.e., the reinforcedlink fails under attack. Fig. 1 illustrates an excerpt of thestate transitions from state{u,u,u,u,u} for the 5-bus system(Fig. 4). γ gives the discount factor of future rewards on theoptimal decision. Small values ofγ emphasize near-term gainswhile large values emphasize future rewards.γ may also beinterpreted as the belief of possible future interactions held bythe players.

Similarly, thevalue of states ∈ S for the defender is

Vdef (s) = minπ∈PD(Adef )

maxa∈Aatt

d∈Adef

Q(s, a, d)πd. (3)

Page 4: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 4

{u,u,u,u,u}

{u,u,d,u,u}{d,u,u,u,u} {u,u,u,d,u}{u,d,u,u,u} {u,u,u,u,d}

Fig. 1. Excerpt of state transitions from state{u,u,u,u,u} for the 5-bussystem.

Notice that in general,Vatt(s) andVdef (s) computed fromEq. 1 and Eq. 3 are different. In particular,Vatt(s) ≤ Vdef (s),where Eq. 1 corresponds to the primal problem and Eq. 3 cor-responds to the dual problem. The inequality expresses weakduality relating the primal and dual problems in general [21,Section 5.4]. When the Markov game is zero-sum, however,strong duality applies [21, Section 5.4] and equality holdsdueto thestrong max-min property. Hence, we useV (s) to denotethe value of states ∈ S, andV (s) = Vatt(s) = Vdef (s). Theoptimal solutions computed individually by the two playersaretherefore best responses to each other and they are in Nashequilibrium. The equilibrium solutions are necessarily Pareto-optimal, because we cannot improve the payoff of one playerwithout hurting that of the other in a zero-sum game.

IV. OPTIMAL LOAD SHEDDING

In order to determine their payoffs in the Markov game, theplayers need to solve the physics of optimal load shedding inpower networks. We formulate the problem as a constrainedoptimization problem, under physical constraints of stablepower flows. We use superscript(·)T to denote transpose, and(·)g and (·)l to denote generation and load, respectively. Theblackboard bold letterR represents the set of real numbers,R

n denotes the set of real column vectors, andRm×n denotes

the set of realm×n matrices. The cursive symbols¹ andºrepresent component-wise inequalities between vectors.

Consider active power in a lossless system. Our formulationis similar to other formulations in the literature [16], exceptthat we additionally incorporate a per-bus cost function toquantify the importance of the load attached to each bus:

minz

l,θ

n−k∑

i=1

ui(−pli,−(pl

i + zli)), (4)

s.t. AT B sin(Aθ) − (p + z) = 0, (5)

pl ¹ pl + zl ¹ 0, (6)

0 ¹ pg + zg ¹ pg, (7)

−π/2 ¹ Aθ ¹ π/2, (8)

−c ¹ B sin(Aθ) ¹ c. (9)

wheren is the total number of buses in the power network ofwhich k are generation buses, andui(·, ·) is the cost functionof load shedding on busi. A ∈ R

m×n gives the incidencematrix for the topology of the power network. In the incidencematrix, each row and column correspond to a link and busof the grid, respectively.B ∈ R

m×m is a diagonal matrixeach of whose diagonal elements represents the susceptanceof the corresponding link.θ ∈ R

n represents the phase angleat each bus, andpT = ((pg)T , (pl)T ) ∈ R

1×n indicates thepower distribution over the buses. For busi, pg

i ≥ 0 represents

1. Set V (s) = 0 for all s ∈ S2. repeat3. for all s ∈ S and a ∈ Aatt and d ∈ Adef do4. Update Q according to Eq. 25. end for6. for all s ∈ S do7. Update V according to Eq. 18. end for9. until V (s) → V ∗, i.e., V (s) converges.

Fig. 2. Dynamic programming algorithm for solving the Markov game.

active power generation whilepli ≤ 0 represents active power

consumption.c ∈ Rm gives the link capacities, andzT =

((zg)T , (zl)T ) ∈ R1×n gives the changes in power distribution

over the buses due to link failures and corresponding loadshedding, in whichzg ∈ R

n is the power reduction at thegiven generation buses, andzl ∈ R

n represents the distributionof load shedding to be optimized.

Physical constraints of the optimization problem are givenas follows. Constraint (5) is the physical power flow con-straint [16]. Since load power sources are negative, the firstinequality in constraint (6) ensures that load is being shed(i.e., not added) and the second inequality ensures that theshed amount is not so much that it will turn a load intogeneration. Similarly, constraint (7) ensures that power willonly be shed from the generation without making it become aload. Constraint (8) ensures that the phase angles are such thatthe system will be in a stable state [22]. The last constraint(9)gives the capacity limits of the links.

The trigonometric sine function in constraint (5) makesthe optimization highly non-convex. However, under normaloperation the phase angle differences(Aθ)i ≈ 0 [16], whereAθ ∈ R

m. As sin θ ≈ θ for θ ≈ 0, constraint (5) can belinearized asAT B(Aθ) − (p + z) = 0, and constraint (9)becomes−c ¹ B(Aθ) ¹ c. The hardness of the optimizationproblem then depends on Eq. 4, the cost function of loadshedding. If the function is convex, the optimization problem isalso convex. In this case, efficient techniques, such as interior-point methods [23], exist for the optimal solution. On the otherhand, for non-convex optimization, adaptive stochastic searchmethods [24] are needed to overcome local optima.

The expected immediate reward of the attacker,R(s, a, d),is then given by

∑s′ T (s, a, d, s′)u(s′), where u(s′) is the

minimum cost of shed load solved by Eq. 4, with the incidencematrix A updated according to the states′.

V. M ARKOV GAME SOLUTION

We now solve the Markov game defined in Sec. III. Ourgoal is to compute equilibrium bestpolicies for both players,where a policy is the set of per-state optimal mixed strategiesof the player concerned, and an optimal strategy is one thatmaximizes the minimum long-term reward under the beststrategy of the opponent. It is known that every Markov gamehas a non-empty set of optimal policies for each player, andone of them isstationary, i.e., it is time-independent [25]. Oursolution will find this optimal stationary policy for each player.Once the optimal policies of the players are determined, theMarkov transition probabilities are completely defined andthesystem will evolve as a standard Markov process.

A. Both players have complete information

We first consider the case in which both players havecomplete information about the game. The solution is a gener-alization ofvalue iteration, a common dynamic programming

Page 5: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 5

technique for solving MDPs [25], [15], to a game-theoreticsetting.

Recall from Sec. III that the value of states ∈ S in thegame is given by Eq. 1 for the attacker, and by Eq. 3 for thedefender. The optimal mixed strategyπ of the attacker can beobtained by solving the following linear program:

maxπ∈PD(Aatt)

V (s),

s.t.∑

a∈Aatt

Q(s, a, d)πa ≥ V (s),

a∈Aatt

πa = 1,

π º 0.The optimalπ of the defender can be obtained similarly. Theoptimalπ specifies a mixed strategy for the defender (attacker)who chooses the actiond (a) with the probability specified byπd (πa) based entirely on the current states. Such strategymay be implemented by the defender using a random numbergenerator to pick which link to reinforce or repair.

The value iteration algorithm for computing the optimalQand V for given s, a, d is specified in Fig. 2. The algorithmiteratively estimates the values ofV and Q by treating theequal signs in Eqs. 2 and 1 as assignment operators forupdating the estimates. These estimates will converge to theircorrect values [26]. Notice that each iteration of the algorithmproduces a mixed strategy for one player in states by linearprogramming (Line 7). These mixed strategies will similarlyconverge to the optimal one, and hence we obtain one player’soptimal policy when the algorithm terminates. We then use theconvergedQ’s to solve forV ’s by linear programming fromthe perspective of the other player, and obtain the optimalpolicy of the other player.

Notice that we initializeV (s) = 0. As a result, the mixedstrategy of the player after the first iteration is its optimalmixed strategy in astatic game that does not consider rewardsin future time steps, and the obtainedV (s) corresponds to thepayoff in states of this static game. For instance, for the5-bus system shown in Fig. 4, Table I(a) shows the payoffmatrix of the static game for state{u,u,u,u,u}. Notice thatthe matrix shows the payoff to an attacker, and hence, theattacker prefers an action that returns a larger number, whilethe defender prefers an action that returns a smaller number.As we consider future rewards, the payoff matrix will evolveduring the iterative process of the algorithm. When their effectsare fully considered, Table I(b) shows the “composite” payoffmatrix for the same state{u,u,u,u,u} after the convergence ofV in Line 9. For any states, the optimal mixed strategy ofthe player in the Markov game is equivalent to the optimalmixed strategy solved for an equivalent static game withthe composite matrix as payoffs. This view facilitates theinterpretation of results obtained for the Markov game.

B. One player with false information

We now consider the case when one player, the attacker,has false information about the game, presumably because ofdeception strategies employed by the defender. Specifically,we assume that the defender has misled the attacker intousing incorrect cost functions that quantify the damage ofload shedding. Also, the defender knows these incorrect costfunctions.

TABLE IQUALITY OF ACTIONS OF THE TWO PLAYERS IN STATE{U,U,U,U,U} FOR(A) A STATIC GAME THAT DOES NOT CONSIDER FUTURE REWARDS, (B)

THE FULL MARKOV GAME . NUMBERS ARE PAYOFFS FOR THE ATTACKER.HENCE, THE ATTACKER PREFERS LARGER NUMBERS, WHILE THE

DEFENDER PREFERS SMALLER NUMBERS.ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

(a)

1 2 3 4 5

1 61.05 0 112.1 472.1 230

2 122.1 0 112.1 472.1 230

3 122.1 0 56.05 472.1 230

4 122.1 0 112.1 236.1 230

5 122.1 0 112.1 472.1 115

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

(b)

1 2 3 4 5

1 210.6 152.4 279.7 674.4 445.5

2 292.4 140.6 279.7 674.4 445.5

3 292.4 152.4 204.2 674.4 445.5

4 292.4 152.4 279.7 401.6 445.5

5 292.4 152.4 279.7 674.4 287.2

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

1. Set V (s) = 0 for all s ∈ S2. repeat3. for all s ∈ S and d ∈ Adef do4. Update Q according to Eq. 105. end for6. for all s ∈ S and d ∈ Adef do7. Update V according to Eq. 118. end for9. until V (s) → V ∗, i.e., V (s) converges.

Fig. 3. Dynamic programming solution for the MDP used by the defenderto compute its best response to a misguided attacker policy.

Our analysis assumes that the attacker will solve the Markovgame using the algorithm in Sec. V-A, but with the falsecost functions as input. After that, since the defender knowsthe attacker’s misinformation, it can take advantage of thedeception by computing a best response to the attacker’sperceived optimal policy. In doing so, since the policy of theattacker is now fixed, the defender may optimize by solving astandard MDP formulated as follows:

Q(s, a, d) = R(s, a, d) + γ∑

s′∈S

T (s, a, d, s′)V (s′) (10)

V (s) = mind′∈Adef

Q(s, a, d′) (11)

In the above, the misguided policy of the attacker affects therewardR(s, a, d) and the transition probabilityT (s, a, d, s′).

The MDP can be solved by value iteration [27], and thealgorithm is given in Fig. 3. In the algorithm, the optimalpolicy of the defender is computed in Line 7 after the valueiteration converges.

Note that our analysis has focused on the benefit of decep-tion and ignored its cost. If the deception cost is known or canbe estimated with good accuracy, it can be readily integratedinto the proposed game model.

G G

2 54

1

3

2+j1

2#1#

1.6+j0.8

3.7+j1.3

��

��

��

(a) Bus diagram (b) Link diagramFig. 4. 5-bus system.

Page 6: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 6

TABLE IVLOAD AND SUPPLY DISTRIBUTION OFIEEE 14-BUS SYSTEM.

Bus ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Load (MW) 0 0 94.2 47.8 7.6 11.2 0 0 29.5 9 3.5 6.1 13.5 14.9

Supply (MW) 232.4 18.3 0 0 0 0 0 0 0 0 0 0 0 0

TABLE IILOAD AND SUPPLY DISTRIBUTION OF5-BUS SYSTEM.

Bus ID 1 2 3 4 5

Load (MW) 160 200 370 0 0

Supply (MW) 0 0 0 500 257.8

G2Bus 2 Bus 3Bus 7 Bus 9

Bus 8

Bus 5 Bus 6

Bus 4

Bus 1

G3

G1

2 72 84 95 33

5 6

4

1

6 7

98

1

(a) Bus diagram (b) Link diagram

Fig. 5. Standard WECC 9-bus system.

VI. EVALUATIONS

We present numerical results to illustrate solutions of theMarkov games, which include the static games as a specialcase (γ = 0). We have results for a 5-bus system [1], a WECC9-bus system [2], and an IEEE standard 14-bus system [3].Their bus and link diagrams are given in Figures 4, 5, and 6,respectively, and their per-bus aggregate generation and loadare listed in Tables II, III, and IV, respectively. We note that acommon industry practice is to operate power networks underthe so calledN − 1 criterion [28]. For simplicity, we do notincorporate redundancy in our experimental settings to satisfythe N − 1 criterion. The simplification allows the effects ofa successful link attack to manifest themselves immediately.We will focus on the 5-bus system for illustration of themore detailed results, since its relative simplicity facilitatesthe exposition.

Notice that certain links in a power system are particularlyimportant, in that interdicting such a link by itself will alreadycut off a large amount of power flow from generation to load.In the 5-bus system, linksl4 andl5 are particularly important,with l4 being more so. In the 9-bus system, linksl1, l2, andl3are particularly important, withl2 being the most. In the 14-bus system, linksl1 and l2 are particularly important. Theseimportant links usually form the focus of the player strategies.

TABLE IIILOAD AND SUPPLY DISTRIBUTION OFWECC 9-BUS SYSTEM.Bus ID 1 2 3 4 5 6 7 8 9

Load (MW) 0 0 0 0 125 90 0 100 0

Supply (MW) 71.64 163 85 0 0 0 0 0 0

��

��

(a) Bus diagram (b) Link diagram

Fig. 6. Standard IEEE 14-bus system.

A. Complete information: Amount of load shed as cost

We begin the evaluation with the assumption that bothplayers have complete information of the game. The costfunction of load shedding is the amount of load shed.

Fig. 7 shows the player strategies in selected states of theMarkov game for the 5-bus system. In the figure, a bar labeledπa(x) gives the probability that the attacker will attack linkx,and a bar labeledπd(x) gives the probability that the defenderwill repair link x (if x is down) or reinforce linkx (if xis up). For example,πa(5) represents the probability for theattacker to attackl5. Only actions with non-zero probabilitiesare included in the figure. The defender and the attacker havebudgets to affect one link only in a time step. The resultsshow that the optimal policies of the players may changesignificantly as we varyγ from zero (static game) to 0.7.

For instance, Fig. 7(a) shows that in state{u,u,u,u,u}, thedefender progressively shifts its focus from reinforcingl4 toreinforcing l5, while the attacker also attacksl5 apart froml4, as γ increases. This observation can be explained usingthe payoff matrix of the static game (Table I(a)) and thecomposite payoff matrix of the Markov game whenγ = 0.3(Table I(b)). Notice that the numbers shown are the costs ofload shedding and hence represent payoffs for the attacker –the attacker prefers higher numbers while the defender preferslower numbers. Table I(a) shows that in the static game, thepayoff of attackingl4 is always higher than that of attackingl5,i.e., bothl4 andl5 are important butl4 is even more so. Hence,the attacker will only attackl4, and the defender will alwaysdefend l4 to minimize its cost. However, Table I(b) showsthat in the Markov game, the payoff of attackingl4 is alwayshigher than attackingl5, except in the case that the defenderis also reinforcingl4. Hence, whenl4 is being reinforced withsufficiently high probability, the attacker begins to use a mixedstrategy that includesl5. This illustrates a subtle interplaybetween the players: Although asuccessful attack onl4 willbring higher benefit to the attacker, it is also more difficultifl4 is also reinforced by the defender. Hence, the attacker shiftssome of its focus to the easier targetl5.

Figures 8 and 9 show selected strategies of the players in theMarkov game for the 9-bus and 14-bus systems, respectively.Notice that in the 14-bus system, five of the links, namely{l12, l14, l18, l19, l20}, are relatively unimportant, in that theamount of load shedding is relatively small when each of themis down. For simplicity, we remove them from considerationin the player strategies.

B. Complete information: Concave functions as cost

We now show results for concave cost functions; the ex-perimental setting is otherwise the same as in the previousexperiment. The cost function of shedding load at busi isgiven by

ui(x, y) = xci − yci ,

where the load of busi is reduced fromx to y and ci is agiven constant (0 ≤ ci ≤ 1). Notice that as the parameterci

Page 7: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 7

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(4) πd(5) πa(4) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(4) πd(5) πa(4) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(4) πd(5) πa(5)

(a) State{u,u,u,u,u} (b) State{d,u,u,u,u} (c) State{d,d,d,d,u}

Fig. 7. Player strategies in selected states of the Markov game for the 5-bus system. Both players have budgets to affect onelink only in a time step.ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

0

0.2

0.4

0.6

0.8

1

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(1) πd(2) πa(2) πa(3) πa(7)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(2) πd(3) πa(1) πa(2)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(1) πd(2) πa(2) πa(3)

(a) State{d,u,u,u,u,u,u,u,u} (b) State{u,u,d,u,d,u,u,u,u} (c) State{d,u,u,d,d,u,u,u,u}

Fig. 8. Selected player strategies for the 9-bus system. Bothplayers have budgets to affect one link in a time step.ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(1) πd(2) πd(3) πa(1) πa(2) πa(3)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(1) πd(3) πa(1) πa(2)

0

0.2

0.4

0.6

0.8

1

1.2

Static Markov, ϒ = 0.3 Markov, ϒ = 0.7

Pro

ba

bil

ity

πd(2) πd(3) πd(4) πa(1) πa(2) πa(3)

(a) State{u,u,u,u,u,u,u,u,u,u,u,u,u,u} (b) State{u,u,d,u,d,u,u,u,u,u,u,u,u,u} (c) State{u,d,d,d,d,u,u,u,u,u,u,u,u,u}

Fig. 9. Selected player strategies for the 14-bus system. Both players have budgets to affect one link in a time step.ppf = 0.5, pupf = 1, ppr = 0.6, pupr =

0.

increases, the cost of shedding load at busi also increases.We further assume that both players have budgets to affectone link only in a time step.

1) Impact of load shedding cost: We use the 5-bus systemto study the impact of load shedding cost on the optimalstrategies of the two players. Fig. 10 shows how the players’optimal strategies may change in state{u,u,u,u,u}, as the costof load shedding at Bus 3 increases, i.e., asc3 is varied to be0.2, 0.4, 0.6, and 0.8. The three sub-figures give results forγ = 0, 0.3, and 0.7, respectively. Fig. 11 gives correspondingresults for state{d,u,u,d,d}.

The strategies of the players can be explained by inspectingthe composite payoff matrices of the games in differentstates. We first discuss the state{u,u,u,u,u}. Because of spaceconstraint, we focus on the payoff matrices forγ = 0. Theyare Tables V(a), (b), (c), and (d), forc3 = 0.2, 0.4, 0.6,and 0.8, respectively. Whenc3 = 0, 2, attacking l4 is theattacker’s dominant strategy (Table V(a)), since the attackerprefers higher numbers in the table. This agrees with theattacker’s choice ofl4 as the pure strategy in Fig. 10(a). Asc3

increases, the payoffs in the matrices increase from Table V(a)to Table V(d), which simply reflects the more expensive loadshedding at Bus 3. However, notice also that the payoffs incolumn 5 generally increase more than those in column 4. Thissuggests that interdictingl5 affects more of the load on Bus3 than interdictingl4. Moreover, attackingl4 is no longer adominant strategy for the attacker.

Whenc3 > 0.4, the load shedding cost in state{u,u,u,u,d}is lower than that in state{u,u,u,d,u}. Hence,{u,u,u,u,d}, thestate that will be reached ifl5 fails, is more attractive to thedefender than{u,u,u,d,u}, the state that will be reached ifl4fails. One might therefore think that the defender will be morelikely to protectl4 than l5 whenc3 > 0.4. Fig. 10 shows that

this is not the case, however. From the figure, we see thatin fact as c3 increases, the defender always becomes morelikely to reinforce l5 than l4 (which is also true forγ > 0).Whenc3 = 0.6 and 0.8, the defender’s mixed strategies use ahigher probability to reinforcel4 thanl5. This counter-intuitivebehavior of the defender can be explained by noting that whenc3 increases from 0.4 to 0.8, the attacker also steps up itsefforts in attackingl4, and whenc3 = 0.6 and 0.8, the attackeris more likely to attackl4 than l5. We again have an intricateinterplay between the players’ actions: Although asuccessfulattempt to keepl4 up will be most attractive to the defender, ithas also become harder due to the attacker’s focus onl4. Thismotivates the defender to invest more on the easier-to-protectlink l5, so that it will have a better chance of success, andl5is still important to the defender although not as much asl4.

We now illustrate results for state{d,u,u,d,d}. In this state,the most important linksl4 andl5 are down. Because of spaceconstraints, we present two payoff matrices of the static gamein Tables VI(a) and (b), forc3 = 0.2 and 0.8, respectively.

From the tables, notice that the best actions of the defenderare always to repairl4 or l5 if these links are not attacked bythe attacker. (Recall that the defender preferslower numbersin these tables.) This simply reflects the higher importanceofthese links to the defender than the others. When either ofthese links is attacked, as it happens in the attacker’s chosenstrategy, the optimal defender strategy is a mixed strategybetweenl4 and l5 (Fig. 11).

We know also that, asc3 increases, the cost of load sheddingin {d,u,u,u,d}, the state that will be reached ifl4 recovers,increases more slowly than that in{d,u,u,d,u}, the state thatwill be reached ifl5 recovers. Hence the defender increasinglyprefers l4 to recover thanl5. This preference is reflected inFig. 11, where the defender always becomes more likely to

Page 8: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 8

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(4) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(4) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(4) πa(5)

(a) Static game (γ = 0) (b) Markov game,γ = 0.3 (c) Markov game,γ = 0.7

Fig. 10. Optimal strategies of the two players in state ={u,u,u,u,u} for the 5-bus system, as a function of the cost of load sheddingat Bus 3 (c3).c1 = 0.2, c2 = 0.2, ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(4) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(3) πa(5)

0

0.2

0.4

0.6

0.8

1

1.2

0.2 0.4 0.6 0.8

Pro

ba

bil

ity

c3 (cost of load shedding at Bus 3)

πd(4) πd(5) πa(3) πa(5)

(a) Static game (γ = 0) (b) Markov game,γ = 0.3 (c) Markov game,γ = 0.7

Fig. 11. Optimal strategies of the two players in state ={d,u,u,d,d} for the 5-bus system, as a function of the cost of load sheddingat Bus 3 (c3).c1 = 0.2, c2 = 0.2, ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

TABLE VQUALITY OF ACTIONS OF THE TWO PLAYERS IN STATE{U,U,U,U,U} FOR A

STATIC GAME THAT DOES NOT CONSIDER FUTURE REWARDS.ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0. ATTACKER PREFERS LARGER

NUMBERS, DEFENDER PREFERS SMALLER NUMBERS.(a) c1 = c2 = c3 = 0.2

1 2 3 4 5

1 0.126 0.000 0.227 1.597 0.571

2 0.251 0.000 0.227 1.597 0.571

3 0.251 0.000 0.114 1.597 0.571

4 0.251 0.000 0.227 0.798 0.571

5 0.251 0.000 0.227 1.597 0.286

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

(b) c1 = c2 = 0.2, c3 = 0.4

1 2 3 4 5

1 0.345 0.000 1.107 4.029 3.042

2 0.691 0.000 1.107 4.029 3.042

3 0.691 0.000 0.553 4.029 3.042

4 0.691 0.000 1.107 2.014 3.042

5 0.691 0.000 1.107 4.029 1.521

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

(c) c1 = c2 = 0.2, c3 = 0.6

1 2 3 4 5

1 0.345 0.000 3.087 10.39 11.4

2 0.691 0.000 3.087 10.39 11.4

3 0.691 0.000 1.544 10.39 11.4

4 0.691 0.000 3.087 5.194 11.4

5 0.691 0.000 3.087 10.39 5.701

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

(d) c1 = c2 = 0.2, c3 = 0.8

1 2 3 4 5

1 0.345 0.866 10.74 34.45 44.36

2 0.691 0.433 10.74 34.45 44.36

3 0.691 0.866 5.371 34.45 44.36

4 0.691 0.866 10.74 17.22 44.36

5 0.691 0.866 10.74 34.45 22.18

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

repair l4 than l5 asc3 increases (for all the values ofγ).

On the other hand, whenγ = 0, the attacker becomesless likely to attackl4 as c3 increases. Whenγ > 0, theattacker in fact no longer includesl4 as part of its mixedstrategy, but it includesl3 instead. These decisions reflect theattacker’s attempt to look for easier targets – links that havea higher chance to bring down – in the face of the defender’semphasis onl4 andl5. In particular, when the defender focusesexclusively onl4 and l5, an attack onl3 is certain to succeedsincepupf = 1. And a failedl3 is still attractive to the attacker,although not as much asl4 and l5.

TABLE VIQUALITY OF ACTIONS OF THE TWO PLAYERS IN STATE{D,U,U,D,D} FOR A

STATIC GAME THAT DOES NOT CONSIDER FUTURE REWARDS.ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0. ATTACKER PREFERS LARGER

NUMBERS, DEFENDER PREFERS SMALLER NUMBERS.(a) c1 = c2 = c3 = 0.2

1 2 3 4 5

1 8.425 8.447 8.457 8.356 8.356

2 8.425 8.488 8.581 8.425 8.425

3 8.425 8.550 8.503 8.425 8.425

4 4.138 5.784 6.827 8.425 4.138

5 4.328 5.594 5.606 4.328 8.425

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r(b) c1 = c2 = 0.2, c3 = 0.8

1 2 3 4 5

1 119 119 119 119 119

2 119 119 119 119 119

3 119 119 119 119 119

4 84.6 105.7 118.1 119 84.6

5 68.28 68.17 68.17 68.28 119

Link attacked by attacker

Lin

k p

rote

cte

d

by

de

fen

de

r

C. Impact of deception

In this section, we evaluate how the defender can increaseits payoff by misleading the attacker into using false costfunctions for the load shedding. We present results for the5-bus system and the WECC 9-bus system.

Figures 12 and 13 show the load shedding cost of thedefender after obtaining a best response to the perceivedoptimal policy of the attacker under misinformation. In Fig. 12,the true cost function is the amount of load shed, and thedeceiving cost functions are the concave functions presentedin Section VI-B. We setc1 = 0.2 andc2 = 0.2 while varyingc3 for the 5-bus system. For the 9-bus system, we setci = 0.2for i = 1, . . . , 4 and i = 5, . . . , 9 and varyc5. We observethat in general, the defender gains (i.e., its load sheddingcostis reduced) as the deceiving cost functions deviate more fromthe true ones.

In Fig. 13, the true cost function for the 5-bus system is theconcave function withc1 = 0.2, c2 = 0.2, c3 = 0.8, and thatfor the 9-bus system is the concave function withci = 0.2 fori = 1, . . . , 4 andi = 5, . . . , 9 andc5 = 0.8. The deceiving costfunctions for both bus systems are the respective amounts ofload shed. The results show that the defender always reducesits load shedding cost by using deception. The gains in some ofthe states are significant (up to 50% cost reduction), althoughin general they depend on detailed parameters.

Page 9: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

0

200

400

600

800

1000

0 10 20 30 40

Ex

p.

loa

d s

he

dd

ing

co

st

State

True

False

False

False

c3

= 0.4

c3

= 0.6

c3

= 0.8

True

False c3=0.6

False c3=0.4

False c3=0.8

(a) 5-bus system

0

100

200

300

400

500

0 20 40 60 80

Ex

p.

loa

d s

he

dd

ing

co

st

State

True

False

False

False

c5

= 0.4

c5

= 0.6

c5

= 0.8

True

False c5=0.6

False c5=0.4

False c5=0.8

(b) WECC 9-bus system (64 of the states)Fig. 12. Expected cost of load shedding in different states of the Markovgame, under true information and misinformation of the attacker. The truecost functions are the respective amounts of load shed. The deceiving costfunctions are concave functions of differentci parameters.γ = 0.3, ppf =

0.5, pupf = 1, ppr = 0.6, pupr = 0.

0

50

100

150

200

0 10 20 30 40

Ex

p.

loa

d s

he

dd

ing

co

st

State

True

False

(a) 5-bus system

0

10

20

30

40

50

60

70

80

0 20 40 60 80

Ex

p.

loa

d s

he

dd

ing

co

st

State

True

False

(b) WECC 9-bus system (64 of the states)Fig. 13. Expected cost of load shedding in different states of the Markovgame, under true information and misinformation of the attacker. The truecost functions are the concave functions withc1 = 0.2, c2 = 0.2, c3 = 0.8for the 5-bus system, andc5 = 0 and ci = 0.2 otherwise for the 9-bussystem. The deceiving cost functions are the respective amounts of load shed.γ = 0.3, ppf = 0.5, pupf = 1, ppr = 0.6, pupr = 0.

VII. C ONCLUSION

We have presented a Markov game analysis of attack-defense in power systems, while considering the dimension ofinformation that will become more important with the adventof smart grids. Our results complement related results usingstatic games or Stackelberg games. We show that considerationof repeated plays under Markov-type uncertainties will ingeneral modify the strategies of the players relative to gameswith single plays. This is because the players will need toconsider the impact of a current action on the future plays,although the future rewards are generally discounted by afactor γ. We have applied our analysis to a 5-bus systemthat has been studied in the literature, a WECC 9-bus system,and an IEEE standard 14-bus system. The relative simplicityof the 5-bus system has allowed us to analyze its resultsin detail. Our analysis exposes subtle features of the gamesolutions, considering thevalues of different game states to theplayers and the intricate interplay between their strategies. We

have also shown how a defender can exploit the informationdimension by using deception, and that doing so will alwaysincrease the defender’s payoffs, although the size of the gainswill depend on more detailed parameters. It is clear that suchdeception can likewise be gainfully employed by the attacker.How a deceived player may learn the true information throughrepeated plays and other channels is an interesting problemforfurther research. It is also interesting to apply our analysis toother critical infrastructures.

REFERENCES

[1] Calculation of The Electrical Power System. Hydro-electricity Press,1978.

[2] P. M. Anderson and A. A. Fouad,Power System Control and Stability.Galgotia, 1981.

[3] R. Christie, “Power system test archive,” http://www.ee.washington.edu/research/pstca, Aug. 1999.

[4] S. Jha, O. Sheyner, and J. Wing, “Two formal analysis of attack graphs,”in Proc. of the IEEE workshop on Computer Security Foundations, 2002.

[5] A. Holmgren, E. Jenelius, and J. Westin, “Evaluating strategies fordefending electric power networks against antagonistic attacks,” IEEETrans. Power Syst, vol. 22, no. 1, 2007.

[6] J. Salmeron, K. Wood, and R. Baldick, “Analysis of electric grid securityunder terrorist threat,”IEEE Trans. Power Syst, vol. 19, no. 2, 2004.

[7] P. H. Corredor and M. E. Ruiz, “Against all odds,”IEEE Power andEnergy Magazine, vol. 9, no. 2, March/April 2011.

[8] E. White, “New attack on electricity lines in yemen,” http://www.yobserver.com/local-news/10022003.html.

[9] D. Bienstock and A. Verma, “TheN − k Problem in Power Grids:New Models, Formulations, and Numerical Experiments,”SIAM J. onOptimization, vol. 20, no. 5, 2010.

[10] C. Laughman, K. Lee, R. Cox, S. Shaw, S. Leeb, L. Norford, andP. Armstrong, “Power signature analysis,”IEEE Power & EnergyMagazine, vol. 1, no. 2, 2003.

[11] D. A. Jones, C. E. Davis, M. A. Turnquist, and L. K. Nozick, “Physicalsecurity and vulnerability modeling for infrastructure facilities,” in Proc.of the Hawaii International Conference on System Sciences, 2006.

[12] G. Brown, M. Carlyle, J. Salmeron, and K. Wood, “Defending criticalinfrastructure,”Interfaces, vol. 36, no. 6, 2006.

[13] R. Powell, “Allocating defensive resources with private informationabout vulnerability,”The American Political Science Review, vol. 101,no. 4, 2007.

[14] G. Chen, Z. Y. Dong, D. J. Hill, and Y. S. Xue, “Exploring reliablestrategies for defending power systems against targeted attacks,” IEEETrans. Power Syst, vol. 26, no. 3, 2011.

[15] T. Alpcan and T. Basar,Network Security: A Decision and GameTheoretic Approach. Cambridge University Press, 2010.

[16] A. Pinar, J. Meza, V. Donde, and B. Lesieutre, “Optimization strategiesfor the vulnerability analysis of the electric power grid,”SIAM Journalon Optimization, vol. 20, no. 4, 2010.

[17] J. Hu and M. P. Wellman, “Multiagent reinforcement learning: The-oretical framework and an algorithm,” inProc. of the InternationalConference on Machine Learning, 1998.

[18] V. M. Bier, S. Oliveros, and L. Samuelson, “Choosing whatto protect,”Journal of Public Economic Theory, vol. 9, no. 4, 2007.

[19] G. Levitin and K. Hausken, “False targets efficiency in defense strategy,”European Journal of Operational Research, vol. 194, no. 1, 2009.

[20] J. Zhuang, V. M. Bier, and O. Alagoz, “Modeling secrecy and deceptionin a multiple-period attacker-defender signaling game,”EuropeanJour-nal of Operational Research, vol. 203, no. 2, 2010.

[21] S. Boyd and L. Vandenberghe,Convex Optimization. CambridgeUniversity Press, 2004.

[22] J. Grainger and W. Stevenson,Power System Analysis. McGraw-Hill,1994.

[23] J. Nocedal and S. Wright,Numerical Optimization. Springer, 1999.[24] A. A. Zhigljavsky, Theory of Global Random Search. Kluwer, 1991.[25] M. Littman, “Markov games as a framework for multi-agent reinforce-

ment learning,” inProc. of the International Conference on MachineLearning, 1994.

[26] G. Owen,Game Theory: Second edition. Academic Press, 1982.[27] D. P. Bertsekas,Dynamic Programming: Deterministic and Stochastic

Models. Prentice-Hall, 1987.[28] M. Ilic, J. Arce, Y. Yoon, Y., and E. Fumagalli, “Assessing Reliability as

the Electric Power Industry Restructures,”Electricity Journal, vol. 14,no. 2, 2001.

9

Page 10: IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, …david_yau/publications/ma... · 2018. 4. 9. · IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. XXX, NO. XXX, XXX 2012 2 viewed

Chris Y. T. Ma (M’11) received his B.Eng. (firstclass honors) in Computer Engineering and hisM.Phil. in Computer Science and Engineering fromthe Chinese University of Hong Kong in 2004 and2006, respectively, and his Ph.D in Computer Sci-ence from Purdue University in 2010. He is currentlya postdoctoral researcher at the Advanced DigitalSciences Center, Singapore. His research interestsinclude performance and security study of wirelessnetworks, mobile sensor networks, and smart grids.

David K. Y. Yau (M’10) received the B.Sc. (firstclass honors) from the Chinese University of HongKong in 1989, and the M.S. and Ph.D from theUniversity of Texas at Austin in 1992 and 1997,respectively, all in computer science. He is currentlyDistinguished Scientist at the Advanced Digital Sci-ences Center, Singapore, and Associate Professorof Computer Science at Purdue University, WestLafayette, IN, USA. Dr. Yau was the recipient ofan NSF CAREER award for research in qualityof service provisioning. His other areas of research

interest are protocol design and implementation, wireless and sensor networks,network security, network incentives, and smart grids. Dr. Yau served as Asso-ciate Editor ofIEEE/ACM Trans. Networking (2004–09); Vice General Chair(2006), TPC co-Chair (2007), and TPC Area Chair (2011) of IEEE Int’l Conf.Network Protocols (ICNP); TPC co-Chair (2006) and SteeringCommitteemember (2007–09) of IEEE Int’l Workshop Quality of Service (IWQoS); andTPC Track co-Chair (Network/Web/P2P Protocols and Applications) of IEEEInt’l Conf. Distributed Computing Systems (ICDCS) 2012. He isa memberof the IEEE.

Xin Lou received the B.Eng. (first class honors)in communication engineering from Sichuan Uni-versity, China, in 2009. Currently, he is workingtowards his Ph.D. degree at the City Universityof Hong Kong. He was a software engineer atAdvanced Digital Sciences Center, Singapore, from2011-2012. His research interests include perfor-mance analysis, nonlinear optimization, smart gridsand wireless networks.

Nageswara S. V. Rao(F’08) received the BTech de-gree in electronics and communications engineeringfrom the National Institute of Technology, Warangal,India, in 1982, the M.E. degree in computer scienceand automation from the Indian Institute of Science,Bangalore, in 1984, and the Ph.D degree in computerscience from Louisiana State University in 1988.He is currently a corporate fellow in the ComputerScience and Mathematics Division at Oak RidgeNational Laboratory, where he joined in 1993. Hehas been on assignment at the US Missile Defense

Agency as the technical director of the C2BMC Knowledge Center since2008. He was an assistant professor in the Department of Computer Scienceat Old Dominion University from 1988-1993. He has published more than250 technical conference and journal papers in the areas of sensor networks,information fusion, and high-performance networking. He received the 2005IEEE Technical Achievement Award for his contributions in the informationfusion area. He is a fellow of the IEEE.

10