

IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 2, NO. 1, JANUARY 2015 11

A Predator-prey Particle Swarm Optimization Approach to Multiple UCAV Air Combat Modeled by Dynamic Game Theory

Haibin Duan, Pei Li, and Yaxiang Yu

Abstract—Dynamic game theory has received considerable attention as a promising technique for formulating control actions for agents in an extended complex enterprise that involves an adversary. At each decision-making step, each side seeks the best scheme with the purpose of maximizing its own objective function. In this paper, a game theoretic approach based on predator-prey particle swarm optimization (PP-PSO) is presented, and the dynamic task assignment problem for multiple unmanned combat aerial vehicles (UCAVs) in military operation is decomposed and modeled as a two-player game at each decision stage. The optimal assignment scheme of each stage is regarded as a mixed Nash equilibrium, which can be solved by using the PP-PSO. The effectiveness of our proposed methodology is verified by a typical example of an air military operation that involves two opposing forces: the attacking force Red and the defense force Blue.

Index Terms—Unmanned combat aerial vehicle (UCAV), game theory, air combat, predator-prey, particle swarm optimization (PSO), Nash equilibrium.

I. INTRODUCTION

COMPARED to unmanned combat aerial vehicles (UCAVs) that perform solo missions, greater efficiency and operational capability can be realized from teams of UCAVs operating in a coordinated fashion[1-5]. Designing UCAVs with intelligent and coordinated action capabilities to achieve an overall objective is a major part of multiple UCAV control in a complicated and uncertain environment[6-10]. A military air operation involving multiple UCAVs is a complex dynamic system with many interacting decision-making units, which may even have conflicting objectives. Modeling and control of such a system is an extremely challenging task, whose purpose is to seek a feasible and optimal scheme to assign the limited combat resources to specific units of the adversary while taking into account the adversary's possible defense strategies[8, 11]. The difficulty lies not only in that it is often very difficult to mathematically describe the underlying processes and objectives of the decision maker, but also in that the fitness of one decision maker depends on both its own control input and the opponent's strategies.

Manuscript received July 24, 2013; accepted July 18, 2014. This work was supported by National Natural Science Foundation of China (61425008, 61333004, 61273054), Top-Notch Young Talents Program of China, and Aeronautical Foundation of China (2013585104). Recommended by Associate Editor Changyin Sun.

Citation: Haibin Duan, Pei Li, Yaxiang Yu. A predator-prey particle swarm optimization approach to multiple UCAV air combat modeled by dynamic game theory. IEEE/CAA Journal of Automatica Sinica, 2015, 2(1): 11-18

Haibin Duan, Pei Li, and Yaxiang Yu are with the Science and Technology on Aircraft Control Laboratory, School of Automation Science and Electrical Engineering, Beihang University (BUAA), Beijing 100191, China (e-mail: [email protected]; [email protected]; [email protected]).

Dynamic game theory has received increasingly intensive attention as a promising technique for formulating action strategies for agents in such a complex situation, which involves competition against an adversary. The advantage of game theory in solving control and decision-making problems with an adversarial opponent has been shown in many studies[12-15]. A game theory approach was proposed for target tracking problems in sensor networks in [14], where the target is assumed to be an intelligent agent that is able to maximize filtering errors by evasive behavior. Pursuit-evasion game formulations were employed in [16] for the development of improved interceptor guidance laws. Cooperative game theory was used to ensure team cooperation by Semsar-Kazerooni et al.[13], where a team of agents aimed to reach consensus over a common value for their output.

Although the zero-sum version of a two-player game can be solved in polynomial time by linear programming, finding a Nash equilibrium of a general two-player game has been proved to be PPAD-complete[17-18]. The problem of computing Nash equilibria in games is therefore computationally extremely difficult in general. Based on the analogy of a swarm of birds and a school of fish, Kennedy and Eberhart developed a powerful optimization method, particle swarm optimization (PSO)[19-20], which emphasizes social interaction rather than purely individual cognitive abilities. As one of the most representative methods aiming at producing computational intelligence by simulating collective behavior in nature, PSO has been seen as an attractive optimization tool because of its simple implementation procedure, good performance, and fast convergence speed. However, it has been shown that this method is easily trapped in local optima when coping with complicated problems, and various tweaks and adjustments have been made to the basic algorithm over the past decade[20-22]. To overcome this problem, a hybrid predator-prey PSO (PP-PSO) was first proposed in [21] by introducing the predator-prey mechanism of the biological world into the optimization process.

Recently, bio-inspired computation in UCAVs has attracted much attention[23-25]. However, game theory and solutions to the problem of task assignment have been studied independently. The main contribution of this paper is the development of a game theoretic approach to dynamic UCAV air combat in a military operation based on the PP-PSO algorithm. The dynamic task assignment problem is handled from a game theoretic perspective, where the assignment scheme is obtained by solving for the mixed Nash equilibrium using PP-PSO at each decision step.

The remainder of the paper is organized as follows. Section II describes the formulation of the problem, including the attrition model of a military air operation and its game theoretic representation. Section III proposes a predator-prey PSO for computing the mixed Nash equilibrium of a two-player, non-cooperative game. An example of an adversarial scenario of UCAV combat involving two opposing sides is presented in Section IV to illustrate the effectiveness and adaptability of the proposed methodology. Concluding remarks are offered in the last section.

II. A DYNAMIC GAME THEORETIC FORMULATION FOR UCAV AIR COMBAT

A. Dynamic Model of UCAV Air Combat

There are two combat sides in the UCAV air combat model. Specifically, the attacking side is labeled as Red and the defending force as Blue. Each side consists of different combat units, which are made up of different numbers of combat platforms armed with weapons. Each unit is fully described by its location, number of platforms and the average number of weapons per platform. Thus, the state of each unit at time $k$ is defined as $\xi_i^X(k) = [x_i^X(k),\ y_i^X(k),\ p_i^X(k),\ w_i^X(k)]$, where $X$ denotes the Red force or Blue force, $[x_i^X(k),\ y_i^X(k)]$ represents the unit location, $p_i^X(k)$ corresponds to the number of platforms of the $i$th unit at time $k$, and $w_i^X(k)$ to the number of weapons on each platform in the $i$th unit of $X$. The number of platforms for the moving units changes according to the following attrition equation

$$p_i^X(k+1) = p_i^X(k)\,A_i^X(k). \tag{1}$$

The term $A_i^X(k)$ in (1) represents the percentage of platforms in the $i$th unit of force $X$ that survive the transition from time $k$ to $k+1$. For each unit in force $X$, this percentage depends on the identities of the attacking and the attacked units determined by the choice of target control, and is expressed as

$$A_i^X(k) = 1 - \sum_{j=1}^{N_d} Q_{ij}^{XY}(k)\,P_{ij}^{XY}(k). \tag{2}$$

It is assumed in (2) that $N_d$ units of $Y$ fire at the $i$th unit of $X$. The engagement factor $Q_{ij}^{XY}(k)$ of the $j$th unit of $Y$ attacking the $i$th unit of $X$ at time $k$ is computed from

$$Q_{ij}^{XY} = \beta_{ij}^{XY}\left[1 - \exp\left(-\frac{p_j^Y(k)}{p_i^X(k)}\right)\right], \tag{3}$$

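where $\beta_{ij}^{XY}$ represents the probability that the $j$th unit of $Y$ acquires the $i$th unit of $X$ as a target, and is calculated by

$$\beta_{ij}^{XY} = \begin{cases} 1, & \text{if } p_j^Y - p_i^X \ge 0, \\ \exp\left(p_j^Y - p_i^X\right), & \text{if } p_j^Y - p_i^X < 0. \end{cases} \tag{4}$$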

The attrition factor $P_{ij}^{XY}(k)$ in (2) represents the probability of the platforms in the $i$th unit of $X$ being destroyed by the salvo of $s_j^Y(k)$ fired from the $j$th unit of $Y$ at time $k$, and is computed as follows

$$P_{ij}^{XY}(k) = 1 - \left(1 - \beta_w\,PK_{ij}^{XY}\right)^{s_j^Y(k)}, \tag{5}$$

where the term $0 \le \beta_w \le 1$ represents the weather impact, which reduces the kill probability according to the weather condition: 1 corresponds to ideal weather and 0 to the worst weather. $PK_{ij}^{XY}$ is the probability of the $i$th unit of $X$ being completely destroyed by the $j$th unit of $Y$ under ideal weather and terrain conditions.

In (5), $s_j^Y$ is the average effective kill factor when the $j$th unit of $Y$ attacks the $i$th unit of $X$ with salvo $c_j^Y(k)$, and is calculated from

$$s_j^Y = \frac{c_j^Y(k)\,p_j^Y(k)}{p_i^X(k)}\left(\frac{p_j^Y(k)}{p_i^X(k)}\right)^{c-1}, \tag{6}$$

where $c_j^Y$ is the salvo size of the $j$th combat unit of $Y$ and $c$ is a constant referred to as the Wes coefficient.

The control vector for each unit of both sides is chosen as

$$u_i^X(k) = \left[V_{x_i}^X(k),\ V_{y_i}^X(k),\ c_{x_i}^X(k),\ d_{x_i}^X(k)\right], \tag{7}$$

where $V_{x_i}^X(k)$ and $V_{y_i}^X(k)$ are respectively the relocating controls corresponding to the x-coordinate and y-coordinate, $d_{x_i}^X(k)$ is the number of units that fire at the $i$th unit of $X$, and $c_{x_i}^X(k)$ is the salvo size control variable that decides how many weapons should fire. The number of weapons is updated according to

$$w_i^X(k+1) = w_i^X(k) - c_i^X(k). \tag{8}$$

The state equations of each unit engaged in an air combat are defined as

$$\begin{aligned}
x_i^X(k+1) &= x_i^X(k) + V_{x_i}^X(k), \\
y_i^X(k+1) &= y_i^X(k) + V_{y_i}^X(k), \\
p_i^X(k+1) &= p_i^X(k)\,A_i^X(k), \\
w_i^X(k+1) &= w_i^X(k) - c_i^X(k).
\end{aligned} \tag{9}$$
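The attrition model (2)-(6) and the state update (9) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, (6) is read as $s_j^Y = c_j^Y\,(p_j^Y/p_i^X)\,(p_j^Y/p_i^X)^{c-1}$, and the clamp that keeps the surviving fraction non-negative is an added guard that does not appear in the paper.

```python
import math

def acquisition_prob(p_y, p_x):
    """beta_ij^{XY} of (4): 1 if the attacker has at least as many platforms."""
    diff = p_y - p_x
    return 1.0 if diff >= 0 else math.exp(diff)

def engagement_factor(p_y, p_x):
    """Q_ij^{XY} of (3)."""
    return acquisition_prob(p_y, p_x) * (1.0 - math.exp(-p_y / p_x))

def effective_salvo(c_y, p_y, p_x, c):
    """s_j^Y of (6), under the reading stated in the lead-in."""
    ratio = p_y / p_x
    return c_y * ratio * ratio ** (c - 1.0)

def attrition_prob(pk, s, beta_w=1.0):
    """P_ij^{XY} of (5); beta_w = 1 means ideal weather."""
    return 1.0 - (1.0 - beta_w * pk) ** s

def survival_fraction(p_x, attackers, c, beta_w=1.0):
    """A_i^X of (2). attackers is a list of (p_y, c_y, pk) tuples."""
    loss = 0.0
    for p_y, c_y, pk in attackers:
        s = effective_salvo(c_y, p_y, p_x, c)
        loss += engagement_factor(p_y, p_x) * attrition_prob(pk, s, beta_w)
    # Clamp at zero: an added guard, not part of the source model.
    return max(0.0, 1.0 - loss)

def step(state, velocity, salvo, a):
    """One application of (9) to a unit state (x, y, p, w)."""
    x, y, p, w = state
    vx, vy = velocity
    return (x + vx, y + vy, p * a, w - salvo)

# Hypothetical numbers: one attacker with 10 platforms, salvo 1, PK = 0.7.
a = survival_fraction(10.0, [(10.0, 1.0, 0.7)], c=1.0)
print(step((200.0, 300.0, 10.0, 4.0), (30.0, 0.0), 1.0, a))
```

With equal platform counts and $c = 1$, the effective salvo equals the salvo size, so the surviving fraction reduces to $1 - (1 - e^{-1}) \cdot 0.7$.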

B. Game Theoretic Formulation for UCAV Air Combat

The problem of dynamic task assignment in the air combat is modeled from a game theoretic perspective in this paper. Suppose Red consists of $N_R$ units (UCAVs) and it fires $C_R$ missiles during each attack or defense. There are $N_B$ units in the Blue force, whose salvo size is also a constant $C_B$. At each decision-making step $k$, each side decides which of its own units should attack and which units of the opponent should be chosen as targets, with the purpose of maximizing its own objective function. Each combination of attacking and attacked units is seen as a pure strategy in the game. For each side, the number of pure strategies is calculated as

$$N_{RS} = C_{N_R}^{C_R} \cdot C_{N_B}^{C_R} \cdot C_R!, \tag{10}$$

$$N_{BS} = C_{N_R}^{C_B} \cdot C_{N_B}^{C_B} \cdot C_B!. \tag{11}$$
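As a small worked instance of (10) and (11), the count can be evaluated directly with binomial coefficients and a factorial. The sketch below interprets $C_n^k$ as "n choose k"; the values $N_R = N_B = 3$ and a salvo of 2 are hypothetical and are not taken from the paper.

```python
from math import comb, factorial

def num_pure_strategies(n_red, n_blue, salvo):
    """Evaluate the form of (10)/(11): choose `salvo` units on each side,
    then count the salvo! ways of pairing attackers with targets."""
    return comb(n_red, salvo) * comb(n_blue, salvo) * factorial(salvo)

# Hypothetical sizes: 3 Red units, 3 Blue units, salvo size 2.
n_rs = num_pure_strategies(3, 3, 2)
print(n_rs)  # 3 * 3 * 2 = 18
```

The factorial growth in the salvo size is what makes the payoff matrices large even for small forces.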


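The payoff matrix for each side is an $N_{RS} \times N_{BS}$ matrix, expressed as

$$M^R_{N_{RS}\times N_{BS}} = \begin{bmatrix}
J^R_{1,1}(k) & J^R_{1,2}(k) & \cdots & J^R_{1,N_{BS}}(k) \\
J^R_{2,1}(k) & J^R_{2,2}(k) & \cdots & J^R_{2,N_{BS}}(k) \\
\vdots & \vdots & \ddots & \vdots \\
J^R_{N_{RS},1}(k) & J^R_{N_{RS},2}(k) & \cdots & J^R_{N_{RS},N_{BS}}(k)
\end{bmatrix}, \tag{12}$$

$$M^B_{N_{RS}\times N_{BS}} = \begin{bmatrix}
J^B_{1,1}(k) & J^B_{1,2}(k) & \cdots & J^B_{1,N_{BS}}(k) \\
J^B_{2,1}(k) & J^B_{2,2}(k) & \cdots & J^B_{2,N_{BS}}(k) \\
\vdots & \vdots & \ddots & \vdots \\
J^B_{N_{RS},1}(k) & J^B_{N_{RS},2}(k) & \cdots & J^B_{N_{RS},N_{BS}}(k)
\end{bmatrix}. \tag{13}$$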

Each entry $J^R_{i,j}(k)$ in $M^R$ corresponds to the payoff of Red when it takes the $i$th pure strategy against the $j$th pure strategy of Blue. For the attacking force of Red, the objective function $J^R_{i,j}(k)$ is calculated as

$$J^R(k) = \sum_{i=1}^{N_R} \varepsilon_i \frac{p_i^R(k)}{p_i^R(0)} - \sum_{i=1}^{N_B} \tau_i \frac{p_i^B(k)}{p_i^B(0)}, \tag{14}$$

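where $\varepsilon_i$ and $\tau_i$ are weight coefficients. The objective function for the defense of Blue is calculated, by the same token, as

$$J^B(k) = -\sum_{i=1}^{N_R} a_i' \frac{p_i^R(k)}{p_i^R(0)} + \sum_{i=1}^{N_B} b_i' \frac{p_i^B(k)}{p_i^B(0)}, \tag{15}$$

where $a_i'$ and $b_i'$ are the corresponding weight coefficients for Blue.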

From a game theoretic point of view, the cooperative UCAV task assignment problem is for the tagged side, Red, to maximize its own payoff at each decision step by calculating a mixed Nash equilibrium for the $N_{RS} \times N_{BS}$ matrix game.

III. PREDATOR-PREY PSO FOR THE MIXED NASH SOLUTION

A. Predator-prey PSO

In the gbest-model of PSO, each particle holds its current position and velocity in the solution space[21]. It also remembers the best solution it has found so far (pbest) and the best solution found by the whole swarm (gbest). The gbest-model can be expressed as

$$v_{ij}(k+1) = \omega v_{ij}(k) + c_1 r_1 [p_{ij}(k) - x_{ij}(k)] + c_2 r_2 [g_j(k) - x_{ij}(k)], \tag{16}$$

$$x_{ij}(k+1) = x_{ij}(k) + v_{ij}(k+1), \tag{17}$$

where $v_{ij}(k)$ and $x_{ij}(k)$ respectively denote the velocity and position of the $i$th particle in the $j$th dimension at step $k$, $\omega$ is the inertia weight, $c_1$ and $c_2$ are weight coefficients, and $r_1$ and $r_2$ are random numbers between 0 and 1 that reflect the stochastic nature of the algorithm. The personal best position $p_i$ corresponds to the position in the search space where particle $i$ has attained its minimum fitness value. The global best position $g$ represents the position yielding the best fitness value among all the particles.
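The update rules (16) and (17) translate almost line for line into array code. The following is a minimal sketch rather than the authors' implementation: the parameter values ($\omega = 0.7$, $c_1 = c_2 = 1.5$), the search bounds, the clipping of positions to those bounds, and the sphere test function are all assumptions chosen for illustration.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, n_iter=100, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Basic gbest PSO following (16)-(17); all defaults are illustrative."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))              # velocities
    pbest = x.copy()                              # personal best positions
    pbest_val = np.array([f(row) for row in x])   # personal best fitness
    g = pbest[np.argmin(pbest_val)].copy()        # global best position
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # (16)
        x = np.clip(x + v, lo, hi)                # (17); clipping is an assumption
        val = np.array([f(row) for row in x])
        improved = val < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = val[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())

# Usage on a simple convex test function (sum of squares).
best_x, best_val = pso_minimize(lambda z: float(np.sum(z ** 2)), dim=2)
print(best_x, best_val)
```

On a unimodal function like this one the swarm contracts quickly around the optimum; the local-optima issue discussed next arises on multimodal landscapes.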

Unfortunately, the basic PSO algorithm easily falls into local optima. To address this, the concept of predator-prey behavior is introduced into the basic PSO to improve its optimum-finding performance[26-28]. This adjustment takes a cue from the behavior of schools of sardines and pods of killer whales. In this model, particles are divided into two categories, predators and preys. Predators chase the center of the prey swarm, so they appear to be chasing the preys, while preys escape from predators in the multidimensional solution space. After trading off predation risk against their energy, escaping particles adopt different escaping behaviors. The velocities of the predator and the prey in the PP-PSO can be defined by

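$$v_{dij}(k+1) = \omega_d v_{dij}(k) + c_1 r_1 [p_{dij}(k) - x_{dij}(k)] + c_2 r_2 [g_{dj}(k) - x_{dij}(k)] + c_3 r_3 [g_j(k) - x_{dij}(k)], \tag{18}$$

$$\begin{aligned}
v_{rij}(k+1) = {} & \omega_r v_{rij}(k) + c_4 r_4 [p_{rij}(k) - x_{rij}(k)] + c_5 r_5 [g_{rj}(k) - x_{rij}(k)] + c_6 r_6 [g_j(k) - x_{rij}(k)] \\
& - P a\,\mathrm{sign}[x_{dIj}(k) - x_{rij}(k)] \exp[-b\,|x_{dIj}(k) - x_{rij}(k)|],
\end{aligned} \tag{19}$$

where the subscripts $d$ and $r$ denote the predator and prey, respectively, $p_{di}$ is the best position of the $i$th predator, $p_{ri}$ is the best position of the $i$th prey, and $g$ is the best position that all the particles have ever found. $\omega_d$ and $\omega_r$ are defined as

$$\omega_d = 0.2 \exp\left(-10\,\frac{iteration}{iteration_{\max}}\right) + 0.4, \tag{20}$$

$$\omega_r = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{iteration_{\max}}\,iteration, \tag{21}$$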

where $\omega_d$ and $\omega_r$ are the inertia weights of predators and preys, which regulate the trade-off between the global (wide-ranging) and local (nearby) exploration abilities of the swarm and are considered critical for the convergence behavior of PSO. $iteration_{\max}$ represents the maximum number of iterations, and $\omega_{\max}$ and $\omega_{\min}$ denote the maximum and minimum values of $\omega_r$, respectively. The index $I$ is defined by the following expression

$$I = \{k \mid \min_k (|x_{dk} - x_{ri}|)\}. \tag{22}$$

Thus $I$ denotes the index of the predator nearest to the $i$th prey. In (19), $P$ decides whether the prey escapes or not ($P = 0$ or $P = 1$), and $a$ and $b$ are parameters that determine how difficult it is for the preys to escape from the predators. The closer the prey is to the predator, the stronger the escape term in (19) becomes. Moreover, $a$ and $b$ are given by

$$a = x_{span}, \quad b = \frac{100}{x_{span}}, \tag{23}$$

where $x_{span}$ is the span of the variable.
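The inertia schedules (20)-(21) and the escape term in (19) are easy to inspect numerically. The sketch below is illustrative only: the function names are hypothetical, the nearest predator in (22) is selected by Euclidean distance (the paper does not state the norm), and $\omega_{\max} = 0.9$, $\omega_{\min} = 0.2$ are the bounds quoted later in the text.

```python
import numpy as np

def predator_inertia(it, it_max):
    """omega_d of (20): decays quickly from 0.6 towards 0.4."""
    return 0.2 * np.exp(-10.0 * it / it_max) + 0.4

def prey_inertia(it, it_max, w_max=0.9, w_min=0.2):
    """omega_r of (21): linear decrease from w_max to w_min."""
    return w_max - (w_max - w_min) * it / it_max

def escape_term(x_prey, x_predators, x_span, P=1):
    """Last term of (19) for one prey; returns a vector over dimensions."""
    dists = np.linalg.norm(x_predators - x_prey, axis=1)  # assumed Euclidean
    nearest = x_predators[np.argmin(dists)]               # predator I of (22)
    a, b = x_span, 100.0 / x_span                         # (23)
    diff = nearest - x_prey
    return -P * a * np.sign(diff) * np.exp(-b * np.abs(diff))

# A prey at 0 with predators at 0.1 and 0.5 on a unit-span variable.
push = escape_term(np.array([0.0]), np.array([[0.1], [0.5]]), x_span=1.0)
print(predator_inertia(0, 100), prey_inertia(100, 100), push)
```

The escape term is negative here, so the prey is pushed away from the predator at 0.1, and its magnitude $e^{-10}$ shows how fast the repulsion fades with distance when $b = 100/x_{span}$.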


B. Nash Equilibrium

As a competitive (non-cooperative) solution concept for multi-objective, multi-criterion systems first proposed by Nash[29], a Nash equilibrium is basically a local optimum: a strategy profile $(s_1, s_2, \cdots, s_n)$ such that no player can benefit from switching to a different strategy if nobody else switches, i.e., $\forall i, \forall s_i' \in S_i$,

$$U_{P_i}(s_1, \cdots, s_{i-1}, s_i, s_{i+1}, \cdots, s_n) \ge U_{P_i}(s_1, \cdots, s_{i-1}, s_i', s_{i+1}, \cdots, s_n), \tag{24}$$

where $U_{P_i}$ denotes the expected payment of player $i$, and $s_i$ and $S_i$ respectively denote the strategy of player $i$ and its set of strategies. Note that every dominant strategy equilibrium is a Nash equilibrium, but not vice versa. Every finite game has at least one Nash equilibrium in mixed strategies. In this paper, the expected payment is replaced by the objective function, which is used to calculate the payoff matrices denoted by $M_{A\,m\times n}$ and $M_{B\,m\times n}$. We define the vector of mixed strategies for both sides as $X_i = (x_{i1}, \cdots, x_{im}, y_{i1}, \cdots, y_{in})$, such that $x_{ik} \ge 0$, $y_{ik} \ge 0$, $\sum_{k=1}^{m} x_{ik} = 1$, $\sum_{k=1}^{n} y_{ik} = 1$, where $m = N_{RS}$ and $n = N_{BS}$. So, for each mixed strategy $X_i$, the Nash equilibrium solution $(X^*, Y^*)$ must satisfy the conditions

$$\begin{cases}
M_A (Y^*)' \ge M_A \left(X_i(m+1 : m+n)\right)', \\
X^* M_B \ge X_i(1 : m)\, M_B.
\end{cases} \tag{25}$$

C. Proposed Approach for the Mixed Nash Equilibrium

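To use the proposed algorithm for computing the Nash equilibrium, we define the fitness functions as

$$\begin{aligned}
f[X_{di}^t] = {} & \max_{1\le j\le m}\left\{M_R(j,:)\,(X^t_{di,m+1:m+n})' - X^t_{di,1:m}\,M_R\,(X^t_{di,m+1:m+n})'\right\} \\
& + \max_{1\le j\le n}\left\{X^t_{di,1:m}\,M_B(:,j) - X^t_{di,1:m}\,M_B\,(X^t_{di,m+1:m+n})'\right\},
\end{aligned} \tag{26}$$

$$\begin{aligned}
f[X_{ri}^t] = {} & \max_{1\le j\le m}\left\{M_R(j,:)\,(X^t_{ri,m+1:m+n})' - X^t_{ri,1:m}\,M_R\,(X^t_{ri,m+1:m+n})'\right\} \\
& + \max_{1\le j\le n}\left\{X^t_{ri,1:m}\,M_B(:,j) - X^t_{ri,1:m}\,M_B\,(X^t_{ri,m+1:m+n})'\right\}.
\end{aligned} \tag{27}$$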

In the last two expressions, $X_{di,1:m}(k)$ and $X_{di,m+1:m+n}(k)$ denote the mixed strategies produced by the $i$th predator for the A force and the B force, respectively. Similarly, $X_{ri,1:m}(k)$ and $X_{ri,m+1:m+n}(k)$ denote the mixed strategies produced by the $i$th prey for the A and B forces. Note that these variables must satisfy the following conditions:

$$X_{di,j}(k) \ge 0, \quad X_{ri,j}(k) \ge 0, \tag{28}$$

$$\sum_{j=1}^{m} X_{di,j} = 1, \quad \sum_{j=m+1}^{m+n} X_{di,j} = 1, \quad \sum_{j=1}^{m} X_{ri,j} = 1, \quad \sum_{j=m+1}^{m+n} X_{ri,j} = 1. \tag{29}$$

Importantly, the mixed Nash equilibrium corresponds to the minimum of the fitness function: the fitness is non-negative and equals zero at an equilibrium, so the optimal or sub-optimal solution is the one whose fitness is closest to zero. The detailed procedure of PP-PSO for computing the mixed Nash equilibrium is given in Fig. 1.

PROCEDURE Mixed Nash equilibrium computing based on the PP-PSO

BEGIN

Step 1: Initialize

Set $M_{A\,m\times n}$ and $M_{B\,m\times n}$. Set the maximum iteration number $N_{\max}$, the number of predators $m_d$ and the number of preys $m_r$. Randomly initialize the positions and velocities of the predators, $x_d$ and $v_d$ respectively; both have dimensions $m_d$ by $(m+n)$. The prey positions and velocities $x_r$ and $v_r$ are initialized in the same way.

Step 2:

(1) Let $k = 1$;

(2) Calculate the fitness value of all the particles in iteration $k$, and then find the minimum fitness value of the predators as $pbest_d(k)$, that of the preys as $pbest_r(k)$, and that of all the particles as $pbest(k)$.

Step 3:

(1) Let $k = k + 1$;

(2) Update all the positions and the velocities according to (17)-(19). Then repeat (2) in Step 2.

Step 4: $k \ge N_{\max}$?

(1) Yes: stop and output results;

(2) No: go to Step 3.

END

Fig. 1. Procedure of Nash equilibrium computing based on the PP-PSO.
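The procedure of Fig. 1 can be sketched compactly in Python. This is a hedged illustration, not the authors' Matlab code: the way constraints (28)-(29) are enforced (clip and renormalize in `project`), the single acceleration coefficient `c`, the Bernoulli escape probability `p_escape` used to draw $P$, and the choice $x_{span} = 1$ are all assumptions, since the text does not specify them. The matching-pennies matrices are likewise only a toy input.

```python
import numpy as np

def nash_fitness(X, M_A, M_B):
    """Sum of both players' best-response gains, following the form of (26)."""
    m = M_A.shape[0]
    x, y = X[:m], X[m:]
    return (np.max(M_A @ y) - x @ M_A @ y) + (np.max(x @ M_B) - x @ M_B @ y)

def project(X, m):
    """ASSUMED repair for (28)-(29): clip to positive, renormalize each block."""
    X = np.clip(X, 1e-12, None)
    X[:, :m] /= X[:, :m].sum(axis=1, keepdims=True)
    X[:, m:] /= X[:, m:].sum(axis=1, keepdims=True)
    return X

def pp_pso_nash(M_A, M_B, n_pred=10, n_prey=20, n_iter=100,
                w_max=0.9, w_min=0.2, c=1.5, p_escape=0.5, seed=0):
    rng = np.random.default_rng(seed)
    m, n = M_A.shape
    dim = m + n
    fit = lambda X: np.array([nash_fitness(row, M_A, M_B) for row in X])
    # Step 1: random feasible positions, zero velocities.
    xd, vd = project(rng.random((n_pred, dim)), m), np.zeros((n_pred, dim))
    xr, vr = project(rng.random((n_prey, dim)), m), np.zeros((n_prey, dim))
    # Step 2: initial personal bests.
    pbest_d, val_d = xd.copy(), fit(xd)
    pbest_r, val_r = xr.copy(), fit(xr)
    a, b = 1.0, 100.0                       # (23) with x_span = 1 (assumed)
    for k in range(1, n_iter + 1):          # Steps 3-4
        gd = pbest_d[np.argmin(val_d)]      # best predator position
        gr = pbest_r[np.argmin(val_r)]      # best prey position
        g = gd if val_d.min() <= val_r.min() else gr
        wd = 0.2 * np.exp(-10.0 * k / n_iter) + 0.4          # (20)
        wr = w_max - (w_max - w_min) * k / n_iter            # (21)
        r = rng.random((3, n_pred, dim))
        q = rng.random((3, n_prey, dim))
        # Nearest predator of each prey, (22), measured before anyone moves.
        dist = np.linalg.norm(xr[:, None, :] - xd[None, :, :], axis=2)
        diff = xd[np.argmin(dist, axis=1)] - xr
        P = (rng.random((n_prey, 1)) < p_escape).astype(float)
        vd = (wd * vd + c * r[0] * (pbest_d - xd) + c * r[1] * (gd - xd)
              + c * r[2] * (g - xd))                         # (18)
        vr = (wr * vr + c * q[0] * (pbest_r - xr) + c * q[1] * (gr - xr)
              + c * q[2] * (g - xr)
              - P * a * np.sign(diff) * np.exp(-b * np.abs(diff)))  # (19)
        xd, xr = project(xd + vd, m), project(xr + vr, m)    # (17) + repair
        for x, p, pv in ((xd, pbest_d, val_d), (xr, pbest_r, val_r)):
            val = fit(x)
            better = val < pv
            p[better], pv[better] = x[better], val[better]
    all_p = np.vstack([pbest_d, pbest_r])
    all_v = np.concatenate([val_d, val_r])
    return all_p[np.argmin(all_v)].copy(), float(all_v.min())

# Toy usage: matching pennies, whose mixed equilibrium is (1/2, 1/2) for each side.
M_A = np.array([[1.0, -1.0], [-1.0, 1.0]])
best, best_fit = pp_pso_nash(M_A, -M_A)
print(best, best_fit)
```

The returned fitness is a direct measure of how far the best particle is from an equilibrium, so it can be used as a stopping criterion in place of a fixed iteration count.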

To validate the effectiveness of the proposed method, we illustrate Nash equilibrium computing for both a zero-sum game and a non-zero-sum game using two simple examples. For a fair comparison between the two methods (the basic PSO and the proposed PP-PSO), they use the same maximum iteration number $N_{\max} = 100$, the same population size $m = 30$, and the same upper and lower bounds for the inertia weights, $\omega_{\max} = 0.9$ and $\omega_{\min} = 0.2$. In addition, in our proposed PP-PSO, the numbers of predators and preys are set to $m_d = 10$ and $m_r = 20$, respectively.

Example 1. Consider the two-person zero-sum game and the two-person non-zero-sum game given in Tables I and II[30], respectively.

TABLE II
PAY-OFF MATRIX OF A AND B IN A TWO PLAYER, NON-ZERO-SUM GAME

A \ B    I             II            III           IV
I        (1, 1)        (235, 0)      (0, 235)      (0.1, 1.1)
II       (0, 235)      (1, 1)        (235, 0)      (0.1, 1.1)
III      (235, 0)      (0, 235)      (1, 1)        (0.1, 1.1)
IV       (1.1, 0.1)    (1.1, 0.1)    (1.1, 0.1)    (0, 0)


As we can see from the above two tables, the first column and the first row represent the strategies of players A and B, respectively. For example, in the zero-sum game, each player has three strategies, which are specified by the number of rows and the number of columns. The payoffs are provided in the interior. The first number is the payoff received by the column player; the second is the payoff for the row player. To reduce statistical errors, each algorithm is tested 100 times independently for these two games. Evolution curves for the two-player, zero-sum game are depicted in Figs. 2-4. In addition, the simulation results are reported in terms of the average fitness value, the best fitness value ever found (Tables III and IV), the minimum error, and the number of runs whose results satisfy error ≤ 0.01, where the error is defined by the following expressions:

$$e_h(k) = \frac{\sum_{i=1}^{m_d} \|X_{di,1:m+n}(k) - E\_S\| + \sum_{i=1}^{m_r} \|X_{ri,1:m+n}(k) - E\_S\|}{m_d + m_r}, \tag{30}$$

$$e_b(k) = \frac{\sum_{i=1}^{m_b} \|X_{i,1:m+n}(k) - E\_S\|}{m_b}, \tag{31}$$

Fig. 2. Comparison results of average fitness values for the two player, zero-sum game.

Fig. 3. Comparison results of average errors for the two player, zero-sum game.

Fig. 4. Comparison results of global best solutions for the two player, zero-sum game.

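where $e_h(k)$ and $e_b(k)$ denote the errors of our proposed PP-PSO and the basic PSO, respectively, $m_b$ is the number of particles in the basic PSO, and $E\_S$ represents the mixed Nash equilibrium solution of the game that the players participate in. Note that for the zero-sum game shown in Table I, $E\_S = \left[\frac{21}{52}, \frac{12}{52}, \frac{19}{52}, \frac{2}{13}, \frac{3}{13}, \frac{8}{13}\right]$, and for the non-zero-sum game shown in Table II, $E\_S = \left[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}, 0, \frac{1}{3}, \frac{1}{3}, \frac{1}{3}, 0\right]$.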

TABLE III
COMPARISON RESULTS FOR THE TWO PLAYER, ZERO-SUM GAME

             Average fitness    Best fitness    Minimum error    Successful hits
PP-PSO       0.2439             5.92E−4         0.0091           79
Basic PSO    0.3153             0.0074          0.0145           2

TABLE IV
COMPARISON RESULTS FOR THE TWO PLAYER, NON-ZERO-SUM GAME

             Average fitness    Best fitness    Minimum error    Successful hits
PP-PSO       0.1352             3.836E−13       0.0069           91
Basic PSO    0.6358             0.2897          0.0085           14

It is reasonable to conclude from the simple example demonstrated above that the proposed PP-PSO outperforms the basic PSO in terms of solution accuracy, convergence speed, and reliability for Nash equilibrium computing. It is therefore appropriate to use this method to solve the problem of multiple UCAV air combat modeled by dynamic game theory in the following section.

IV. GAME THEORETIC APPROACH TO UCAV AIR COMBAT BASED ON PP-PSO

A. Experimental Settings

To validate the effectiveness of the dynamic game theoretic formulation for UCAV air combat, a computational example is performed in Matlab 2009b using our proposed PP-PSO. Consider an adversarial scenario involving two opposing forces. The attacking force is labeled as the Red team, while the defending force is labeled as the Blue team. The mission of the Blue force is to transport military supplies from its base to the battlefront, while the task of the Red force is to attack and destroy at least 80 % of the transport aircraft of the Blue force and then return to its air base.

As shown in Fig. 5, the Blue force consists of one transportation unit, which is represented by the solid square, and two combat units. They are on the way back to the base after accomplishing a military mission. The Red force consists of three combat units and aims to destroy the Blue transportation unit. The Red force is also programmed to return to its base after the mission. To simplify the problem, each unit of both sides is assumed to consist of the same type of UCAVs, and each UCAV is equipped with a certain number of air-to-air missiles. Assume that the speed of the Red force is nearly 0.25 km/s while the speed of the Blue force is nearly 0.2 km/s, and that the state variables are updated every 2 minutes. The positions of the Red force and the Blue force therefore change by 30 km and 24 km, respectively, at each step. The configuration parameters used in the simulation for the Red force and the Blue force are listed in Table V and Table VI, respectively.

Fig. 5. Scenario of cooperative UCAV task assignment.

TABLE V
INITIAL CONFIGURATION OF Red FORCE

        (x_i^A(0), y_i^A(0))    p_i^A(0)    w_i^A(0)    PK_i^{a-b}    r_m^A
UAV1    (200, 300) km           10          4           0.6           20 km
UAV2    (201, 299) km           10          4           0.4           20 km
UAV3    (202, 298) km           10          4           0.4           20 km

TABLE VI
INITIAL CONFIGURATION OF Blue FORCE

        (x_i^B(0), y_i^B(0))    p_i^B(0)    w_i^B(0)    PK_i^{b-a}    r_m^B
UAV1    (2, 2) km               10          0           0             20 km
UAV2    (3, 2) km               7           3           0.7           20 km
UAV3    (4, 2) km               7           3           0.7           20 km

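In the simulation, the objective functions of the two forces are chosen as

$$J^R(k) = \sum_{i=1}^{3} 0.4\,\frac{p_i^R(k)}{p_i^R(0)} - \frac{p_1^B(k)}{p_1^B(0)} - \sum_{i=2}^{3} 0.3\,\frac{p_i^B(k)}{p_i^B(0)}, \tag{32}$$

$$J^B(k) = -\sum_{i=1}^{3} 0.6\,\frac{p_i^R(k)}{p_i^R(0)} + 0.5\,\frac{p_1^B(k)}{p_1^B(0)} + \sum_{i=2}^{3} 0.5\,\frac{p_i^B(k)}{p_i^B(0)}. \tag{33}$$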
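The objectives (32) and (33) depend only on the surviving fraction of each unit, which makes them simple to evaluate. The sketch below uses a hypothetical function name and takes the fractions $p_i(k)/p_i(0)$ as inputs, with Blue unit 1 being the transportation unit.

```python
def objectives(red_frac, blue_frac):
    """Evaluate (32) and (33) from surviving fractions p_i(k) / p_i(0).

    red_frac:  three fractions for R1..R3.
    blue_frac: three fractions for B1..B3 (B1 is the transportation unit).
    """
    j_red = (sum(0.4 * f for f in red_frac)
             - blue_frac[0]
             - sum(0.3 * f for f in blue_frac[1:]))
    j_blue = (-sum(0.6 * f for f in red_frac)
              + 0.5 * blue_frac[0]
              + sum(0.5 * f for f in blue_frac[1:]))
    return j_red, j_blue

# Before any engagement every unit is intact, so all fractions equal 1.
j_red0, j_blue0 = objectives([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
print(j_red0, j_blue0)
```

At the start both payoffs are negative (about -0.4 for Red and -0.3 for Blue). Red's payoff rises as Blue platforms, and the transportation unit in particular, are destroyed, which is how the weights encode the mission priorities.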

B. Experimental Results and Analysis

Fig. 6 presents the flight trajectories of both sides in the air military operation, which result from the proposed game theoretic formulation of task assignment in a dynamic combat environment and the PP-PSO based solution methodology. The Red force starts from near its base and launches attacks to eliminate the Blue transportation force, which is on its way back to its base after a military mission. The task assignment scheme for both sides is calculated based on the proposed approach described above.

Fig. 6. Resulting trajectories of both sides from the proposed approach.

The detailed evolution and convergence behavior over time of the platform numbers in the combat units are shown in Fig. 7. The combat units of both sides start to fight at the 9th time step. After 3 time steps of engagement, the air military operation ends at the 11th time step, with Red defeating the Blue force and the surviving forces of both sides returning to their own bases. By the end of the combat, the Red force has destroyed more than 90 % of the platforms in Blue's transportation unit, 78 % of the platforms in B2, and 70 % of the platforms in B3. Meanwhile, the Red team pays a price for the victory. The first unit R1 and the third unit R3 suffer slight damage, with 30 % of their platforms destroyed in the attack. The second unit R2 of the Red force, however, suffers serious damage, with more than 60 % of its platforms destroyed in the engagement with the Blue force. The result can be explained from Fig. 8, where snapshots of the dynamic task assignment results at Steps 9, 10, and 11 are given. It illustrates that the engagement of both sides follows a different pattern at each time step, which shows that task assignment is a process that changes over time. At the 9th and 10th time steps, the second unit R2 of the Red force acts in accordance with the resulting mixed Nash equilibrium and chooses the first and second units of the Blue force as targets, respectively. However, the Blue force persists in attacking R2 with its first unit B1, which is its most powerful unit with 10 combat platforms. Consequently, R2 suffers the most serious damage among the three Red units.

Fig. 7. Number of platforms: (a) platforms in R1; (b) platforms in R2; (c) platforms in R3; (d) platforms in the transportation unit of the Blue force; (e) platforms in B2; (f) platforms in B3.

It is important to note that both the attacking side and the defending side use the proposed approach to obtain their assignment schemes throughout the engagement. Therefore, the engagement outcome mainly depends on the initial configuration of each force. The Red force has an advantage in the performance and number of weapons, which makes the defeat of Blue the likely result. The experimental results are consistent with the theoretical analysis and verify the effectiveness and feasibility of the proposed approach.

Fig. 8. Snapshots of dynamic task assignment results.
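Since both sides select their assignment at each stage from a mixed Nash equilibrium, any swarm-based solver needs a fitness value that measures how far a candidate strategy pair is from equilibrium. One common choice, assumed here and not necessarily the fitness used by the authors, is the largest gain either player could obtain by deviating unilaterally to a pure strategy. This gain is never negative and is zero exactly at a mixed Nash equilibrium. The payoff matrices below are a toy game (matching pennies), not the UCAV payoffs, and all names are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def expected_payoff(M: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """x^T M y for row-player mixed strategy x and column-player mixed strategy y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def nash_regret(A: Matrix, B: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Largest gain available to either player from a pure-strategy deviation.

    A holds the row player's payoffs and B the column player's payoffs.
    """
    # Payoff of each pure row strategy against y, and each pure column strategy against x.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    col_payoffs = [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))]
    gain_row = max(row_payoffs) - expected_payoff(A, x, y)
    gain_col = max(col_payoffs) - expected_payoff(B, x, y)
    return max(gain_row, gain_col)


# Toy example: matching pennies, whose only equilibrium is uniform mixing.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
regret_eq = nash_regret(A, B, [0.5, 0.5], [0.5, 0.5])
regret_pure = nash_regret(A, B, [1.0, 0.0], [1.0, 0.0])
print(regret_eq, regret_pure)
```

An optimizer such as PP-PSO could then minimize this quantity over pairs of probability vectors; the pure-strategy pair in the example has positive regret because the column player gains by switching.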

V. CONCLUSIONS

This paper developed a game theoretic method for UCAV combat based on the PP-PSO model. By considering both the adversary side and the attacking side as rational game participants, we represented the task allocation scheme as an optional policy set of both sides, and the cooperative task allocation results of both sides were obtained by solving for the mixed Nash equilibrium using PP-PSO. An example of a military operation involving an attacking side Red and a defending side Blue was presented to verify the effectiveness and adaptive ability of the proposed method. Simulation results show that combining a game theoretic representation of the task assignment with PP-PSO for the mixed Nash solutions can effectively solve the UCAV dynamic task assignment problem involving an adversary.

REFERENCES

[1] Richards A, Bellingham J, Tillerson M, How J. Coordination and control of multiple UAVs. In: Proceedings of the 2002 AIAA Guidance, Navigation, and Control Conference. Monterey, CA: AIAA, 2002. 145−146

[2] Alighanbari M, Kuwata Y, How J P. Coordination and control of multiple UAVs with timing constraints and loitering. In: Proceedings of the 2003 American Control Conference. Denver, Colorado: IEEE, 2003. 5311−5316

[3] Li C S, Wang Y Z. Protocol design for output consensus of port-controlled Hamiltonian multi-agent systems. Acta Automatica Sinica, 2014, 40(3): 415−422

[4] Duan H, Li P. Bio-inspired Computation in Unmanned Aerial Vehicles. Berlin: Springer-Verlag, 2014. 143−181

[5] Duan H, Shao S, Su B, Zhang L. New development thoughts on the bio-inspired intelligence based control for unmanned combat aerial vehicle. Science China Technological Sciences, 2010, 53(8): 2025−2031

[6] Chi P, Chen Z J, Zhou R. Autonomous decision-making of UAV based on extended situation assessment. In: Proceedings of the 2006 AIAA Guidance, Navigation, and Control Conference and Exhibit. Colorado, USA: AIAA, 2006.

[7] Ruz J J, Arelo O, Pajares G, de la Cruz J M. Decision making among alternative routes for UAVs in dynamic environments. In: Proceedings of the 2007 IEEE Conference on Emerging Technologies and Factory Automation. Patras: IEEE, 2007. 997−1004

[8] Jung S, Ariyur K B. Enabling operational autonomy for unmanned aerial vehicles with scalability. Journal of Aerospace Information Systems, 2013, 10(11): 516−529

[9] Berger J, Boukhtouta A, Benmoussa A, Kettani O. A new mixed-integer linear programming model for rescue path planning in uncertain adversarial environment. Computers & Operations Research, 2012, 39(12): 3420−3430

[10] Duan H B, Liu S. Unmanned air/ground vehicles heterogeneous cooperative techniques: current status and prospects. Science China Technological Sciences, 2010, 53(5): 1349−1355

[11] Cruz Jr J B, Simaan M A, Gacic A, Jiang H, Letellier B, Li M, Liu Y. Game-theoretic modeling and control of a military air operation. IEEE Transactions on Aerospace and Electronic Systems, 2001, 37(4): 1393−1405

[12] Dixon W. Optimal adaptive control and differential games by reinforcement learning principles. Journal of Guidance, Control, and Dynamics, 2014, 37(3): 1048−1049

[13] Semsar-Kazerooni E, Khorasani K. Multi-agent team cooperation: a game theory approach. Automatica, 2009, 45(10): 2205−2213

[14] Gu D. A game theory approach to target tracking in sensor networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2011, 41(1): 2−13

[15] Duan H, Wei X, Dong Z. Multiple UCAVs cooperative air combat simulation platform based on PSO, ACO, and game theory. IEEE Aerospace and Electronic Systems Magazine, 2013, 28(11): 12−19

[16] Turetsky V, Shinar J. Missile guidance laws based on pursuit-evasion game formulations. Automatica, 2003, 39(4): 607−618

[17] Porter R, Nudelman E, Shoham Y. Simple search methods for finding a Nash equilibrium. Games and Economic Behavior, 2008, 63(2): 642−662

[18] Chen X, Deng X, Teng S-H. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM, 2009, 56(3): Article No. 14

[19] Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of the 1st IEEE International Conference on Neural Networks. Perth, Australia: IEEE, 1995. 1942−1948

[20] Eberhart R, Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the 6th International Symposium on Micro Machine and Human Science. Nagoya: IEEE, 1995. 39−43

[21] Higashitani M, Ishigame A, Yasuda K. Particle swarm optimization considering the concept of predator-prey behavior. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation. Vancouver, BC, Canada: IEEE, 2006. 434−437

[22] Liu F, Duan H B, Deng Y M. A chaotic quantum-behaved particle swarm optimization based on lateral inhibition for image matching. Optik-International Journal for Light and Electron Optics, 2012, 123(21): 1955−1960

[23] Edison E, Shima T. Genetic algorithm for cooperative UAV task assignment and path optimization. In: Proceedings of the 2008 AIAA Guidance, Navigation and Control Conference and Exhibit. Honolulu, Hawaii: AIAA, 2008. 340−356

[24] Duan H, Luo Q, Shi Y, Ma G. Hybrid particle swarm optimization and genetic algorithm for multi-UAV formation reconfiguration. IEEE Computational Intelligence Magazine, 2013, 8(3): 16−27

[25] Liu G, Lao S Y, Tan D F, Zhou Z C. Research status and progress on anti-ship missile path planning. Acta Automatica Sinica, 2013, 39(4): 347−359

[26] Duan H B, Yu Y X, Zhao Z Y. Parameters identification of UCAV flight control system based on predator-prey particle swarm optimization. Science China Information Sciences, 2013, 56(1): 1−12

[27] Duan H, Li S, Shi Y. Predator-prey based brain storm optimization for DC brushless motor. IEEE Transactions on Magnetics, 2013, 49(10): 5336−5340

[28] Pan F, Li X T, Zhou Q, Li W X, Gao Q. Analysis of standard particle swarm optimization algorithm based on Markov chain. Acta Automatica Sinica, 2013, 39(4): 381−389

[29] Nash J F. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 1950, 36(1): 48−49

[30] Yu Qian, Wang Xian-Jia. Evolutionary algorithm for solving Nash equilibrium based on particle swarm optimization. Journal of Wuhan University (Natural Science Edition), 2006, 52(1): 25−29 (in Chinese)

Haibin Duan Professor at the School of Automation Science and Electrical Engineering, Beihang University, China. He received his Ph. D. degree from Nanjing University of Aeronautics and Astronautics in 2005. He is the Head of the Bio-inspired Autonomous Flight Systems (BAFS) Research Group. His research interests include multiple UAVs cooperative control, biological computer vision and bio-inspired computation. Corresponding author of this paper.

Pei Li Ph. D. candidate at the School of Automation Science and Electrical Engineering, Beihang University, China. He received his bachelor degree from Harbin Engineering University in 2012. He is a member of the BUAA Bio-inspired Autonomous Flight Systems (BAFS) Research Group. His research interests include multiple UAV cooperative control and game theory.

Yaxiang Yu Master student at the School of Automation Science and Electrical Engineering, Beihang University, China. She received her bachelor degree from Beihang University in 2007. She was a technician at the Changhe Aircraft Industries Group Co., Ltd. from 2007 to 2008. She is a member of the BUAA Bio-inspired Autonomous Flight Systems (BAFS) Research Group. Her research interests include multiple UAV cooperative control and bio-inspired computation.