A Stochastic Pursuit-Evasion Game with no Information Sharing

1

Ashitosh Swarup, Jason Speyer, Johnathan Wolfe

School of Engineering and Applied Science, UCLA

2

Introduction

The game considered here is the LQG stochastic pursuit-evasion game.

A deterministic version of this game was studied by Ho, Bryson and Baron.

The case in which both players process their own noisy measurements was studied by Willman.

We continue investigating this class of games.

3

Willman’s Approach

Attempted to find strategies in which each player’s control is an assumed linear function of his entire observation history.

Optimizing the cost function resulted in a set of implicit equations for the control gains.

No closed-form solution was shown for the implicit equations; results were obtained numerically for up to 3 stages.

4

Our Objective

Examine conditions under which closed-form linear and/or nonlinear optimal solutions exist.

Willman sets up an LQG problem and states an optimality result without proof. We use dynamic programming to derive conditions for optimal controllers.

If possible, eliminate the need to smooth over each player’s entire observation sequence (dimensionality constraint).

5

Problem Setup

System dynamics are given by:

$$x(i+1) = x(i) + G_p u(i) - G_e v(i) + q(i)$$

Subscripts $p$ and $e$ refer to pursuer and evader respectively.

The pursuer’s and evader’s controls are $u$ and $v$ respectively.

$q$ is Gaussian white, $(0, Q)$; $x(0)$ is Gaussian, $(x_0, P_0)$; the statistics of $q$ and $x(0)$ are known a priori to both players.

6

Problem Setup (contd.)

The players receive noisy measurements:

$$z_p(i) = H_p x(i) + w_p(i)$$

$$z_e(i) = H_e x(i) + w_e(i)$$

Each player has no information about his opponent’s observation, but knows his opponent’s noise statistics.

$w_p$ is Gaussian white, $(0, R_p)$.

$w_e$ is Gaussian white, $(0, R_e)$.

Both players start off with a common a priori estimate of the initial state $x(0)$.
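The dynamics and measurement equations above can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name `simulate`, the argument names, and the use of fixed (open-loop) control sequences `u_seq` and `v_seq` are not from the slides; in the game itself each control is a function of the player's own observation history.

```python
import numpy as np

def simulate(n, G_p, G_e, H_p, H_e, Q, R_p, R_e, x0_mean, P0, u_seq, v_seq, seed=0):
    """Simulate n stages of x(i+1) = x(i) + G_p u(i) - G_e v(i) + q(i)
    together with the measurements z_p(i), z_e(i) for i = 0..n-1.

    u_seq and v_seq are given control sequences (an assumption made here
    to keep the sketch short)."""
    rng = np.random.default_rng(seed)
    # Initial state drawn from the common prior (x0_mean, P0).
    x = rng.multivariate_normal(x0_mean, P0)
    xs, zs_p, zs_e = [x], [], []
    for i in range(n):
        # Each player observes the current state through his own noisy channel.
        w_p = rng.multivariate_normal(np.zeros(H_p.shape[0]), R_p)
        w_e = rng.multivariate_normal(np.zeros(H_e.shape[0]), R_e)
        zs_p.append(H_p @ x + w_p)
        zs_e.append(H_e @ x + w_e)
        # Propagate the state with process noise q(i) ~ (0, Q).
        q = rng.multivariate_normal(np.zeros(x.shape[0]), Q)
        x = x + G_p @ u_seq[i] - G_e @ v_seq[i] + q
        xs.append(x)
    return np.array(xs), np.array(zs_p), np.array(zs_e)
```

The function returns the state trajectory (n+1 rows) and the two measurement histories (n rows each), which correspond to the observation sequences each player would see.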

7

Problem Setup (contd.)

Observation histories:

$$Z_p(i) = \{ z_p(j),\ j = 0, \ldots, i \}$$

$$Z_e(i) = \{ z_e(j),\ j = 0, \ldots, i \}$$

Cost function:

$$J(u,v) = E\Big[ [S_f x(n), x(n)] + \sum_{i=0}^{n-1} \big( [B u(i), u(i)] - [C v(i), v(i)] \big) \Big]$$

The pursuer minimizes the cost function while the evader maximizes it.
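As a rough illustration of the cost above, the quantity inside the expectation can be evaluated for a single realized trajectory; averaging it over many simulated trajectories would approximate $J(u,v)$. The sketch assumes that $[a, b]$ denotes the inner product, so that $[S_f x, x] = x^T S_f x$; the function name `sample_cost` and its arguments are hypothetical.

```python
import numpy as np

def sample_cost(x_final, u_seq, v_seq, S_f, B, C):
    """Cost of one realized trajectory (the term inside E[...]).

    Assumes [a, b] is the inner product a^T b, so [S_f x, x] = x^T S_f x."""
    # Terminal miss penalty on the final state x(n).
    terminal = x_final @ S_f @ x_final
    # Pursuer's control effort is penalized, evader's is rewarded (minus sign).
    running = sum(u @ B @ u - v @ C @ v for u, v in zip(u_seq, v_seq))
    return float(terminal + running)
```

The opposite signs on the two control terms are what make this a zero-sum game: effort is costly for whichever player spends it.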

8

Saddle Point Condition

Finding optimal controls involves solving the following saddle-point inequality:

$$J(u, v^o) \ge J(u^o, v^o) \ge J(u^o, v)$$

Optimize person-by-person by solving the following inequalities:

$$J(u^o, v^o) \ge J(u^o, v)$$

$$J(u, v^o) \ge J(u^o, v^o)$$
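The saddle-point inequality can be checked numerically on a toy problem. The sketch below uses a deterministic scalar payoff chosen only for illustration (it is not the game's cost function): it is convex in $u$, concave in $v$, and has its saddle point at $(0, 0)$. The names `J`, `is_saddle_point` and `grid` are hypothetical.

```python
import numpy as np

def J(u, v):
    # Toy payoff (an assumption, not the LQG cost): convex in u, concave in v.
    return u**2 + u * v - 2.0 * v**2

def is_saddle_point(J, u_o, v_o, candidates, tol=1e-12):
    """Check J(u, v_o) >= J(u_o, v_o) >= J(u_o, v) over a set of candidates."""
    center = J(u_o, v_o)
    # The minimizer cannot lower the cost by deviating from u_o.
    pursuer_ok = all(J(u, v_o) >= center - tol for u in candidates)
    # The maximizer cannot raise the cost by deviating from v_o.
    evader_ok = all(center >= J(u_o, v) - tol for v in candidates)
    return pursuer_ok and evader_ok

grid = np.linspace(-2.0, 2.0, 81)
```

Calling `is_saddle_point(J, 0.0, 0.0, grid)` tests the two person-by-person inequalities separately, which mirrors how the pair of inequalities above is used.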

9

The One-Stage Game

Cost function:

$$J(u,v) = E\big[ [S_f x(1), x(1)] + [B u(0), u(0)] - [C v(0), v(0)] \big]$$

Optimize to get expressions for $u^o(0)$ and $v^o(0)$.

Assume a linear functional form of the controls:

$$u^o(0) = \alpha_u + \beta_u x_0 + \gamma_u z_p(0)$$

$$v^o(0) = \alpha_v + \beta_v x_0 + \gamma_v z_e(0)$$

Solving for the coefficients using the equations derived previously gives $\alpha_u = \alpha_v = 0$, and nonzero values for the other matrix gains.

An assumed nonlinear form of the optimal controls degenerates into the above linear controllers.
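The vanishing of the constant terms can be seen in a scalar sketch. The stationarity conditions used below are not taken from the slides; they are derived here under these assumptions: all quantities are scalar, the noises are independent and zero-mean, each player conditions on his own single measurement, and $c > s\,g_e^2$ so that the evader's problem is concave. The names `a_*`, `b_*`, `c_*` (constant term, gain on $x_0$, gain on the measurement) and the function name are placeholders. Matching coefficients in the two person-by-person conditions gives a linear system in the six gains.

```python
import numpy as np

def one_stage_scalar_gains(s, b, c, g_p, g_e, h_p, h_e, P0, r_p, r_e):
    """Solve for [a_u, b_u, c_u, a_v, b_v, c_v] in
    u = a_u + b_u*x0 + c_u*z_p and v = a_v + b_v*x0 + c_v*z_e (scalar case)."""
    # Each player's estimator gain: E[x | z] = x0 + K * (z - h * x0).
    K_p = P0 * h_p / (h_p**2 * P0 + r_p)
    K_e = P0 * h_e / (h_e**2 * P0 + r_e)
    D_p = b + s * g_p**2            # curvature of the pursuer's problem
    D_e = c - s * g_e**2            # must be positive for the evader's maximum
    if D_e <= 0:
        raise ValueError("need c > s*g_e**2 for a finite evader optimum")
    m = s * g_p * g_e
    # Rows: pursuer (const, x0, z_p), then evader (const, x0, z_e).
    A = np.array([
        [D_p, 0.0, 0.0, -m, 0.0, 0.0],
        [0.0, D_p, 0.0, 0.0, -m, -m * h_e * (1 - K_p * h_p)],
        [0.0, 0.0, D_p, 0.0, 0.0, -m * h_e * K_p],
        [m, 0.0, 0.0, D_e, 0.0, 0.0],
        [0.0, m, m * h_p * (1 - K_e * h_e), 0.0, D_e, 0.0],
        [0.0, 0.0, m * h_p * K_e, 0.0, 0.0, D_e],
    ])
    rhs = np.array([
        0.0,
        -s * g_p * (1 - K_p * h_p),
        -s * g_p * K_p,
        0.0,
        -s * g_e * (1 - K_e * h_e),
        -s * g_e * K_e,
    ])
    return np.linalg.solve(A, rhs)
```

In this sketch the two constant-term equations are homogeneous and decoupled from the rest, so the constants come out as zero whenever `D_p` and `D_e` are positive, while the measurement gains are generally nonzero.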

10

The Two-Stage Game

The cost function in this case is

$$J_1(u,v) = E\Big[ [S_f x(2), x(2)] + \sum_{i=0}^{1} \big( [B_i u(i), u(i)] - [C_i v(i), v(i)] \big) \Big]$$

Assume a linear form of the controls:

$$u^o(0) = k_0 + k_{00} x_0 + k_0^0 z_p(0); \qquad v^o(0) = l_0 + l_{00} x_0 + l_0^0 z_e(0)$$

$$u^o(1) = k_1 + k_{01} x_0 + k_1^0 z_p(0) + k_1^1 z_p(1)$$

$$v^o(1) = l_1 + l_{01} x_0 + l_1^0 z_e(0) + l_1^1 z_e(1)$$

Optimize the cost function using dynamic programming to get expressions for $u^o(0)$, $v^o(0)$, $u^o(1)$ and $v^o(1)$.

Use the expressions derived for the optimal controls to get 14 equations for the 14 unknown control-coefficient matrices.

11

The Two-Stage Problem: Analytical Constraint

Solving the equations for the control gains involves inverting a matrix with unknown elements.

This results in polynomial equations in the unknowns.

Consider the scalar case first to extract properties of the system.

12

The Two-Stage Game: Properties of the Scalar Equations

$k_0^0$, $l_0^0$, $k_1^0$, $l_1^0$, $k_1^1$ and $l_1^1$ are mutually dependent and do not depend on the other variables.

This reduces the number of equations we have to solve simultaneously from 14 to 6.

The other variables $k_0$, $l_0$, $k_{00}$, $l_{00}$, $k_1$, $l_1$, $k_{01}$ and $l_{01}$ depend on the above 6 variables, and can be solved for after solving the above 6 equations.

13

The Two-Stage Game: Solving the Scalar Equations

$k_0^0$ and $l_0^0$ can be eliminated by solving:

$$k_0^0 = \mu_p \left( k_{p1} + k_{p2}\, l_0^0 \right)$$

$$l_0^0 = \mu_e \left( k_{e1} + k_{e2}\, k_0^0 \right)$$

$\mu_p$, $\mu_e$, $k_{p1}$, $k_{p2}$, $k_{e1}$ and $k_{e2}$ are functions of $k_1^0$, $l_1^0$, $k_1^1$ and $l_1^1$.

We thus need to solve 4 equations for the 4 variables from the final stage.
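Because the pair of first-stage equations is linear in the two first-stage measurement gains once the final-stage quantities are known, it can be solved by direct substitution. The sketch below writes the two multipliers as `mu_p` and `mu_e`; these names, the function name, and the returned names `k00` and `l00` (the first-stage measurement gains of pursuer and evader) are placeholders chosen for illustration.

```python
def solve_first_stage_pair(mu_p, mu_e, k_p1, k_p2, k_e1, k_e2):
    """Solve k = mu_p*(k_p1 + k_p2*l) and l = mu_e*(k_e1 + k_e2*k) for (k, l).

    All arguments are scalars assumed to be already computed from the
    final-stage gains."""
    # Substituting one equation into the other leaves a single linear equation
    # whose leading coefficient is this determinant-like term.
    det = 1.0 - mu_p * mu_e * k_p2 * k_e2
    if abs(det) < 1e-12:
        raise ValueError("pair of equations is singular for these coefficients")
    k00 = mu_p * (k_p1 + mu_e * k_p2 * k_e1) / det
    l00 = mu_e * (k_e1 + mu_p * k_e2 * k_p1) / det
    return k00, l00
```

The singular case, where the product of the four coupling coefficients equals one, has no unique solution and is rejected rather than handled.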

14

The Two-Stage Game: Solving the Scalar Equations (contd.)

As we go on to the final stage, we encounter polynomial equations of the form:

$$k_1^0 = f_p\left( l_1^0, k_1^1, l_1^1 \right)$$

$$l_1^0 = f_e\left( k_1^0, l_1^1, k_1^1 \right)$$

Eliminate $k_1^0$ and $l_1^0$ from these equations and go on to solve the pair of equations for $k_1^1$ and $l_1^1$.

Back-substitute the values of $k_1^1$ and $l_1^1$ into the previous equations to solve for the remaining 4 variables.

We thus have a dynamic-programming kind of approach for these 6 variables, i.e. solve for the variables from the final stage first and then solve for the earlier stages.

15

Conclusion and Future Work

Even seemingly simple linear structures result in complex polynomial equations.

If analytical linear solutions exist in the scalar case, do nonlinear solutions exist?

Is it possible to find analytical closed-form solutions for the vector case?

Can the need to smooth over the entire observation sequence be eliminated?
