
An Introduction to Numerical Methods for Differential Games

M. Falcone, Dipartimento di Matematica

School "Game Theory and Dynamic Games"

Campione, September 4-9, 2016

M. Falcone (SAPIENZA, Rome) Introduction to Numerical Methods for DG 1 / 51

Outline

1 Introduction

2 Dynamic Programming for 1-Player

3 Dynamic Programming for 2-Players






Foreword

Optimal control problems can be solved via the Pontryagin Maximum Principle (which yields open-loop controls) and via the Dynamic Programming (DP) approach.

However, Pontryagin's Principle does not work for games, so we present the DP approach.

By the Dynamic Programming Principle, we will derive the characterization of the value function in terms of a first-order partial differential equation (PDE), the Isaacs equation.
We will introduce weak (i.e. possibly non-differentiable) solutions, which will allow us to select a unique solution of the Bellman and of the Isaacs equations.

The general framework of this approach is the theory of viscosity solutions (Crandall-Lions, 1984).

Uniqueness is a delicate and fundamental issue, and it is crucial for proving the convergence of numerical approximation schemes.


The numerical solution of optimal control problems via the Dynamic Programming approach is mainly motivated by the search for feedback controls for generic nonlinear Lipschitz continuous dynamics and costs.

The solution of the corresponding Bellman equation in high dimension is a computationally intensive task, and this bottleneck has limited the applications of this theory to industrial cases.

Recently, several new efficient algorithms with limited (and controlled) memory allocation and reasonable CPU times have been developed to mitigate the "curse of dimensionality".


The Zermelo navigation problem

Figure: The rescaled value function (left), feedback control (right)


Trajectories for a pursuit-evasion game

Figure: Optimal trajectories derived by feedback laws, plotted in the (x1, x2) plane on [-1, 1] x [-1, 1]. Panels: Test 5 with P=(0.0,0.3), E=(0.3,0.0); Test 5 with P=(-0.1,-0.3), E=(0.1,0.3).


The model problem

Let us consider the nonlinear system dynamics

$$
\begin{cases}
\dot y(t) = f(y(t), a(t), b(t)), & t > 0,\\
y(0) = x,
\end{cases}
\tag{D}
$$

where

$y(t) \in \mathbb{R}^N$ is the state,

$a(\cdot) \in \mathcal{A}$ is the control of player 1 (player a),

$$
\mathcal{A} = \text{admissible control functions of player 1} = \{ a : [0,+\infty[ \to A, \text{ measurable} \}
$$

(e.g. $\mathcal{A}$ = piecewise constant functions with values in $A$),


The model problem

$b(\cdot) \in \mathcal{B}$ is the control of player 2 (player b),

$$
\mathcal{B} = \{ b : [0,+\infty[ \to B, \text{ measurable} \},
$$

and $A, B \subset \mathbb{R}^M$ are given compact sets.

Assume $f$ is continuous and

$$
|f(x,a,b) - f(y,a,b)| \le L\,|x-y| \quad \forall x, y \in \mathbb{R}^N,\ a \in A,\ b \in B.
$$

Then, for every given pair of controls $a(\cdot) \in \mathcal{A}$, $b(\cdot) \in \mathcal{B}$, there is a unique trajectory of (D), denoted by $y_x(t; a, b)$ (Carathéodory).
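As an illustration of how a trajectory of (D) can be approximated once the two controls are fixed, here is a minimal sketch based on the explicit Euler method. The dynamics `f(y, a, b) = a + b`, the two piecewise constant controls and the step size are hypothetical choices made only for this example; they do not come from the slides.

```python
import numpy as np

def simulate(f, x0, a, b, t_final, dt=1e-3):
    """Explicit Euler approximation of y' = f(y, a(t), b(t)), y(0) = x0."""
    n_steps = int(round(t_final / dt))
    y = np.empty((n_steps + 1, len(x0)))
    y[0] = x0
    for k in range(n_steps):
        t = k * dt
        y[k + 1] = y[k] + dt * f(y[k], a(t), b(t))
    return y

# Hypothetical planar dynamics and piecewise constant controls (assumptions).
f = lambda y, a, b: a + b
a = lambda t: np.array([1.0, 0.0]) if t < 0.5 else np.array([0.0, 1.0])
b = lambda t: np.array([0.0, -0.5])

traj = simulate(f, np.array([0.0, 0.0]), a, b, t_final=1.0)
print(traj[-1])  # approximate state at time t = 1
```

Piecewise constant controls are used here because they are easy to evaluate at the Euler nodes; any measurable control could be sampled in the same way.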


Payoff

The payoff of the game is

$$
t_x(a(\cdot), b(\cdot)) = \min\{ t : y_x(t; a, b) \in \mathcal{T} \} \le +\infty,
$$

where $\mathcal{T} \subseteq \mathbb{R}^N$ is a given closed target.

Goal of the game
Player a wants to minimize the payoff and is called the pursuer, whereas player b wants to maximize the payoff and is called the evader.


Example: Minimum time problem

This is a simple example with just 1 player:

$$
\begin{cases}
\dot y = a, & A = \{ a \in \mathbb{R}^N : |a| = 1 \},\\
y(0) = x.
\end{cases}
$$

Then $t_x(a^*)$ is equal to the length of the optimal trajectory joining $x$ and the point $y_x(t_x(a^*))$, thus

$$
t_x(a^*) = \min_{a \in \mathcal{A}} t_x(a) = \mathrm{dist}(x, \mathcal{T})
$$

and any optimal trajectory is a straight line!


Example: Pursuit-Evasion Games

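We have two players, each one controlling its own dynamics:

$$
\begin{cases}
\dot y_1 = f_1(y_1, a),\\
\dot y_2 = f_2(y_2, b),
\end{cases}
\qquad y_i \in \mathbb{R}^{N/2},\ i = 1, 2.
\tag{PEG}
$$

The target is

$$
\mathcal{T}_\delta \equiv \{ |y_1 - y_2| \le \delta \},\ \delta > 0,
\quad\text{or}\quad
\mathcal{T}_0 \equiv \{ (y_1, y_2) : y_1 = y_2 \}.
$$

Then $t_x(a(\cdot), b(\cdot))$ is the capture time corresponding to the strategies $a(\cdot)$ and $b(\cdot)$ of the first and second players.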
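To make the notion of capture time concrete, the following sketch simulates a very simple instance of (PEG). Everything specific in it is an assumption made for illustration: the dynamics `f1 = v_p * a`, `f2 = v_e * b` with unit-norm controls, the speeds, the capture radius `delta`, and the two feedback laws (the pursuer heads straight at the evader, the evader runs straight away). It is not claimed to be the setting behind the figures above.

```python
import numpy as np

def capture_time(p0, e0, v_p=2.0, v_e=1.0, delta=0.05, dt=1e-3, t_max=10.0):
    """Euler simulation; returns the first time with |y1 - y2| <= delta, or inf."""
    p, e = np.array(p0, dtype=float), np.array(e0, dtype=float)
    t = 0.0
    while t <= t_max:
        gap = e - p
        dist = np.linalg.norm(gap)
        if dist <= delta:              # the state has entered the target T_delta
            return t
        direction = gap / dist
        p = p + dt * v_p * direction   # pursuer heads toward the evader
        e = e + dt * v_e * direction   # evader runs straight away
        t += dt
    return np.inf

t_cap = capture_time(p0=(0.0, 0.3), e0=(0.3, 0.0))
print(t_cap)
```

With these laws the distance shrinks at the constant rate `v_p - v_e`, so the simulated capture time should be close to `(|e0 - p0| - delta) / (v_p - v_e)`, up to one time step.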


Dynamic Programming for 1-Player

In this section we assume $B = \{ \bar b \}$ (a single point), so the system can be rewritten as

$$
\begin{cases}
\dot y = f(y, a), & t > 0,\\
y(0) = x.
\end{cases}
$$

Define the value function

$$
T(x) \equiv \inf_{a(\cdot) \in \mathcal{A}} t_x(a).
$$

$T(\cdot)$ is the minimum-time function, i.e. it is the best possible outcome of the game for player a, as a function of the initial position $x$ of the system.


Reachable set

DEFINITION
$\mathcal{R} \equiv \{ x \in \mathbb{R}^N : T(x) < +\infty \}$, i.e. the set of starting points from which it is possible to reach the target.

WARNING
The reachable set $\mathcal{R}$ depends on the target and on the dynamics in a rather complicated way.

$\mathcal{R}$ is NOT known in our problem, so we have to determine the pair $(T, \mathcal{R})$ (i.e. it is a free boundary problem).






Dynamic Programming for 1-Player

LEMMA (Dynamic Programming Principle)
For all $x \in \mathcal{R} \setminus \mathcal{T}$, $0 \le t < T(x)$,

$$
T(x) = \inf_{a(\cdot) \in \mathcal{A}} \{ t + T(y_x(t; a)) \}.
\tag{DPP}
$$

Sketch of the proof
The inequality "$\le$" follows from the intuitive fact that, for every $a(\cdot)$,

$$
T(x) \le t + T(y_x(t; a)).
$$


Sketch of the proof

The proof of the opposite inequality "$\ge$" is based on the fact that equality holds if $a(\cdot)$ is optimal for $x$.
For any $\varepsilon > 0$ we can find a minimizing control $a_\varepsilon$ such that

$$
T(x) + \varepsilon \ge t + T(y_x(t; a_\varepsilon));
$$

to obtain this, split the trajectory at time $t$, then pass to the limit for $\varepsilon \to 0$.


Sketch of the proof

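To prove the above inequalities rigorously, the following two properties of $\mathcal{A}$ are crucial:

1 $a(\cdot) \in \mathcal{A}$ implies that, $\forall s \in \mathbb{R}$, the function $t \mapsto a(t+s)$ is still in $\mathcal{A}$;

2 $a_1, a_2 \in \mathcal{A}$ implies that, for any given $s > 0$, the new control

$$
a(t) \equiv
\begin{cases}
a_1(t), & t \le s,\\
a_2(t), & t > s
\end{cases}
$$

belongs to $\mathcal{A}$ (concatenation property).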


Concatenation is a crucial property

Note that the DP Principle works for

$$
\mathcal{A} = \{ \text{piecewise constant functions into } A \}
$$

but not for

$$
\mathcal{A} = \{ \text{continuous functions into } A \},
$$

because joining together two continuous controls does not guarantee that the resulting control is continuous.
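The Dynamic Programming Principle can be checked numerically on the minimum time example, where $T(x) = \mathrm{dist}(x, \mathcal{T})$. The sketch below assumes a hypothetical target (the closed unit ball in the plane) and restricts the infimum in (DPP) to constant controls on $[0, t]$, which is enough in this example because optimal trajectories are straight lines.

```python
import numpy as np

R_TARGET = 1.0  # hypothetical target: closed ball of radius 1 centred at the origin

def T(x):
    """Minimum time for y' = a, |a| = 1: distance from x to the target ball."""
    return max(np.linalg.norm(x) - R_TARGET, 0.0)

def dpp_rhs(x, t, n_dirs=360):
    """min over constant unit controls a of  t + T(x + t a)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    controls = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return min(t + T(x + t * a) for a in controls)

x = np.array([2.0, 0.0])
for t in (0.1, 0.5, 0.9):          # any 0 <= t < T(x) = 1
    print(t, T(x), dpp_rhs(x, t))  # the last two columns should agree
```

The minimum is attained by the control pointing from `x` to the target, in agreement with the straight-line optimal trajectories of the example.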


Getting the Bellman equation

Let us derive the Hamilton-Jacobi-Bellman equation from the DP Principle.
Rewrite (DPP) as

$$
T(x) - \inf_{a(\cdot) \in \mathcal{A}} T(y_x(t; a)) = t
$$

and divide by $t > 0$:

$$
\sup_{a(\cdot) \in \mathcal{A}} \frac{T(x) - T(y_x(t; a))}{t} = 1 \quad \text{for } t < T(x).
$$

We want to pass to the limit as $t \to 0^+$.


Bellman equation

Assume $T$ is differentiable at $x$ and that the limit for $t \to 0^+$ commutes with the $\sup_{a(\cdot)}$.
Then, if $\dot y_x(0; a)$ exists,

$$
\sup_{a(\cdot) \in \mathcal{A}} \{ -\nabla T(x) \cdot \dot y_x(0; a) \} = 1.
$$

Then, for $\lim_{t \to 0^+} a(t) = a_0$, we get

$$
\sup_{a_0 \in A} \{ -\nabla T(x) \cdot f(x, a_0) \} = 1.
\tag{HJB}
$$

This is the Hamilton-Jacobi-Bellman partial differential equation; for our problem it is a first-order nonlinear PDE.


Bellman equation

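Let us define the Hamiltonian

$$
H_1(x, p) := \max_{a \in A} \{ -p \cdot f(x, a) \} - 1;
$$

then we can rewrite (HJB) in short as

$$
H_1(x, \nabla T(x)) = 0 \quad \text{in } \mathcal{R} \setminus \mathcal{T}.
$$

A natural boundary condition on $\partial\mathcal{T}$ is

$$
T(x) = 0 \quad \text{for } x \in \partial\mathcal{T}.
$$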
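As a worked instance, consider the minimum time example with $f(x, a) = a$ and $|a| = 1$. The maximum in the Hamiltonian can be computed explicitly (it is attained at $a = -p/|p|$ when $p \neq 0$), and the Bellman equation reduces to the eikonal equation:

```latex
H_1(x, p) = \max_{|a| = 1} \{ -p \cdot a \} - 1 = |p| - 1
\quad\Longrightarrow\quad
\begin{cases}
|\nabla T(x)| = 1 & \text{in } \mathcal{R} \setminus \mathcal{T},\\
T(x) = 0 & \text{on } \partial\mathcal{T}.
\end{cases}
```

This is consistent with $T(x) = \mathrm{dist}(x, \mathcal{T})$, whose gradient has unit norm wherever it exists.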


T verifies the HJB equation

PROPOSITION
If $T(\cdot)$ is $C^1$ in a neighborhood of $x \in \mathcal{R} \setminus \mathcal{T}$, then $T(\cdot)$ satisfies, for every such $x$,

$$
\max_{a \in A} \{ -\nabla T(x) \cdot f(x, a) \} = 1.
$$

Sketch of the proof
Let us first prove the inequality "$\le$".
Fix $a(t) \equiv a_0$ $\forall t$, and set $y_x(t) = y_x(t; a)$. Then (DPP) gives

$$
T(x) - T(y_x(t)) \le t \quad \forall\, 0 \le t < T(x).
$$



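We divide by $t > 0$, getting

$$
\frac{T(x) - T(y_x(t))}{t} \le 1 \quad \forall\, 0 < t < T(x).
$$

Now let $t \to 0^+$ to get

$$
-\nabla T(x) \cdot \dot y_x(0) = -\nabla T(x) \cdot f(x, a_0) \le 1,
$$

where $\dot y_x(0) = f(x, a_0)$ since $a(t)$ tends to $a_0$. Since $a_0 \in A$ is arbitrary, we get

$$
\max_{a \in A} \{ -\nabla T(x) \cdot f(x, a) \} \le 1.
$$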



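To prove the inequality "$\ge$", we fix $\varepsilon > 0$.
For all $t \in\, ]0, T(x)[$, by (DPP) there exists $\alpha \in \mathcal{A}$ such that

$$
T(x) \ge t + T(y_x(t; \alpha)) - \varepsilon t.
$$

Then

$$
\begin{aligned}
1 - \varepsilon \le \frac{T(x) - T(y_x(t; \alpha))}{t}
&\le -\frac{1}{t} \int_0^t \frac{\partial}{\partial s} T(y_x(s; \alpha))\, ds \\
&= -\frac{1}{t} \int_0^t \nabla T(y_x(s)) \cdot \dot y_x(s)\, ds \\
&= -\frac{1}{t} \int_0^t \nabla T(x) \cdot f(x, \alpha(s))\, ds + o(1) \quad \text{as } t \to 0^+,
\end{aligned}
$$

where the last step uses the continuity of $\nabla T$ and of $f$.

Passing to the limit for $t \to 0^+$ we get, for every positive $\varepsilon$,

$$
1 - \varepsilon \le -\nabla T(x) \cdot f(x, a_0) \quad \text{for some } a_0 \in A.
$$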



Since $\varepsilon$ is arbitrary, we finally obtain

$$
\sup_{a \in A} \{ -\nabla T(x) \cdot f(x, a) \} \ge 1,
$$

and we conclude the proof.

In conclusion: assuming that $T$ is a differentiable function, we have proved that $T$ satisfies the Bellman equation pointwise in the reachable set $\mathcal{R}$.

WARNING: the reachable set $\mathcal{R}$ is not given in the problem.


Is T regular?

The answer is NO even for simple cases.

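Let us go back to Example 1 (the minimum time problem), where $T(x) = \mathrm{dist}(x, \mathcal{T})$. Note that $T$ is not differentiable at $x$ if there exist two distinct points of $\mathcal{T}$ at minimal distance from $x$.

EXAMPLE
Let us take $N = 1$, $f(x, a) = a$, $A = B(0, 1)$ and choose

$$
\mathcal{T} = \,]-\infty, -1] \cup [1, +\infty[\,.
$$

Then

$$
T(x) = 1 - |x| \quad \text{for } |x| \le 1,
$$

which is not differentiable at $x = 0$.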


a.e. solutions

Note that in this example the Bellman equation is the eikonal equation

$$
|Du(x)| = 1,
\tag{1}
$$

which has infinitely many a.e. solutions, even when we fix the values on the boundary $\partial\mathcal{T}$,

$$
u(-1) = u(1) = 0.
$$

Figure: graph of $u(x) = 1 - |x|$ on $[-1, 1]$.
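The non-uniqueness of a.e. solutions can be made explicit with a family of sawtooth functions. The sketch below is an illustration under my own choice of family (the function name `sawtooth` and the parameter `k` are hypothetical): for each `k` it builds a piecewise linear function on $[-1, 1]$ with slopes $\pm 1$ and zero boundary values, so that $|Du| = 1$ holds at every point except the finitely many kinks.

```python
import numpy as np

def sawtooth(x, k):
    """Piecewise linear function with slopes +-1 and 2**k teeth on [-1, 1],
    vanishing at x = -1 and x = 1."""
    width = 2.0 / 2**k           # width of one tooth
    s = np.mod(x + 1.0, width)   # position inside the current tooth
    return np.minimum(s, width - s)

x = np.linspace(-1.0, 1.0, 2001)
for k in range(4):
    u = sawtooth(x, k)
    slopes = np.diff(u) / np.diff(x)
    # boundary values and the range of |u'| away from the kinks
    print(k, u[0], u[-1], np.abs(slopes).min(), np.abs(slopes).max())
```

For `k = 0` the function coincides with $1 - |x|$; larger `k` gives different functions satisfying the same equation a.e. and the same boundary conditions.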


Is T continuous?

The continuity of $T$ is also, in general, not guaranteed.
Take the previous example and set $A = [-1, 0]$; then we have

$$
T(1) = 0, \qquad \lim_{x \to 1^-} T(x) = 2.
$$

However, the continuity of $T(\cdot)$ is equivalent to the property of Small-Time Local Controllability (STLC) around $\mathcal{T}$.
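To see where the two values in the example with $A = [-1, 0]$ come from, note that the state can only move to the left, so from any $x \in\, ]-1, 1[$ the only reachable part of the target is $]-\infty, -1]$, and the fastest control is $a \equiv -1$:

```latex
T(x) =
\begin{cases}
x + 1, & -1 < x < 1,\\
0, & |x| \ge 1,
\end{cases}
\qquad\text{hence}\qquad
\lim_{x \to 1^-} T(x) = 2 \neq 0 = T(1).
```

The jump occurs at $x = 1$ because no admissible velocity points into the target there.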


Small Time Local Controllability (STLC)

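DEFINITION
Assume $\partial\mathcal{T}$ is smooth. Then the dynamical system satisfies the STLC condition iff

$$
\forall x \in \partial\mathcal{T}\ \ \exists a \in A : f(x, a) \cdot \eta(x) < 0,
\tag{STLC}
$$

where $\eta(x)$ denotes the exterior normal to $\mathcal{T}$ at $x$.

Note that (STLC) guarantees that $\mathcal{R}$ is an open subset of $\mathbb{R}^N$ and that

$$
\lim_{x \to x_0} T(x) = +\infty \quad \forall x_0 \in \partial\mathcal{R}.
$$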


Weak solutions (in the viscosity sense)

We want to interpret the HJB equation in a "weak sense" so that $T(\cdot)$ is a (non-classical) "solution" and is also unique (under suitable boundary conditions).

Let's go back to the proof of our proposition.

We proved that

1 $T(x) - T(y_x(t)) \le t$, $\forall t$ small, and $T(\cdot) \in C^1(\mathcal{R})$ imply

$$
H(x, \nabla T(x)) \le 0;
$$

2 $T(x) - T(y_x(t)) \ge t(1 - \varepsilon)$, $\forall t, \varepsilon$ small, and $T \in C^1(\mathcal{R})$ imply

$$
H(x, \nabla T(x)) \ge 0.
$$



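MAIN IDEA: If $\varphi \in C^1$ and $T - \varphi$ has a maximum at $x$, then

$$
T(x) - \varphi(x) \ge T(y_x(t)) - \varphi(y_x(t)) \quad \forall t,
$$

thus

$$
\varphi(x) - \varphi(y_x(t)) \le T(x) - T(y_x(t)) \le t \quad \forall t,
$$

so we can replace $T$ by $\varphi$ in the proof of the proposition and get

$$
H(x, \nabla\varphi(x)) \le 0.
$$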



Similarly, if $\varphi \in C^1$ and $T - \varphi$ has a minimum at $x$, then

$$
T(x) - \varphi(x) \le T(y_x(t)) - \varphi(y_x(t)) \quad \forall t,
$$

thus

$$
\varphi(x) - \varphi(y_x(t)) \ge T(x) - T(y_x(t)) \ge t(1 - \varepsilon)
$$

and, by the proof of the proposition,

$$
H(x, \nabla\varphi(x)) \ge 0.
$$

Thus, the classical proof can be adapted also to $T \notin C^1(\mathcal{R})$, just by replacing $T$ with a "test function" $\varphi \in C^1(\mathcal{R})$.


Viscosity solutions

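DEFINITION (Crandall-Evans-Lions, 1985)
Let $H : \mathbb{R}^N \times \mathbb{R} \times \mathbb{R}^N \to \mathbb{R}$ be continuous and $\Omega \subseteq \mathbb{R}^N$ open.
We say that $u \in C(\Omega)$ is a viscosity subsolution of

$$
H(x, u, \nabla u) = 0 \quad \text{in } \Omega
\tag{HJB}
$$

if, $\forall \varphi \in C^1$ and $\forall x$ local maximum point of $u - \varphi$,

$$
H(x, u(x), \nabla\varphi(x)) \le 0.
$$

It is a viscosity supersolution of (HJB) if, $\forall \varphi \in C^1$ and $\forall x$ local minimum point of $u - \varphi$,

$$
H(x, u(x), \nabla\varphi(x)) \ge 0.
$$

A viscosity solution is both a sub- and a supersolution in $\Omega$.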
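As a worked check of the definition, take the eikonal equation $|u'| - 1 = 0$ on $\Omega = \,]-1, 1[$ and compare the two a.e. solutions $u(x) = 1 - |x|$ and $w(x) = |x| - 1$ at the kink $x = 0$ (away from the kink both are classical solutions). The choice of $w$ and of the test function $\varphi \equiv 0$ is mine, for illustration only.

```latex
\begin{aligned}
&u(x) = 1 - |x|: &&
u - \varphi \text{ has a local max at } 0 \ \Rightarrow\ \varphi'(0) \in [-1, 1]
\ \Rightarrow\ |\varphi'(0)| - 1 \le 0 \quad \text{(subsolution)};\\
& &&
u - \varphi \text{ has a local min at } 0 \ \Rightarrow\ \varphi'(0) \le -1 \text{ and } \varphi'(0) \ge 1,
\ \text{impossible (supersolution test is vacuous)}.\\[4pt]
&w(x) = |x| - 1: &&
\varphi \equiv 0 \ \Rightarrow\ w - \varphi \text{ has a local min at } 0,
\ \text{but } |\varphi'(0)| - 1 = -1 < 0 \quad \text{(not a supersolution)}.
\end{aligned}
```

So $u$ is a viscosity solution while $w$ is not, although both satisfy the equation a.e. and vanish at $x = \pm 1$.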


Viscosity solutions

THEOREM
If $\mathcal{R} \setminus \mathcal{T}$ is open and $T(\cdot)$ is continuous, then $T(\cdot)$ is a viscosity solution of the Hamilton-Jacobi-Bellman equation (HJB).

The proof is the argument before the definition! Note that the definition is local (the inequalities need to hold only at the points where $u - \varphi$ has a local extremum).


Relation with classical solutions

PROPOSITION

1 If $u$ is a classical solution of $H(x, u, \nabla u) = 0$ in $\Omega$, then $u$ is a viscosity solution;

2 if $u$ is a viscosity solution of $H(x, u, \nabla u) = 0$ in $\Omega$ and $u$ is differentiable at $x_0$, then the equation is satisfied in the classical sense at $x_0$, i.e.

$$
H(x_0, u(x_0), \nabla u(x_0)) = 0.
$$


Uniqueness

Next we want to prove the uniqueness of the solution for the Dirichlet boundary value problem

$$
\begin{cases}
u + H(x, \nabla u) = 0 & \text{in } \Omega,\\
u = g & \text{on } \partial\Omega
\end{cases}
\tag{BVP}
$$

under assumptions satisfied by the Hamiltonian $H_1$.
$T(\cdot)$ can be recovered from the solution of a boundary value problem of this form, as follows.


Uniqueness

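We rescale $T$ by the Kruzkov change of variable

$$
V(x) :=
\begin{cases}
1 - e^{-T(x)} & \text{if } T(x) < +\infty, \text{ i.e. } x \in \mathcal{R},\\
1 & \text{if } T(x) = +\infty, \text{ i.e. } x \notin \mathcal{R}
\end{cases}
\;=\; \inf_{a(\cdot) \in \mathcal{A}} J(x, a),
$$

where

$$
J(x, a) := \int_0^{t_x(a)} e^{-t}\, dt.
$$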


Uniqueness

Note that, by definition, we have

$$
\nabla V(x) = e^{-T(x)} \nabla T(x)
$$

and $1 - V(x) = e^{-T(x)}$, which implies

$$
\nabla T(x) = \frac{\nabla V(x)}{1 - V(x)}.
$$

Then, substituting into the equation for the minimum time and multiplying by $1 - V(x) > 0$, we get the new equation for $V$:

$$
V(x) + \max_{a \in A} [-f(x, a) \cdot \nabla V(x)] = 1.
$$


Solving the free boundary problem

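From $V$ we can obtain the minimum time function $T$ and the reachable set $\mathcal{R}$ by

$$
T(x) = -\log(1 - V(x)), \qquad \mathcal{R} = \{ x : V(x) < 1 \}.
$$

This is quite important for solving the free boundary problem as well as for the numerical approximation.
In fact, since $V$ takes values in $[0, 1]$, it is bounded and can be computed on the whole space, whereas $T$ is infinite outside $\mathcal{R}$.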
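The change of variable and its inverse are straightforward to code. The following sketch (the function names `kruzkov` and `inverse_kruzkov` are my own, and the sample values of `T` are made up) shows how infinite minimum times are mapped to the finite value 1 and how the reachable set is read off from `V`.

```python
import numpy as np

def kruzkov(T):
    """V = 1 - exp(-T); T = +inf (outside the reachable set) maps to V = 1."""
    return 1.0 - np.exp(-np.asarray(T, dtype=float))

def inverse_kruzkov(V):
    """T = -log(1 - V); V = 1 maps back to T = +inf."""
    V = np.asarray(V, dtype=float)
    with np.errstate(divide="ignore"):   # log(0) = -inf is intended here
        return -np.log(1.0 - V)

T = np.array([0.0, 0.5, 2.0, np.inf])    # hypothetical minimum-time values
V = kruzkov(T)
reachable = V < 1.0                      # indicator of the reachable set R
print(V, reachable, inverse_kruzkov(V))
```

In a numerical scheme one typically works with `V` throughout and applies `inverse_kruzkov` only at the end, on the nodes where `V < 1`.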


Uniqueness

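Moreover, by the DP Principle, $V$ satisfies

$$
\begin{cases}
V + \max_{a \in A} \{ -\nabla V \cdot f(x, a) \} - 1 = 0 & \text{in } \mathbb{R}^N \setminus \mathcal{T},\\
V = 0 & \text{on } \partial\mathcal{T},
\end{cases}
\tag{BVP-B}
$$

which is a special case of (BVP), with

$$
H(x, u, p) = u + H_1(x, p) \equiv u + \max_{a \in A} \{ -f(x, a) \cdot p \} - 1,
\qquad g \equiv 0,
\qquad \Omega = \mathcal{T}^c \equiv \mathbb{R}^N \setminus \mathcal{T}.
$$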
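To give a flavour of how (BVP-B) can be approximated, here is a minimal sketch of a semi-Lagrangian fixed-point iteration for the one-dimensional example $f(x, a) = a$, $A = [-1, 1]$, $\mathcal{T} = \,]-\infty, -1] \cup [1, +\infty[$, whose exact solution is $T(x) = 1 - |x|$. The discrete equation $V(x) = \min_a \{ e^{-h} V(x + h f(x, a)) \} + 1 - e^{-h}$ is a time discretization of the DP Principle with step $h$. The grid size, the choice $h = \Delta x$, the three sampled controls and the function name are all assumptions of this example, not prescriptions from the slides.

```python
import numpy as np

def solve_min_time_1d(n=41, controls=(-1.0, 0.0, 1.0), tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for V(x) = min_a { e^{-h} V(x + h a) } + 1 - e^{-h} on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n)
    h = x[1] - x[0]                      # time step = mesh size (illustrative choice)
    target = np.abs(x) >= 1.0 - 1e-12    # grid nodes belonging to the target
    V = np.where(target, 0.0, 1.0)       # V = 0 on the target, initial guess 1 elsewhere
    for _ in range(max_iter):
        # V at the foot x + h a of each characteristic, by linear interpolation
        candidates = [np.interp(x + h * a, x, V) for a in controls]
        V_new = np.exp(-h) * np.min(candidates, axis=0) + 1.0 - np.exp(-h)
        V_new[target] = 0.0              # boundary condition V = 0 on the target
        if np.max(np.abs(V_new - V)) < tol:
            return x, V_new
        V = V_new
    return x, V

x, V = solve_min_time_1d()
T = -np.log(1.0 - V)                     # undo the Kruzkov change of variable
print(np.max(np.abs(T - (1.0 - np.abs(x)))))   # error against the exact solution
```

Starting from `V = 1` outside the target, the information propagates inward from the boundary one node per iteration, so the loop stops after a number of iterations comparable to the number of grid nodes.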


Uniqueness

LEMMA
The minimum time Hamiltonian $H_1$ satisfies the "structure condition"

$$
|H(x, p) - H(y, q)| \le K (1 + |x|)\, |p - q| + |q|\, L\, |x - y|
\tag{SH}
$$

for any $x, y, p, q$.

THEOREM (Crandall-Lions, 1984)
Assume $H$ satisfies (SH). Let $u, w \in BUC(\Omega)$ be respectively a subsolution and a supersolution of

$$
v + H(x, \nabla v) = 0 \quad \text{in } \Omega.
$$

If $u \le w$ on $\partial\Omega$, then $u \le w$ in $\Omega$.
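Uniqueness for (BVP) follows from this comparison result in two lines. Let $u_1, u_2 \in BUC(\Omega)$ be two viscosity solutions with the same boundary datum $g$; each is both a subsolution and a supersolution, so the theorem can be applied twice:

```latex
\begin{aligned}
u_1 \text{ subsolution},\ u_2 \text{ supersolution},\ u_1 = g = u_2 \text{ on } \partial\Omega
&\ \Longrightarrow\ u_1 \le u_2 \text{ in } \Omega,\\
u_2 \text{ subsolution},\ u_1 \text{ supersolution},\ u_2 = g = u_1 \text{ on } \partial\Omega
&\ \Longrightarrow\ u_2 \le u_1 \text{ in } \Omega,
\end{aligned}
\qquad\text{hence } u_1 = u_2 \text{ in } \Omega.
```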






Dynamic Programming for 2-Players

What is the value function for the 2-player game?

WARNING: It is not

$$
\inf_{a(\cdot) \in \mathcal{A}}\ \sup_{b(\cdot) \in \mathcal{B}} J(x, a, b)
$$

because this would give Player a a big advantage in choosing his control, since he would know completely the future response of Player b to any control function $a(\cdot) \in \mathcal{A}$.


Nonanticipating Strategies

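A more reasonable information pattern can be modeled by means of the notion of nonanticipating strategies, introduced by Varaiya, Roxin and Elliott-Kalton.

1st Player

$$
\Delta \equiv \{ \alpha : \mathcal{B} \to \mathcal{A} \mid b(t) = \tilde b(t)\ \forall t \le t' \text{ implies } \alpha[b](t) = \alpha[\tilde b](t)\ \forall t \le t' \},
$$

2nd Player

$$
\Gamma \equiv \{ \beta : \mathcal{A} \to \mathcal{B} \mid a(t) = \tilde a(t)\ \forall t \le t' \text{ implies } \beta[a](t) = \beta[\tilde a](t)\ \forall t \le t' \}.
$$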
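A discrete-time analogue may help to read the definition: a strategy maps the opponent's control sequence to a response sequence, and it is nonanticipating when the response up to step `k` depends only on the opponent's moves up to step `k`. The three maps below are hypothetical examples made up for this illustration, and controls are represented as plain lists of numbers.

```python
def mirror_strategy(b):
    """alpha[b] at step k reacts to b[k] only (here a_k = -b_k)."""
    return [-bk for bk in b]

def running_mean_strategy(b):
    """alpha[b] at step k uses the history b[0..k], never the future."""
    out, total = [], 0.0
    for k, bk in enumerate(b):
        total += bk
        out.append(-total / (k + 1))
    return out

def anticipating_map(b):
    """NOT a nonanticipating strategy: the output at step k looks at b[k + 1]."""
    return [b[min(k + 1, len(b) - 1)] for k in range(len(b))]

def agrees_up_to(alpha, b1, b2, k):
    """Check alpha[b1] = alpha[b2] on steps 0..k (b1, b2 assumed equal on steps 0..k)."""
    return alpha(b1)[: k + 1] == alpha(b2)[: k + 1]

b1 = [1.0, -1.0, 0.5, 0.5]
b2 = [1.0, -1.0, -0.5, 1.0]   # coincides with b1 on steps 0 and 1 only

for alpha in (mirror_strategy, running_mean_strategy, anticipating_map):
    print(alpha.__name__, agrees_up_to(alpha, b1, b2, k=1))
```

The first two maps pass the check, while the third fails it because its output at step 1 already depends on the opponent's move at step 2.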


Lower Value of a game

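Now we can define the lower value of the game

$$
\underline{T}(x) \equiv \inf_{\alpha \in \Delta}\ \sup_{b \in \mathcal{B}} t_x(\alpha[b], b),
$$

or, after the change of variable,

$$
\underline{V}(x) \equiv \inf_{\alpha \in \Delta}\ \sup_{b \in \mathcal{B}} J(x, \alpha[b], b),
$$

where the payoff is

$$
J(x, a, b) = \int_0^{t_x(a, b)} e^{-t}\, dt.
$$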


Value of a game

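Similarly, the upper value of the game is

$$
\overline{T}(x) := \sup_{\beta \in \Gamma}\ \inf_{a \in \mathcal{A}} t_x(a, \beta[a]),
$$

or

$$
\overline{V}(x) := \sup_{\beta \in \Gamma}\ \inf_{a \in \mathcal{A}} J(x, a, \beta[a]).
$$

DEFINITION
We say that the game has a value if the upper and lower values coincide, i.e. if $\underline{T} = \overline{T}$ or $\underline{V} = \overline{V}$.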


Dynamic Programming Principle for 2 Players

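LEMMA
For all $0 \le t < \underline{T}(x)$,

$$
\underline{T}(x) = \inf_{\alpha \in \Delta}\ \sup_{b \in \mathcal{B}} \{ t + \underline{T}(y_x(t; \alpha[b], b)) \} \quad \forall x \in \mathcal{R} \setminus \mathcal{T},
$$

and

$$
\underline{V}(x) = \inf_{\alpha \in \Delta}\ \sup_{b \in \mathcal{B}} \left\{ \int_0^t e^{-s}\, ds + e^{-t}\, \underline{V}(y_x(t; \alpha[b], b)) \right\} \quad \forall x \in \mathcal{T}^c \equiv \mathbb{R}^N \setminus \mathcal{T}.
$$

The proof is similar to the 1-player case but more technical, due to the essential use of nonanticipating strategies.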


Isaacs equation

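Isaacs' Lower Hamiltonian

$$
\underline{H}(x, p) := \min_{b \in B}\ \max_{a \in A} \{ -p \cdot f(x, a, b) \} - 1.
$$

The upper values $\overline{T}$ and $\overline{V}$ satisfy a similar DP Principle.

Isaacs' Upper Hamiltonian

$$
\overline{H}(x, p) := \max_{a \in A}\ \min_{b \in B} \{ -p \cdot f(x, a, b) \} - 1.
$$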
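The two Hamiltonians can be compared numerically by sampling the control sets. The sketch below assumes a simple pursuit-evasion dynamics in which the controls are decoupled, `f(x, a, b) = (v_p * a, v_e * b)` with `a`, `b` on the unit circle; the speeds, the sampled directions and the test vector `p = (p1, p2)` are hypothetical. For such separated dynamics the min-max and the max-min coincide, and both equal `v_p * |p1| - v_e * |p2| - 1`.

```python
import numpy as np

def isaacs_hamiltonians(p1, p2, v_p=2.0, v_e=1.0, n_dirs=720):
    """Lower (min_b max_a) and upper (max_a min_b) Hamiltonians for
    f(x, a, b) = (v_p * a, v_e * b), with a, b sampled on the unit circle."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # payoff[i, j] = -p . f(x, a_i, b_j)
    payoff = -(v_p * dirs @ p1)[:, None] - (v_e * dirs @ p2)[None, :]
    lower = payoff.max(axis=0).min() - 1.0   # min over b of max over a
    upper = payoff.min(axis=1).max() - 1.0   # max over a of min over b
    return lower, upper

p1, p2 = np.array([1.0, 0.0]), np.array([0.0, -2.0])
lower, upper = isaacs_hamiltonians(p1, p2)
print(lower, upper)
```

When the dynamics couples the two controls, the same function can be adapted by filling `payoff[i, j]` with `-p . f(x, a_i, b_j)` directly; in that case only the inequality `upper <= lower` is guaranteed.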


Isaacs equation

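THEOREM (Evans-Souganidis, 1984)
1. If $\mathcal{R} \setminus \mathcal{T}$ is open and $\underline{T}(\cdot)$ is continuous, then $\underline{T}(\cdot)$ is a viscosity solution of

$$
\underline{H}(x, \nabla \underline{T}) = 0 \quad \text{in } \mathcal{R} \setminus \mathcal{T}.
\tag{HJI}
$$

2. If $\underline{V}(\cdot)$ is continuous, then it is a viscosity solution of

$$
\underline{V} + \underline{H}(x, \nabla \underline{V}) = 0 \quad \text{in } \mathbb{R}^N \setminus \mathcal{T}.
$$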


Basic References

DETERMINISTIC CONTROL PROBLEMS AND GAMES
M. Bardi, I. Capuzzo Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, Birkhäuser, 1997.

A. I. Subbotin, Generalized solutions of first-order PDEs, Birkhäuser, Boston, 1995.

OTHER NUMERICAL METHODS
Viability kernel method via Viability Theory (Saint-Pierre, Quincampoix, Cardaliaguet, ...)
Stable bridges methods (Patsko, Kumkov, ...)

STOCHASTIC CONTROL PROBLEMS
W.H. Fleming, H.M. Soner, Control of Markov chains and viscosity solutions, Springer-Verlag, 1998.

