CS287 Fall 2019 – Lecture 2: Markov Decision Processes and Exact Solution Methods. Pieter Abbeel, UC Berkeley EECS


Page 1:

CS287 Fall 2019 – Lecture 2

Markov Decision Processes and

Exact Solution Methods

Pieter Abbeel, UC Berkeley EECS

Page 2:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-Ent Value Iteration

Outline for Today’s Lecture

Page 3:

[Drawing from Sutton and Barto, Reinforcement Learning: An Introduction, 1998]

Markov Decision Process

Assumption: agent gets to observe the state

Page 4:

Markov Decision Process (S, A, T, R, γ, H)

Given:

n S: set of states

n A: set of actions

n T: S x A x S x {0, 1, …, H} → [0, 1], T_t(s, a, s') = P(s_{t+1} = s' | s_t = s, a_t = a)

n R: S x A x S x {0, 1, …, H} → ℝ, R_t(s, a, s') = reward for (s_{t+1} = s', s_t = s, a_t = a)

n γ in (0, 1]: discount factor

n H: horizon over which the agent will act

Goal:

n Find π*: S x {0, 1, …, H} → A that maximizes the expected sum of rewards, i.e.,

max_π E[ Σ_{t=0}^{H} γ^t R_t(s_t, a_t, s_{t+1}) | π ]
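To make the tuple concrete, here is a minimal sketch of how a small tabular MDP could be stored with numpy arrays; the two-state example and all numbers are made up for illustration (not from the slides).

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration (values made up).
n_states, n_actions = 2, 2

# T[s, a, s'] = P(s_{t+1} = s' | s_t = s, a_t = a); each row over s' sums to 1.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])

# R[s, a, s'] = reward collected on the transition (s, a, s').
R = np.zeros((n_states, n_actions, n_states))
R[:, :, 1] = 1.0          # e.g., reaching state 1 pays +1

gamma = 0.9               # discount factor in (0, 1]
H = 50                    # horizon
```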

Page 5:

MDP (S, A, T, R, γ, H), goal: max_π E[ Σ_{t=0}^{H} γ^t R_t(s_t, a_t, s_{t+1}) | π ]

q Cleaning robot

q Walking robot

q Pole balancing

q Games: tetris, backgammon

Examples

q Server management

q Shortest path problems

q Model for animals, people

Page 6:

Canonical Example: Grid World

§ The agent lives in a grid

§ Walls block the agent's path

§ The agent's actions do not always go as planned:

§ 80% of the time, the action North takes the agent North (if there is no wall there)

§ 10% of the time, North takes the agent West; 10% East

§ If there is a wall in the direction the agent would have been taken, the agent stays put

§ Big rewards come at the end

Page 7:

Solving MDPs

n In an MDP, we want to find an optimal policy π*: S x {0, …, H} → A

n A policy π gives an action for each state for each time

n An optimal policy maximizes the expected sum of rewards

n Contrast: if the environment were deterministic, we would just need an optimal plan, or sequence of actions, from the start to a goal

[Figure: gridworld snapshots at t = 0, 1, 2, 3, 4, 5 = H]

Page 8:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-Ent Value Iteration

Outline for Today’s Lecture

For now: discrete state-action spaces, as they are simpler to get the main concepts across. We will consider continuous spaces next lecture!

Page 9:

Page 10:

Value Iteration

Algorithm:

Start with V*_0(s) = 0 for all s.

For i = 1, …, H:

For all states s in S:

V*_i(s) ← max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*_{i-1}(s') ]

π*_i(s) ← argmax_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*_{i-1}(s') ]

This is called a value update or Bellman update/back-up.

V*_i(s) = expected sum of rewards accumulated starting from state s, acting optimally for i steps

π*_i(s) = optimal action when in state s and getting to act for i steps
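A minimal numpy sketch of the backup above, assuming the tabular T and R arrays from the earlier sketch; this is an illustration, not the course's reference implementation.

```python
import numpy as np

def value_iteration(T, R, gamma, H):
    """Finite-horizon value iteration: returns V_H and the greedy policy for each i."""
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)                      # V_0(s) = 0 for all s
    policies = []
    for i in range(1, H + 1):
        # Q[s, a] = sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V_{i-1}(s'))
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
        V = Q.max(axis=1)                       # Bellman backup
        policies.append(Q.argmax(axis=1))       # greedy action with i steps to go
    return V, policies
```

With γ < 1, running this for a large H approaches the infinite-horizon V* discussed on the following slides.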

Pages 11–17:

Value Iteration in Gridworld: noise = 0.2, γ = 0.9, two terminal states with R = +1 and -1. (Successive slides show the value estimates after each iteration of value iteration.)

Page 18:

§ Now we know how to act for infinite horizon with discounted rewards!

§ Run value iteration till convergence.

§ This produces V*, which in turn tells us how to act, namely by following:

π*(s) = argmax_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ]

§ Note: the infinite horizon optimal policy is stationary, i.e., the optimal action at a state s is the same action at all times. (Efficient to store!)

Value Iteration Convergence

Theorem. Value iteration converges. At convergence, we have found the optimal value function V* for the discounted infinite horizon problem, which satisfies the Bellman equations:

V*(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ]

Page 19:

Convergence: Intuition

n V*_∞(s) = expected sum of rewards accumulated starting from state s, acting optimally for ∞ steps

n V*_H(s) = expected sum of rewards accumulated starting from state s, acting optimally for H steps

n Additional reward collected over time steps H+1, H+2, …

γ^{H+1} R(s_{H+1}) + γ^{H+2} R(s_{H+2}) + … ≤ γ^{H+1} R_max + γ^{H+2} R_max + … = (γ^{H+1} / (1 − γ)) R_max

goes to zero as H goes to infinity.

Hence V*_H → V* as H → ∞.

For simplicity of notation, the above assumed that rewards are always greater than or equal to zero. If rewards can be negative, a similar argument holds, using max |R| and bounding from both sides.

Page 20:

Convergence and Contractions

n Definition: max-norm: ‖U‖ = max_s |U(s)|

n Definition: An update operation B is a γ-contraction in max-norm if and only if, for all U_i, V_i:

‖B U_i − B V_i‖ ≤ γ ‖U_i − V_i‖

n Theorem: A contraction converges to a unique fixed point, no matter the initialization.

n Fact: the value iteration update is a γ-contraction in max-norm

n Corollary: value iteration converges to a unique fixed point

n Additional fact: if ‖V_{i+1} − V_i‖ < ε, then ‖V_{i+1} − V*‖ < 2εγ / (1 − γ)

n I.e. once the update is small, it must also be close to converged
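A hedged sketch of how the "additional fact" can serve as a stopping rule for infinite-horizon value iteration; the function name and the tolerance eps are assumptions.

```python
import numpy as np

def value_iteration_inf(T, R, gamma, eps=1e-6):
    """Run Bellman backups until the update is smaller than eps in max-norm."""
    n_states = T.shape[0]
    V = np.zeros(n_states)
    while True:
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:
            # by the contraction argument, V_new is within ~2*eps*gamma/(1-gamma) of V*
            return V_new, Q.argmax(axis=1)
        V = V_new
```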

Page 21:

(a) Prefer the close exit (+1), risking the cliff (-10)

(b) Prefer the close exit (+1), but avoiding the cliff (-10)

(c) Prefer the distant exit (+10), risking the cliff (-10)

(d) Prefer the distant exit (+10), avoiding the cliff (-10)

Exercise 1: Effect of Discount and Noise

(1) γ = 0.1, noise = 0.5

(2) γ = 0.99, noise = 0

(3) γ = 0.99, noise = 0.5

(4) γ = 0.1, noise = 0

Page 22:

(a) Prefer close exit (+1), risking the cliff (-10) --- (4) γ = 0.1, noise = 0

Exercise 1 Solution

Page 23:

(b) Prefer close exit (+1), avoiding the cliff (-10) --- (1) γ = 0.1, noise = 0.5

Exercise 1 Solution

Page 24:

(c) Prefer distant exit (+10), risking the cliff (-10) --- (2) γ = 0.99, noise = 0

Exercise 1 Solution

Page 25:

(d) Prefer distant exit (+10), avoiding the cliff (-10) --- (3) γ = 0.99, noise = 0.5

Exercise 1 Solution

Page 26:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-Ent Value Iteration

Outline for Today’s Lecture

For now: discrete state-action spaces, as they are simpler to get the main concepts across. We will consider continuous spaces next lecture!

Page 27:

Page 28:

Policy Evaluation

n Recall value iteration iterates:

V*_i(s) ← max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*_{i-1}(s') ]

n Policy evaluation (for a fixed policy π):

V^π_i(s) ← Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ V^π_{i-1}(s') ]

At convergence:

V^π(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ V^π(s') ] for all s
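A minimal sketch of this fixed-policy backup, again assuming the tabular T and R arrays from earlier; illustrative only.

```python
import numpy as np

def policy_evaluation(T, R, gamma, policy, n_iters=1000):
    """Iterate the Bellman backup for a fixed policy pi to estimate V^pi."""
    n_states = T.shape[0]
    idx = np.arange(n_states)
    T_pi = T[idx, policy]          # T(s, pi(s), s'), shape (S, S')
    R_pi = R[idx, policy]          # R(s, pi(s), s'), shape (S, S')
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # follow pi(s) instead of maximizing over actions
        V = np.sum(T_pi * (R_pi + gamma * V[None, :]), axis=1)
    return V
```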

Page 29:

Exercise 2

Page 30:

Policy Iteration

One iteration of policy iteration:

n Policy evaluation: compute V^{π_k} for the current policy π_k

n Policy improvement: π_{k+1}(s) ← argmax_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V^{π_k}(s') ]

n Repeat until the policy converges

n At convergence: optimal policy; and converges faster than value iteration under some conditions

Page 31:

Policy Evaluation Revisited

n Idea 1: modify the Bellman updates (iterate the fixed-policy backup above until convergence)

n Idea 2: it is just a linear system, solve with Matlab (or whatever):

V^π(s) = Σ_{s'} T(s, π(s), s') [ R(s, π(s), s') + γ V^π(s') ] for all s

variables: V^π(s)

constants: T, R
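Idea 2 amounts to a direct linear solve; below is a hedged sketch that also wires it into the policy iteration loop from the previous slide (the helper names are assumptions).

```python
import numpy as np

def policy_evaluation_exact(T, R, gamma, policy):
    """Solve (I - gamma * T_pi) V = r_pi for V^pi directly."""
    n_states = T.shape[0]
    idx = np.arange(n_states)
    T_pi = T[idx, policy]                              # (S, S')
    r_pi = np.sum(T_pi * R[idx, policy], axis=1)       # expected one-step reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * T_pi, r_pi)

def policy_iteration(T, R, gamma):
    n_states, n_actions, _ = T.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = policy_evaluation_exact(T, R, gamma, policy)
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):         # converged: greedy policy unchanged
            return policy, V
        policy = new_policy
```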

Page 32:

Proof sketch:

(1) Guaranteed to converge: in every step the policy improves. This means that a given policy can be encountered at most once, so after we have iterated as many times as there are different policies, i.e., (number of actions)^(number of states), we must be done and hence have converged.

(2) Optimal at convergence: by definition of convergence, at convergence π_{k+1}(s) = π_k(s) for all states s. This means

V^{π_k}(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V^{π_k}(s') ] for all s.

Hence V^{π_k} satisfies the Bellman equation, which means V^{π_k} is equal to the optimal value function V*.

Policy Iteration Guarantees

Theorem. Policy iteration is guaranteed to converge and at convergence, the current policy and its value function are the optimal policy and the optimal value function!

Policy Iteration iterates over: (i) policy evaluation of the current policy π_k, (ii) policy improvement to obtain π_{k+1}.

Page 33:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-ent Value Iteration

Outline for Today’s Lecture

For now: discrete state-action spaces, as they are simpler to get the main concepts across. We will consider continuous spaces next lecture!

Page 34:

n What if the optimal path becomes blocked? The optimal policy fails.

n Is there any way to solve for a distribution over solutions rather than a single solution? → more robust

Obstacles Gridworld

Page 35:

What if we could find a “set of solutions”?

Page 36:

n Entropy = measure of uncertainty over a random variable X:

H(X) = − Σ_x p(x) log₂ p(x)

= number of bits required to encode X (on average)

Entropy

Page 37:

E.g. a binary random variable X with P(X = 1) = p: H(X) = − p log₂ p − (1 − p) log₂(1 − p), maximized at p = 0.5

Entropy
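A tiny sketch that evaluates the entropy of a binary random variable in bits, just to make the definition concrete; the probabilities are arbitrary.

```python
import numpy as np

def entropy_bits(p):
    """H(X) = -sum_x p(x) log2 p(x), with 0 log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

print(entropy_bits([0.5, 0.5]))   # 1.0 bit: maximal uncertainty
print(entropy_bits([0.9, 0.1]))   # ~0.47 bits
print(entropy_bits([1.0, 0.0]))   # 0.0 bits: no uncertainty
```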

Page 38:

Entropy

Page 39:

n Regular formulation:

max_π E[ Σ_t γ^t R(s_t, a_t, s_{t+1}) ]

n Max-ent formulation:

max_π E[ Σ_t γ^t ( R(s_t, a_t, s_{t+1}) + β H(π(·|s_t)) ) ]

Maximum Entropy MDP

Page 40:

n But first need intermezzo on constrained optimization…

Max-ent Value Iteration

Page 41:

Page 42:

n Original problem: max_x f(x) s.t. g(x) = 0

n Lagrangian: L(x, λ) = f(x) + λ g(x)

n At optimum: ∇_x L(x, λ) = 0 and ∇_λ L(x, λ) = 0 (i.e., the constraint g(x) = 0 holds)

Constrained Optimization
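A small worked example of the Lagrangian recipe (not from the slides), using a simple equality-constrained problem:

```latex
% Maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0.
\begin{align*}
\mathcal{L}(x, y, \lambda) &= x + y + \lambda\,(x^2 + y^2 - 1) \\
\partial_x \mathcal{L} &= 1 + 2\lambda x = 0, \qquad
\partial_y \mathcal{L} = 1 + 2\lambda y = 0, \qquad
\partial_\lambda \mathcal{L} = x^2 + y^2 - 1 = 0 \\
\Rightarrow\; x &= y = \tfrac{1}{\sqrt{2}}, \qquad \lambda = -\tfrac{1}{\sqrt{2}},
\qquad \text{maximum value } \sqrt{2}.
\end{align*}
```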

Page 43:

Page 44:

Page 45:

Max-ent for 1-step problem: max_π Σ_a π(a) r(a) + β H(π) s.t. Σ_a π(a) = 1, π(a) ≥ 0

Page 46:

Max-ent for 1-step problem: solution

π*(a) = exp(r(a)/β) / Σ_{a'} exp(r(a')/β)

optimal value = β log Σ_a exp(r(a)/β) = softmax (a smoothed maximum of the rewards)
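Applying the Lagrangian recipe from the intermezzo to the 1-step problem yields the softmax form quoted above; a sketch of the derivation, with β denoting the temperature (the later gridworld slides write it as T):

```latex
% max_pi  sum_a pi(a) r(a) + beta * H(pi)   s.t.  sum_a pi(a) = 1
\begin{align*}
\mathcal{L}(\pi, \lambda) &= \sum_a \pi(a)\, r(a) - \beta \sum_a \pi(a) \log \pi(a)
  + \lambda \Big(\sum_a \pi(a) - 1\Big) \\
\frac{\partial \mathcal{L}}{\partial \pi(a)} &= r(a) - \beta \log \pi(a) - \beta + \lambda = 0
  \quad\Rightarrow\quad \pi(a) \propto \exp\!\big(r(a)/\beta\big) \\
\pi^*(a) &= \frac{\exp(r(a)/\beta)}{\sum_{a'} \exp(r(a')/\beta)}, \qquad
\text{optimal value} = \beta \log \sum_a \exp\!\big(r(a)/\beta\big).
\end{align*}
```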

Page 47:

Page 48:

Max-ent Value Iteration

= 1-step problem (with Q instead of r), so we can directly transcribe the solution:

Q(s, a) = Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V(s') ]

π(a|s) = exp(Q(s, a)/β) / Σ_{a'} exp(Q(s, a')/β)

V(s) = β log Σ_a exp(Q(s, a)/β)
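A hedged numpy sketch of the resulting max-ent ("soft") value iteration, reusing the tabular T and R arrays; the function name and iteration count are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(T, R, gamma, beta, n_iters=500):
    """Max-ent value iteration: V(s) = beta * log sum_a exp(Q(s,a)/beta)."""
    n_states = T.shape[0]
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = np.einsum('ijk,ijk->ij', T, R + gamma * V[None, None, :])
        V = beta * logsumexp(Q / beta, axis=1)       # soft (log-sum-exp) backup
    policy = np.exp((Q - V[:, None]) / beta)         # pi(a|s) = exp((Q(s,a) - V(s))/beta)
    return V, policy
```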

Page 49:

Maxent in Our Obstacles Gridworld (T=1)

Page 50:

Maxent in Our Obstacles Gridworld (T=1e-2)

Page 51:

Maxent in Our Obstacles Gridworld (T=0)

Page 52:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-ent Value Iteration

Outline for Today’s Lecture

For now: discrete state-action spaces, as they are simpler to get the main concepts across. We will consider continuous spaces next lecture!

Page 53:

Page 54:

n Recall, at value iteration convergence we have

V*(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V*(s') ] for all s

n LP formulation to find V*:

min_V Σ_s μ0(s) V(s)

s.t. V(s) ≥ Σ_{s'} T(s, a, s') [ R(s, a, s') + γ V(s') ] for all s in S and a in A

μ0 is a probability distribution over S, with μ0(s) > 0 for all s in S.

Infinite Horizon Linear Program

Theorem. V* is the solution to the above LP.
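A hedged sketch of solving this LP with scipy.optimize.linprog; the helper name is an assumption, and each constraint row encodes V(s) ≥ Σ_{s'} T(s,a,s')[R(s,a,s') + γ V(s')] for one (s, a) pair.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(T, R, gamma, mu0):
    """min_V sum_s mu0(s) V(s)  s.t.  V >= Bellman backup for every (s, a)."""
    n_states, n_actions, _ = T.shape
    A_ub, b_ub = [], []
    for s in range(n_states):
        for a in range(n_actions):
            # V(s) - gamma * sum_s' T(s,a,s') V(s')  >=  sum_s' T(s,a,s') R(s,a,s')
            row = -np.eye(n_states)[s] + gamma * T[s, a]   # sign flipped for A_ub x <= b_ub
            A_ub.append(row)
            b_ub.append(-np.dot(T[s, a], R[s, a]))
    res = linprog(c=mu0, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_states)
    return res.x                                           # V*
```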

Page 55:

Theorem Proof

Page 56:

n How about:

Exercise 3

Page 57:

Page 58:

Dual Linear Program

max_λ Σ_{s,a,s'} λ(s, a) T(s, a, s') R(s, a, s')     (Equation 1)

s.t. Σ_a λ(s', a) = μ0(s') + γ Σ_{s,a} λ(s, a) T(s, a, s') for all s'; λ(s, a) ≥ 0     (Equation 2)

n Interpretation:

n λ(s, a) = expected discounted number of times action a is taken from state s

n Equation 2: ensures that λ has the above meaning

n Equation 1: maximize expected discounted sum of rewards

n Optimal policy: π*(a|s) = λ(s, a) / Σ_{a'} λ(s, a')

Page 59:

n Markov Decision Processes (MDPs)

n Exact Solution Methods

n Value Iteration

n Policy Iteration

n Linear Programming

n Maximum Entropy Formulation

n Entropy

n Max-ent Formulation

n Intermezzo on Constrained Optimization

n Max-ent Value Iteration

Outline for Today’s Lecture

For now: discrete state-action spaces, as they are simpler to get the main concepts across. We will consider continuous spaces next lecture!

Page 60:

Today and Forthcoming Lectures

n Optimal control: provides general computational approach to tackle control problems.

n Dynamic programming / Value iteration

n Discrete state spaces – Exact methods

n Continuous state spaces – Approximate solutions through discretization

n Large state spaces – Approximate solutions through function approximation

n Linear systems – Closed-form exact solution with LQR

n Nonlinear systems – How to extend the exact solutions for linear systems:

n Local linearization

n iLQR, Differential dynamic programming

n Optimal Control through Nonlinear Optimization

n Shooting <> Collocation formulations

n Model Predictive Control (MPC)

n Examples: