Space-Indexed Dynamic Programming: Learning to Follow Trajectories


Space-Indexed Dynamic Programming: Learning to

Follow Trajectories

J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, Charles DuHadway

Computer Science Department, Stanford University

July 2008, ICML

Outline

• Reinforcement Learning and Following Trajectories

• Space-indexed Dynamical Systems and Space-indexed Dynamic Programming

• Experimental Results

Reinforcement Learning and Following Trajectories

Trajectory Following

• Consider the task of following a trajectory in a vehicle such as a car or helicopter

• The state space is too large to discretize, so tabular RL / dynamic programming cannot be applied

Trajectory Following

• Dynamic programming algorithms with non-stationary policies seem well-suited to the task: Policy Search by Dynamic Programming (Bagnell et al.), Differential Dynamic Programming (Jacobson and Mayne)

Dynamic Programming

• Divide the control task into discrete time steps t = 1, 2, …, T

• Proceeding backwards in time, learn policies for t = T, T-1, …, 2, 1: π_T, π_{T-1}, …, π_2, π_1 (a minimal sketch of this backward loop follows below)

• Key Advantage: Policies are local (each only needs to perform well over a small portion of the state space)
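The backward pass can be sketched as follows. This is only an illustrative sketch of time-indexed DP over a trajectory, in the spirit of PSDP, not the paper's implementation; the helper names sample_states_at_time, simulate, fit_policy, and rollout_cost are hypothetical placeholders.

    # Minimal sketch of time-indexed dynamic programming over a trajectory:
    # policies are learned backwards in time, each trained only on the state
    # distribution expected at its own time step (hence the "local" policies).
    # All helper names are hypothetical placeholders, not from the paper.
    def time_indexed_dp(T, sample_states_at_time, simulate, fit_policy, rollout_cost):
        policies = [None] * (T + 1)              # policies[t] is executed at time t
        for t in range(T, 0, -1):                # t = T, T-1, ..., 1
            states_t = sample_states_at_time(t)  # states expected at time step t
            # Fit a policy minimizing the cost-to-go from time t, executing the
            # already-learned policies pi_{t+1}, ..., pi_T afterwards.
            policies[t] = fit_policy(
                states_t,
                lambda s, a, t=t: rollout_cost(simulate(s, a), policies[t + 1:]),
            )
        return policies[1:]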

Problems with Dynamic Programming

Problem #1: Policies from traditional dynamic programming algorithms are time-indexed

Problems with Dynamic Programming

• Suppose we learned policy π_5 assuming a particular distribution over states at t = 5

• But, due to the natural stochasticity of the environment, the car may actually be somewhere else at t = 5

• The resulting policy will perform very poorly

• Partial Solution (Re-indexing): execute the policy learned for the location closest to the current state, regardless of time (see the sketch below)
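A minimal sketch of this re-indexing heuristic, assuming each learned policy is stored together with the nominal trajectory position it was trained for; reference_points is a hypothetical name for those positions.

    import numpy as np

    # Re-indexing heuristic: instead of executing the policy for the current
    # *time* step, execute the policy learned for the trajectory point closest
    # to the vehicle's current position.
    def reindexed_action(position, policies, reference_points):
        dists = [np.linalg.norm(position - p) for p in reference_points]
        i = int(np.argmin(dists))          # index of the nearest trajectory point
        return policies[i](position)       # act with that policy, ignoring time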

Problems with Dynamic Programming

Problem #2: Uncertainty over future states makes it hard to learn any good policy

• Due to stochasticity, there is large uncertainty over states in the distant future (e.g., the distribution over states at time t = 5 is wide)

• DP algorithms require learning a policy that performs well over this entire distribution

Space-Indexed Dynamic Programming

• Basic idea of Space-Indexed Dynamic Programming (SIDP): perform DP with respect to space indices (planes tangent to the trajectory)

Space-Indexed Dynamical Systems and Dynamic Programming

Difficulty with SIDP

• No guarantee that taking a single action will move the vehicle to the next plane along the trajectory

• Introduce the notion of a space-indexed dynamical system

Time-Indexed Dynamical System

• Creating time-indexed dynamical systems:

  ṡ = f(s, u)

  where s is the current state, u is the control action, and ṡ is the time derivative of the state

• Euler integration:

  s_{t+Δt} = s_t + f(s_t, u_t) Δt
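A one-line sketch of the Euler update above; f stands in for an arbitrary vehicle dynamics model and is an assumption here, not something specified by the slides.

    # One Euler step of the continuous-time dynamics s_dot = f(s, u):
    # s_{t+dt} = s_t + f(s_t, u_t) * dt, with s and f(s, u) as numpy arrays.
    def euler_step(f, s, u, dt):
        return s + f(s, u) * dt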

Space-Indexed Dynamical Systems

• Creating space-indexed dynamical systems: starting from the plane at space index d, simulate the dynamics ṡ = f(s, u) forward until the vehicle hits the next plane, at space index d+1

• This gives the discrete update

  s_{d+1} = s_d + f(s_d, u_d) Δt(s_d, u_d)

  where the time to reach the next plane is

  Δt(s, u) = (ṡ*_{d+1})^T (s*_{d+1} − s) / ((ṡ*_{d+1})^T ṡ)

  with s*_{d+1} and ṡ*_{d+1} the desired state and velocity on the trajectory at plane d+1

  (a positive solution exists as long as the controller makes some forward progress)
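A minimal sketch of this one-step, space-indexed update under the Euler approximation; s_star_next and s_dot_star_next stand for s*_{d+1} and ṡ*_{d+1}, and all names are illustrative placeholders rather than an API from the paper.

    # One step of a space-indexed dynamical system (Euler approximation):
    # solve for the time dt at which the motion s_d + f(s_d, u_d) * t crosses
    # the plane through s_star_next with normal s_dot_star_next, then advance
    # to that crossing.  Inputs are numpy arrays; f is an assumed dynamics model.
    def space_indexed_step(f, s_d, u_d, s_star_next, s_dot_star_next):
        s_dot = f(s_d, u_d)                            # current velocity
        forward = float(s_dot_star_next @ s_dot)       # progress toward the plane
        if forward <= 0.0:
            raise ValueError("no forward progress: next plane is never reached")
        dt = float(s_dot_star_next @ (s_star_next - s_d)) / forward
        return s_d + s_dot * dt, dt                    # state on plane d+1, time taken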

Space-Indexed Dynamical Systems

• Result is a dynamical system indexed by the spatial-index variable d rather than time:

  s_{d+1} = s_d + f(s_d, u_d) Δt(s_d, u_d)

• Space-indexed dynamic programming runs DP directly on this system

Space-Indexed Dynamic Programming

• Divide the trajectory into discrete space planes d = 1, 2, …, D

• Proceeding backwards, learn policies for d = D, D-1, …, 2, 1: π_D, π_{D-1}, …, π_2, π_1 (see the sketch below)
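The backward pass is structurally identical to the time-indexed loop sketched earlier, except that policies are indexed by plane and trained on states sampled on that plane. Again a hedged sketch with hypothetical helper names (sample_states_on_plane, fit_policy, cost_to_go).

    # Space-indexed DP: same backward structure as time-indexed DP, but each
    # policy pi_d is trained on the distribution of states *on plane d*, using
    # the space-indexed dynamics to roll forward to later planes.
    def space_indexed_dp(D, sample_states_on_plane, fit_policy, cost_to_go):
        policies = [None] * (D + 1)               # policies[d] is executed at plane d
        for d in range(D, 0, -1):                 # d = D, D-1, ..., 1
            states_d = sample_states_on_plane(d)  # states restricted to plane d
            policies[d] = fit_policy(
                states_d,
                lambda s, a, d=d: cost_to_go(d, s, a, policies[d + 1:]),
            )
        return policies[1:]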

Problems with Dynamic Programming

Problem #1: Policies from traditional dynamic programming algorithms are time-indexed

Space-Indexed Dynamic Programming

• Time-indexed DP: may execute a policy (e.g., π_5) learned for a different location

• Space-indexed DP: always executes the policy based on the current spatial index

Problems with Dynamic Programming

Problem #2: Uncertainty over future states makes it hard to learn any good policy

Space-Indexed Dynamic Programming

• Time-indexed DP: wide distribution over future states (distribution over states at time t = 5)

• Space-indexed DP: much tighter distribution over future states (distribution over states at index d = 5)


Experiments

Experimental Domain

• Task: following a race-track trajectory with an RC car, with randomly placed obstacles

Experimental Setup

• Implemented a space-indexed version of the PSDP algorithm
  – Policy chooses a steering angle using an SVM classifier (constant velocity); see the sketch after this list
  – Used a simple textbook model simulator of the car dynamics to learn the policy

• Evaluated time-indexed PSDP, time-indexed PSDP with re-indexing, and space-indexed PSDP
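A minimal sketch of such a steering policy, assuming scikit-learn; the feature representation, kernel, and candidate steering angles are illustrative assumptions, not the paper's exact setup.

    import numpy as np
    from sklearn.svm import SVC

    # Policy class used in this sketch: a multi-class SVM maps state features to
    # one of a few discrete steering angles, while velocity is held constant.
    # The angle set below is an assumed example.
    STEERING_ANGLES = np.radians([-20.0, -10.0, 0.0, 10.0, 20.0])

    def fit_steering_policy(state_features, best_angle_index):
        """Fit an SVM classifier from state features to a steering-angle index."""
        clf = SVC(kernel="rbf").fit(state_features, best_angle_index)
        return lambda s: STEERING_ANGLES[int(clf.predict(np.atleast_2d(s))[0])]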

Time-Indexed PSDP

Time-Indexed PSDP w/ Re-indexing

Space-Indexed PSDP

Empirical Evaluation

  Method                                Cost
  Time-indexed PSDP                     Infinite (no trajectory succeeds)
  Time-indexed PSDP with Re-indexing    59.74
  Space-indexed PSDP                    49.32

Additional Experiments

• In the paper: additional experiments on the Stanford Grand Challenge Car using space-indexed DDP, and on a simulated helicopter domain using space-indexed PSDP

Related Work

• Reinforcement learning / dynamic programming: Bagnell et al., 2004; Jacobson and Mayne, 1970; Lagoudakis and Parr, 2003; Langford and Zadrozny, 2005

• Differential Dynamic Programming: Atkeson, 1994; Tassa et al., 2008

• Gain Scheduling, Model Predictive Control: Leith and Leithead, 2000; Garcia et al., 1989

Summary

• Trajectory following calls for non-stationary policies, but traditional DP / RL algorithms suffer because their policies are time-indexed

• In this paper, we introduce the notions of a space-indexed dynamical system and space-indexed dynamic programming

• Demonstrated usefulness of these methods on real-world control tasks.

Thank you!

Videos available online at http://cs.stanford.edu/~kolter/icml08videos
