Predictive State Representation Masoumeh Izadi School of Computer Science McGill University UdeM-McGill Machine Learning Seminar


Page 1:

Predictive State Representation

Masoumeh Izadi
School of Computer Science

McGill University

UdeM-McGill Machine Learning Seminar

Page 2:

Outline

Predictive Representations
PSR Model Specifications
Learning PSRs
Using PSRs in Control Problems
Conclusion
Future Directions

Page 3:

Motivation

In a dynamical system:

Knowing the exact state of the system is usually an unrealistic assumption.

Real-world tasks exhibit uncertainty.
POMDPs maintain a belief b = (p(s0), ..., p(sn)) over hidden variables s_i as the state.
Beliefs are not verifiable!
POMDPs are hard to learn and to solve.

Page 4:

Motivation

Potential alternatives:

k-Markov models: not general!

Predictive Representations

Page 5:

Predictive Representations

State representation is in terms of experience.
The state is represented by the predictions made from it.
Predictions represent cause and effect.
Predictions are testable, maintainable, and learnable.

No explicit notion of topological relationships.

Page 6:

Predictive State Representation

Test: a sequence of action-observation pairs, q = a1 o1 ... ak ok

Prediction for a test given a history: p(q|h) = P(o1 ... ok | h, a1 ... ak)

Sufficient statistic: the predictions for a set of core tests, Q

Page 7:

Core Tests

A set of tests Q is a set of core tests if its predictions form a sufficient statistic for the dynamical system.

p(Q|h) = [p(q1|h) ... p(qn|h)]

For any test t: p(t|h) = f_t(p(Q|h))

Page 8:

Linear PSR Model

For any test q, there exists a projection vector m_q such that:

p(q|h) = p(Q|h)^T m_q

Given a new action-observation pair ao, the prediction vector for each q_i ∈ Q is updated by:

p(q_i | hao) = p(ao q_i | h) / p(ao | h) = p(Q|h)^T m_{ao q_i} / p(Q|h)^T m_{ao}
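
To make this update concrete, here is a minimal numpy sketch of the linear PSR update above; the particular numbers (two core tests, the m-vectors) are hypothetical placeholders, not a learned model.

    import numpy as np

    def psr_update(p_Q, m_ao, M_ao):
        """One-step linear PSR update after taking action a and observing o.

        p_Q  : prediction vector p(Q|h) for the core tests, shape (n,)
        m_ao : projection vector for the one-step test ao, shape (n,)
        M_ao : matrix whose i-th column is m_{ao q_i}, shape (n, n)
        Returns p(Q|hao), shape (n,).
        """
        denom = p_Q @ m_ao      # p(ao|h)   = p(Q|h)^T m_ao
        numer = p_Q @ M_ao      # p(ao q_i|h) for every core test q_i
        return numer / denom

    # Hypothetical 2-core-test model, just to show the call:
    p_Q  = np.array([0.6, 0.3])
    m_ao = np.array([0.5, 0.4])
    M_ao = np.array([[0.3, 0.1],
                     [0.2, 0.2]])
    print(psr_update(p_Q, m_ao, M_ao))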

Page 9:

PSR Model Parameters

The set of core tests: Q = {q1, ..., qn}

Projection vectors for one-step tests: m_ao (for all ao pairs)

Projection vectors for one-step extensions of core tests: m_{ao q_i} (for all ao pairs)

Page 10:

Linear PSR vs. POMDP

A linear PSR representation can be more compact than the POMDP representation.

A POMDP with n nominal states can represent a dynamical system of linear dimension ≤ n.

Page 11:

POMDP Model

The model is a tuple { S, A, Ω, T, O, R }:

Sufficient statistic: the belief state (a probability distribution over S)

S = set of states
A = set of actions
Ω = set of observations
T = transition probability distribution for each action
O = observation probability distribution for each action-observation pair
R = reward function for each action

Page 12:

Belief State

Posterior probability distribution over states

[Figure: belief b is mapped to b' after taking action a and observing o1, drawn in the belief simplex for |S| = 3.]

b'(s') = O(s', a, o) Σ_s T(s, a, s') b(s) / Pr(o | a, b)

0 ≤ b(s) ≤ 1 for all s ∈ S, and Σ_{s∈S} b(s) = 1
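
A minimal numpy sketch of this belief update; the array layout (T_a[s, s'] = T(s,a,s'), O_a[s', o] = O(s',a,o)) and the 3-state numbers are assumptions made for illustration, not part of the slides.

    import numpy as np

    def belief_update(b, T_a, O_a, o):
        """Bayes update b'(s') = O(s',a,o) * sum_s T(s,a,s') b(s) / Pr(o|a,b).

        b   : current belief over states, shape (|S|,)
        T_a : transition matrix for action a, T_a[s, s'] = T(s, a, s')
        O_a : observation matrix for action a, O_a[s', o] = O(s', a, o)
        o   : index of the observation received
        """
        unnormalized = O_a[:, o] * (b @ T_a)   # elementwise over s'
        pr_o = unnormalized.sum()              # Pr(o | a, b)
        return unnormalized / pr_o

    # Hypothetical 3-state example (|S| = 3), as in the slide's figure:
    b   = np.array([1/3, 1/3, 1/3])
    T_a = np.array([[0.8, 0.2, 0.0],
                    [0.1, 0.8, 0.1],
                    [0.0, 0.2, 0.8]])
    O_a = np.array([[0.9, 0.1],
                    [0.5, 0.5],
                    [0.1, 0.9]])
    print(belief_update(b, T_a, O_a, o=0))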

Page 13:

Construct PSR from POMDP

Outcome function u(t): the predictions for test t from all POMDP states.

Definition: A test t is said to be independent of a set of tests T if its outcome vector is linearly independent of the predictions for tests in T.

u(e) = (1, 1, ..., 1)^T (the n-vector of ones)

u(aot) = T^a O^{a,o} u(t)

where T^a is the transition matrix for action a and O^{a,o} is the diagonal matrix of probabilities of observing o after taking action a.
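
A small sketch of that recursion for computing outcome vectors from a known POMDP model, under the same (assumed) array conventions as the belief-update sketch above; the example model is hypothetical.

    import numpy as np

    def outcome_vector(test, T, O, n_states):
        """Compute u(t): u(e) = (1,...,1)^T and u(aot) = T^a diag(O^{a,o}) u(t).

        test : list of (action, observation) index pairs a1 o1 ... ak ok
        T    : T[a][s, s'] = T(s, a, s')
        O    : O[a][s', o] = O(s', a, o)
        """
        u = np.ones(n_states)              # u(e)
        for a, o in reversed(test):        # apply the recursion from the innermost pair
            u = T[a] @ (O[a][:, o] * u)    # u(aot) = T^a diag(O^{a,o}) u(t)
        return u

    # Hypothetical 1-action, 2-observation, 3-state model:
    T = {0: np.array([[0.8, 0.2, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]])}
    O = {0: np.array([[0.9, 0.1],
                      [0.5, 0.5],
                      [0.1, 0.9]])}
    print(outcome_vector([(0, 0), (0, 1)], T, O, n_states=3))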

Page 14:

State Prediction Matrix

The rank of the matrix determines the size of Q.

Core tests correspond to linearly independent columns.

Entries are computed using the POMDP model.

[Figure: the state prediction matrix, with one row per POMDP state s1, ..., si, ..., sn, one column per possible test t1, t2, ..., tj, ..., and entry u(tj)(si).]
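
One possible way to pick the linearly independent columns of such a matrix, sketched with a greedy numpy rank check; the small matrix is hypothetical (its column 2 is the sum of columns 0 and 1, so it is skipped).

    import numpy as np

    def select_core_tests(U, tol=1e-8):
        """Greedily keep columns of the state prediction matrix U (|S| x #tests)
        that increase its rank; the kept column indices identify core tests Q."""
        kept = []
        for j in range(U.shape[1]):
            candidate = U[:, kept + [j]]
            if np.linalg.matrix_rank(candidate, tol=tol) > len(kept):
                kept.append(j)
        return kept

    # Hypothetical 3-state, 4-test matrix:
    U = np.array([[0.2, 0.5, 0.7, 0.1],
                  [0.4, 0.1, 0.5, 0.3],
                  [0.6, 0.3, 0.9, 0.9]])
    print(select_core_tests(U))   # -> [0, 1, 3]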

Page 15:

Linearly Independent States

Definition: A linearly dependent state of an MDP is a state for which, for any action, the transition function is a linear combination of the transition functions from other states.

Having the same dynamical structure is a special case of linear dependency.

Page 16:

Example

[Figure: a small example system; transitions have probabilities 0.2, 0.8, 0.7, 0.3 and states emit observations among O1, O2, O3, O4.]

Linear PSR needs only two tests to represent the system

e.g., the tests ao1 and ao4 can predict any other test.

Page 17:

State Space Compression

Theorem: For any controlled dynamical system, linearly dependent states in the underlying MDP imply a more compact PSR than the corresponding POMDP.

The reverse direction does not always hold, due to possible structure in the observations.

Page 18:

Exploiting Structure

PSR exploits linear independence structure in the dynamics of a system.

PSR also exploits regularities in dynamics.

Lossless compression needs invariance of state representation in terms of values as well as dynamics.

Including the reward as part of the observation makes linear PSRs similar to linear lossless compressions of POMDPs.

Page 19:

POMDP Example

States: 20 (direction, grid state)
Actions: 3 (turn left, turn right, move)
Observations: 2 (wall, nothing)

Page 20:

Structure Captured by PSR

Aliased states (by immediate observation)

Predictive classes (by PSR core tests)

Page 21:

Generalization

• Good generalization results when similar situations have similar representations.
• Good generalization makes it possible to learn from a small amount of experience.

• Predictive representations: generalize the state space well, make the problem simpler yet precise, and assist reinforcement learning algorithms. [Rafols et al. 2005]

Page 22:

Learning the PSR Model

The set of core tests: Q = {q1, ..., q|Q|}

Projection vectors for one-step tests: m_ao (for all ao pairs)

Projection vectors for one-step extensions of core tests: m_{ao q_i} (for all ao pairs)

Page 23:

System Dynamics Vector

Predictions of all possible future events can be generated from any precise model of the system.

[Figure: the system dynamics vector, with one entry p(ti) for every possible test t1, t2, ..., ti, ...]

ti = a1 o1 ... ak ok

p(ti) = Prob(o1 ... ok | a1 ... ak)

Page 24:

System Dynamics Matrix

Linear dimension of a dynamical system is determined by the rank of the system dynamics matrix.

[Figure: the system dynamics matrix, with one row per history h1 = ε, h2, ..., hi, ..., one column per test t1, t2, ..., tj, ..., and entry P(tj | hi).]

tj = a1 o1 ... ak ok
hi = a'1 o'1 ... a'n o'n

p(tj | hi) = Prob(o_{n+1} = o1, ..., o_{n+k} = ok | a'1 o'1 ... a'n o'n, a1 ... ak)
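
These entries can be estimated from data by counting, as the discovery procedure on later slides does. Below is a sketch of such a counting estimator, assuming a blind (open-loop) behaviour policy so that simple count ratios are valid estimates of p(t|h); the trajectory format and function name are hypothetical.

    from collections import defaultdict

    def estimate_dynamics_matrix(trajectories, histories, tests):
        """Estimate Z[i, j] ~= p(t_j | h_i) by counting.

        trajectories : list of action-observation sequences [(a1, o1), (a2, o2), ...]
        histories, tests : lists of action-observation sequences (tuples of pairs)
        """
        tries, hits = defaultdict(int), defaultdict(int)
        for traj in trajectories:
            for i, h in enumerate(histories):
                if tuple(traj[:len(h)]) != tuple(h):
                    continue                              # history prefix did not occur
                for j, t in enumerate(tests):
                    rest = traj[len(h):len(h) + len(t)]
                    if len(rest) < len(t):
                        continue                          # trajectory too short
                    if [a for a, _ in rest] == [a for a, _ in t]:      # actions match
                        tries[(i, j)] += 1
                        if [o for _, o in rest] == [o for _, o in t]:  # observations match
                            hits[(i, j)] += 1
        return {k: hits[k] / tries[k] for k in tries}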

Page 25:

POMDP in System Dynamics Matrix

Any model must be able to generate the system dynamics matrix.

Core beliefs B = {b1, b2, ..., bN}:

Span the reachable subspace of the continuous belief space;
Can be beneficial in POMDP solution methods [Izadi et al. 2005];
Represent reduced state-space dimensions in structured domains.

[Figure: the system dynamics matrix with rows indexed by core beliefs b1, b2, ..., bi, ... and columns by tests t1, t2, ..., tj, ..., entries P(tj | bi).]

Page 26:

Core Test Discovery

Z_ij = P(tj | hi)

Extend tests and histories one step and estimate the entries of Z (by counting data samples).

Find the rank and keep the linearly independent tests and histories.

Keep extending until the rank does not change.

[Matrix Z: rows are histories (H), columns are tests (T).]
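
A schematic version of this discovery loop; estimate_Z (for instance, an array-valued version of the counting estimator sketched earlier) and one_step_extensions are placeholder callables, and the pruning of linearly dependent rows and columns between iterations is left out for brevity.

    import numpy as np

    def discover_core_tests(estimate_Z, one_step_extensions, init_tests, init_histories,
                            tol=1e-8, max_iters=10):
        """Iteratively extend tests and histories one step; stop when the rank of the
        estimated sub-matrix of the system dynamics matrix stops growing."""
        tests, histories = list(init_tests), list(init_histories)
        rank = 0
        for _ in range(max_iters):
            Z = estimate_Z(histories, tests)       # |H| x |T| array of p(t|h) estimates
            new_rank = np.linalg.matrix_rank(Z, tol=tol)
            if new_rank == rank:
                break                              # rank stabilised: current tests suffice
            rank = new_rank
            tests = one_step_extensions(tests)             # add every (a, o) extension
            histories = one_step_extensions(histories)
        return tests, histories, rank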

Page 27:

System Dynamics Matrix

[Figure: the system dynamics matrix again, with rows h1 = ε, h2, ..., hi, ..., columns t1, t2, ..., tj, ..., and entries P(tj | hi).]

Extending all possible tests and histories requires processing a huge matrix in large domains.

Page 28:

Core Test Discovery

[Figure: the procedure starts from one-step tests t1, t2, ... and one-step histories h1, h2, ...]

Repeat one-step extensions of Q_i until the rank does not change.

Millions of samples are required even for problems with only a few states.

Page 29:

PSR Learning

Structure Learning:

which tests to choose for Q from data

Parameter Learning:

how to tune m-vectors given the structure and experience data

Page 30:

Learning Parameters

PSR Gradient algorithm [Singh et al. 2003]

Principal-component-based algorithm for TPSRs (uncontrolled systems) [Rosencrantz et al. 2004]

Suffix-History algorithm [James et al. 2004]

POMDP EM

Page 31:

Results on PSR Model Learning

Page 32:

Planning

States expressed in predictive form.

Planning and reasoning should be in terms of experience.

Rewards treated as part of observations.

Tests are of the form: t=a1(o1r1)….an(onrn).

General POMDP methods (e.g. dynamic programming) can be used.

Page 33:

Predictive Space

[Figure: the predictive space for |Q| = 3, a unit cube in which the prediction vector moves from P(Q|h) to P(Q|hao) after an action-observation pair ao.]

p(q_i | hao) = p(Q|h)^T m_{ao q_i} / p(Q|h)^T m_{ao}

0 ≤ p(q_i) ≤ 1 for all i

Page 34:

Forward Search

[Figure: a forward-search tree that alternates action choices (a1, a2) and observation branches (o1, o2) over future time steps.]

Exponential Complexity

Compare alternative future experiences.

Page 35:

DP for Finite-Horizon POMDPs

The value function for a set of trees is always piecewise linear and convex (PWLC)

[Figure: three policy trees p1, p2, p3 (a root action followed by observation branches o1, o2 leading to actions among a1, a2, a3), and the corresponding piecewise-linear convex value function over beliefs between s1 and s2, with each linear segment contributed by one tree.]

Page 36:

Value Iteration in POMDPs

Value iteration: Initialize value function

V(b) = max_a Σ_s R(s,a) b(s)

This produces one alpha-vector per action.

Compute the value function at the next iteration using Bellman's equation:

V(b) = max_a [ Σ_s R(s,a) b(s) + Σ_o max_α Σ_s b(s) Σ_s' T(s,a,s') O(s',a,o) α(s') ]

where the inner max is over the alpha-vectors α of the previous iteration.
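
For concreteness, here is a point-based version of this backup at a single belief, which is one standard way to realise the equation above; the array conventions match the belief-update sketch earlier, and the discount factor gamma is an assumption (the slide's finite-horizon equation has no discount).

    import numpy as np

    def point_based_backup(b, Gamma, R, T, O, gamma=0.95):
        """One Bellman backup at belief b; returns the new alpha-vector and its action.

        Gamma : list of alpha-vectors (each shape (|S|,)) from the previous iteration
        R     : R[a][s]     immediate reward
        T     : T[a][s, s'] transition probabilities
        O     : O[a][s', o] observation probabilities
        """
        best_alpha, best_value, best_action = None, -np.inf, None
        for a in range(len(R)):
            alpha_a = np.array(R[a], dtype=float)
            for o in range(O[a].shape[1]):
                # alpha^{a,o}(s) = sum_{s'} T(s,a,s') O(s',a,o) alpha(s')
                candidates = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
                # keep the candidate that is best at this particular belief
                alpha_a += gamma * max(candidates, key=lambda v: b @ v)
            if b @ alpha_a > best_value:
                best_alpha, best_value, best_action = alpha_a, b @ alpha_a, a
        return best_alpha, best_action

Initially Gamma would hold the one-per-action alpha-vectors R[a] produced by the first step above.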

Page 37:

DP for Finite-Horizon PSRs

Theorem: the value function for a finite horizon is still piecewise linear and convex.

There is a scalar reward for each test: R(ht, a) = Σ_r r · Prob(r | ht, a)

The value of a policy tree is a linear function of the prediction vector:

V_p(p(Q|h)) = p(Q|h)^T ( n_a + Σ_o M_ao w_{p_o} ), where p_o is the subtree of p followed after observing o.

Page 38:

Value Iteration in PSRs

Value iteration works just as in POMDPs: V(p(Q|h)) = max_α V_α(p(Q|h))

Represent any finite-horizon solution by a finite set of alpha-vectors (policy trees).
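
A minimal sketch of evaluating this max over alpha-vectors in the predictive space; each alpha-vector has length |Q| and pairs with the prediction vector exactly as belief-space alpha-vectors pair with beliefs. The numbers are placeholders.

    import numpy as np

    def psr_value(p_Q, alphas):
        """V(p(Q|h)) = max_alpha alpha^T p(Q|h); argmax identifies the greedy policy tree."""
        values = np.array([alpha @ p_Q for alpha in alphas])
        return values.max(), int(values.argmax())

    # Hypothetical |Q| = 2 prediction vector and three policy-tree alpha-vectors:
    p_Q = np.array([0.6, 0.3])
    alphas = [np.array([1.0, 0.5]), np.array([0.2, 2.0]), np.array([0.8, 0.8])]
    print(psr_value(p_Q, alphas))   # -> (0.75, 0)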

Page 39:

Results on PSR Control

[James et al. 2004]

Page 40:

Results on PSR Control

• Current PSR planning algorithms offer no advantage over POMDP planning ([Izadi & Precup 2003], [James et al. 2004]).

• Planning requires a precise definition of the predictive space.

• It is important to analyze the impact of PSR planning on structured domains.

Page 41:

Predictive Representations

Linear PSRs

EPSRs: action sequence + last observation [Rudary and Singh 2004]

mPSRs: augmented with history [James et al. 2005]

TD Networks: temporal-difference learning with a network of interrelated predictions [Tanner and Sutton 2004]

Page 42:

Summary

A good state representation should be: compact, useful for planning, and efficiently learnable.

Predictive state representations provide a lossless compression that reflects the underlying structure.

PSRs generalize the state space and facilitate planning.

Page 43:

Limitations

Learning and Discovery in PSRs still lack efficient algorithms.

Current algorithms need far too many data samples.

Experiments on many ideas can so far only be done on toy problems, due to model-learning limitations.

Page 44:

Future Work

Theory of PSR and possible extensions

Efficient algorithms for learning predictive models

More on combining temporal abstraction with PSR

More on planning algorithms for PSR and EPSR

Approximation methods are yet to be developed

PSR for continuous systems

Generalization across states in stochastic systems

Non-linear PSRs and exponential compression (?)