
Stochastic Dynamic Programming with Factored Representations

Presentation by Dafna Shahaf (Boutilier, Dearden, Goldszmidt 2000)

The Problem

Standard MDP algorithms require explicit state-space enumeration: the curse of dimensionality.
Need: a compact representation (intuition: STRIPS).
Need: versions of the standard dynamic programming algorithms that work directly on that representation.

A Glimpse of the Future

[Figures: a policy tree and the corresponding value tree]

A Glimpse of the Future: Some Experimental Results

Roadmap

- MDPs: Reminder
- Structured Representation for MDPs: Bayesian Nets, Decision Trees
- Algorithms for the Structured Representation
- Experimental Results
- Extensions

MDPs: Reminder

An MDP is a tuple ⟨S, A, T, R⟩ (states, actions, transitions, rewards).

We consider the discounted infinite-horizon setting and stationary policies
π: S → A (an action to take at each state s).

Value functions: V_π^k(s) is the k-stage-to-go value function for policy π.
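As a concrete reference point, an explicit (flat) MDP of this kind can be written down directly. This is a minimal sketch; the names (FlatMDP, transition, reward) are illustrative placeholders, not from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int
Action = str

@dataclass
class FlatMDP:
    """Explicit MDP <S, A, T, R> with enumerated states (hypothetical helper)."""
    states: List[State]
    actions: List[Action]
    # transition[a][s][s'] = Pr(s' | s, a)
    transition: Dict[Action, Dict[State, Dict[State, float]]]
    reward: Callable[[State], float]
    gamma: float = 0.9
```

The point of the paper is precisely that this enumerated representation blows up exponentially in the number of state variables.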

Roadmap

- MDPs: Reminder
- Structured Representation for MDPs: Bayesian Nets, Decision Trees
- Algorithms for the Structured Representation
- Experimental Results
- Extensions

Representing MDPs as Bayesian Networks: the Coffee World

Variables:
- O: robot is in the office
- W: robot is wet
- U: robot has an umbrella
- R: it is raining
- HCR: robot has coffee
- HCO: owner has coffee

Actions:
- Go: switch location
- BuyC: buy coffee
- DelC: deliver coffee
- GetU: get the umbrella

The effects of the actions may be noisy, so we need to provide a distribution for each effect.

Representing Actions: DelC

[Figure: the DBN and decision-tree CPTs for the DelC action]
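To make the decision-tree CPT idea concrete, here is a minimal sketch of how such a tree could be encoded. The tree structure and the probability values below are illustrative assumptions, not the numbers from the paper's DelC network.

```python
# A decision-tree CPT: internal nodes test a parent variable, leaves give
# Pr(X' = true) after the action. Structure and numbers are made-up placeholders.
DELC_HCO_TREE = {
    "test": "HCO",                 # owner already has coffee?
    "true": 1.0,                   # then HCO stays true
    "false": {
        "test": "O",               # robot in the office?
        "true": {
            "test": "HCR",         # robot holding coffee?
            "true": 0.8,           # hypothetical success probability
            "false": 0.0,
        },
        "false": 0.0,
    },
}

def prob_true(tree, state):
    """Walk the tree using the current state to get Pr(variable = true)."""
    while isinstance(tree, dict):
        tree = tree["true"] if state[tree["test"]] else tree["false"]
    return tree

# Example: prob_true(DELC_HCO_TREE, {"HCO": False, "O": True, "HCR": True}) -> 0.8
```

The tree only mentions the parent variables that actually matter for this effect, which is what makes the representation compact.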

Representing Actions: Interesting Points

- No need to provide a marginal distribution over the pre-action variables.
- Markov property: we only need the previous state.
- For now, no synchronic arcs.
- The frame problem?
- A single network vs. a separate network for each action.
- Why decision trees?

Representing Reward

The reward is generally determined by only a subset of the features, so it too can be represented compactly as a tree.

Policies and Value Functions

The optimal choice of action may depend only on certain variables (given the values of some others).

[Figures: a policy tree and the corresponding value tree. Internal nodes test features (e.g. HCR = T / HCR = F); leaves hold actions in the policy tree and values in the value tree.]
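A minimal sketch of how such policy and value trees could be represented in code; the encoding and the leaf contents below are illustrative assumptions.

```python
# Trees are nested dicts: internal nodes test one boolean feature,
# leaves hold the payload (a value or an action name).
value_tree = {"test": "HCR", "true": 10.0, "false": 0.0}        # illustrative values
policy_tree = {"test": "HCR", "true": "DelC", "false": "BuyC"}  # illustrative actions

def evaluate(tree, state):
    """Descend from the root to the leaf matching the given boolean state."""
    while isinstance(tree, dict):
        tree = tree["true"] if state[tree["test"]] else tree["false"]
    return tree

print(evaluate(value_tree, {"HCR": True}))    # -> 10.0
print(evaluate(policy_tree, {"HCR": False}))  # -> 'BuyC'
```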

Roadmap

- MDPs: Reminder
- Structured Representation for MDPs: Bayesian Nets, Decision Trees
- Algorithms for the Structured Representation
- Experimental Results
- Extensions

Bellman Backup

Q-function: the value of performing action a in state s, given value function v:

    Q_a^v(s) = R(s) + γ · Σ_{s'} Pr(s' | s, a) · v(s')

Value Iteration: Reminder

    V^{k+1}(s) = max_a Q_a^{V^k}(s) = R(s) + max_a { γ · Σ_{s'} Pr(s' | s, a) · V^k(s') }
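For reference, a minimal sketch of the flat (unstructured) value iteration that the structured algorithm will mirror, assuming the explicit transition dictionary and reward function from the earlier snippet:

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-4):
    """Flat VI: V[s] <- R(s) + max_a gamma * sum_s' T[a][s][s'] * V[s']."""
    V = {s: R(s) for s in states}                     # V^0 = R
    while True:
        Q = {
            (s, a): R(s) + gamma * sum(p * V[s2] for s2, p in T[a][s].items())
            for s in states for a in actions
        }
        V_new = {s: max(Q[(s, a)] for a in actions) for s in states}
        if max(abs(V_new[s] - V[s]) for s in states) < eps:  # termination criterion
            return V_new
        V = V_new
```

Every backup touches every enumerated state, which is exactly what the structured version avoids.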

Structured Value Iteration: Overview

Input: Tree(R). Output: Tree(V*).

1. Set Tree(V^0) = Tree(R).
2. Repeat
   (a) Compute Tree(Q_a^{V^k}) = Regress(Tree(V^k), a) for each action a.
   (b) Merge (via maximization) the trees Tree(Q_a^{V^k}) to obtain Tree(V^{k+1}).
   until the termination criterion is met.
3. Return Tree(V^{k+1}).
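A skeleton of this loop in code. The helpers regress, merge_max, and max_difference stand for the tree operations named above and are passed in as assumed arguments, not defined here.

```python
def structured_value_iteration(reward_tree, actions, regress, merge_max,
                               max_difference, eps=1e-4):
    """Structured VI skeleton: every quantity is a decision tree, never a flat table."""
    V = reward_tree                                   # Tree(V^0) = Tree(R)
    while True:
        q_trees = [regress(V, a) for a in actions]    # Tree(Q_a^{V^k}) for each action
        V_next = merge_max(q_trees)                   # leaf-wise max -> Tree(V^{k+1})
        if max_difference(V_next, V) < eps:           # termination criterion
            return V_next
        V = V_next
```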

Example World

Step 2a: Calculating Q-Functions

    Q_a^V(s) = R(s) + γ · Σ_{s'} Pr(s' | s, a) · V(s')

1. Compute the expected future value.
2. Discount the future value.
3. Add the immediate reward.

How do we use the structure of the trees?
Tree(Q_a^V) should distinguish only those conditions under which action a makes a branch of Tree(V) true with different odds.

Calculating Tree(Q_a^1):

Tree(V^0): the initial value tree, Tree(V^0) = Tree(R).

PTree(Q_a^1): found by identifying the conditions under which a will have distinct expected value with respect to V^0.

FVTree(Q_a^1): the undiscounted expected future value of performing action a with one stage to go; each leaf value is the probability-weighted sum of the V^0 values it can reach (e.g. 1·10 + 0·0 = 10).

Tree(Q_a^1): obtained by discounting FVTree (by 0.9) and adding the immediate reward function.

An Alternative View (a more complicated example):

Starting from Tree(V^1), we build a partial PTree(Q_a^2), then the unsimplified PTree(Q_a^2), the simplified PTree(Q_a^2), FVTree(Q_a^2), and finally Tree(Q_a^2).


The Algorithm: Regress

Input: Tree(V), action a. Output: Tree(Q_a^V).

1. PTree(Q_a^V) = PRegress(Tree(V), a)   (simplified)
2. Construct FVTree(Q_a^V): for each branch b of the PTree, with leaf node l(b):
   (a) Pr_b = the product of the individual distributions at l(b)
   (b) v_b = Σ_{b' ∈ Tree(V)} Pr_b(b') · V(b')
   (c) Re-label leaf l(b) with v_b.
3. Discount FVTree(Q_a^V) by γ and append Tree(R).
4. Return FVTree(Q_a^V).
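A rough sketch of Regress over decision trees. All tree utilities (pregress, branches, prob_of_branch, relabel_leaves, scale_leaves, append_reward) are assumed helpers passed in as arguments; this is an outline of the steps above, not the paper's exact data structures.

```python
def regress(value_tree, action, reward_tree, pregress, branches,
            prob_of_branch, relabel_leaves, scale_leaves, append_reward,
            gamma=0.9):
    """Sketch of Regress: Tree(V), a -> Tree(Q_a^V). Tree helpers are passed in."""
    # 1. PTree(Q_a^V): distributions over the variables that Tree(V) tests
    ptree = pregress(value_tree, action)

    # 2. FVTree(Q_a^V): expected future value at every branch of the PTree
    new_labels = {}
    for b, leaf_dists in branches(ptree):
        # (a) Pr_b is the product of the independent distributions at leaf l(b);
        # (b) v_b = sum over branches b' of Tree(V) of Pr_b(b') * V(b')
        v_b = sum(prob_of_branch(leaf_dists, b_prime) * v_prime
                  for b_prime, v_prime in branches(value_tree))
        new_labels[b] = v_b                      # (c) re-label leaf l(b) with v_b
    fv_tree = relabel_leaves(ptree, new_labels)

    # 3. Discount FVTree by gamma and append the immediate reward Tree(R)
    # 4. Return the result
    return append_reward(scale_leaves(fv_tree, gamma), reward_tree)
```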


The Algorithm: PRegress

Input: Tree(V), action a. Output: PTree(Q_a^V).

1. If Tree(V) is a single node, return the empty tree.
2. X = the variable at the root of Tree(V).
   T_X^P = the tree for CPT_a(X) (label its leaves with the distribution over X).
3. T_{X=t}^V, T_{X=f}^V = the subtrees of Tree(V) for X = t and X = f.
4. T_{X=t}^P, T_{X=f}^P = the results of calling PRegress on T_{X=t}^V, T_{X=f}^V.
5. For each leaf l of T_X^P, append T_{X=t}^P, T_{X=f}^P, or both, according to the distribution at l (use union to combine the labels).
6. Return T_X^P.
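A recursive sketch of PRegress. Here cpt_tree(action, var) stands for the lookup of the action's decision-tree CPT, and helpers is a simple namespace bundling assumed tree utilities (is_leaf, root_var, subtree, leaves, attach); leaves are assumed to expose their probability as leaf.prob. All of these are illustrative stand-ins.

```python
def pregress(value_tree, action, cpt_tree, helpers):
    """Sketch of PRegress: Tree(V), a -> PTree(Q_a^V). Tree utilities are assumed."""
    # 1. A single-node value tree needs no distinctions: return the empty tree
    if helpers.is_leaf(value_tree):
        return None

    # 2. X = root variable of Tree(V); T_X^P = the tree for CPT_a(X)
    x = helpers.root_var(value_tree)
    p_tree = cpt_tree(action, x)                 # leaves hold Pr(X' = t | parents)

    # 3.-4. Recurse on the X = t and X = f subtrees of Tree(V)
    p_true = pregress(helpers.subtree(value_tree, x, True), action, cpt_tree, helpers)
    p_false = pregress(helpers.subtree(value_tree, x, False), action, cpt_tree, helpers)

    # 5. At each leaf of T_X^P, append p_true, p_false, or both, depending on
    #    whether Pr(X' = t) there is 1, 0, or strictly in between;
    #    the attach helper is assumed to union the variable labels.
    for leaf in helpers.leaves(p_tree):
        needed = [t for t, keep in ((p_true, leaf.prob > 0), (p_false, leaf.prob < 1))
                  if keep and t is not None]
        helpers.attach(leaf, needed)

    # 6. Return the completed PTree
    return p_tree
```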

Step 2b: Maximization

Merge the trees Tree(Q_a^{V^k}) by maximizing over actions at each leaf to obtain Tree(V^{k+1}). One step of structured value iteration is now complete.
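A runnable but deliberately naive sketch of the maximization step over the dict-style trees used in the earlier snippets. Instead of merging the tree structures directly (as the paper does), it enumerates assignments to a given list of variables and rebuilds a possibly non-minimal tree; the helper names and this enumeration strategy are simplifying assumptions.

```python
def merge_max(q_trees, variables):
    """Leaf-wise maximization of several Q-trees over the listed boolean variables."""
    def evaluate(tree, state):
        while isinstance(tree, dict):
            tree = tree["true"] if state[tree["test"]] else tree["false"]
        return tree

    def build(remaining, state):
        if not remaining:
            return max(evaluate(t, state) for t in q_trees)   # max over actions
        var, rest = remaining[0], remaining[1:]
        return {"test": var,
                "true": build(rest, {**state, var: True}),
                "false": build(rest, {**state, var: False})}

    return build(list(variables), {})

# Example (illustrative): merge two one-variable Q-trees.
qa = {"test": "HCR", "true": 10.0, "false": 0.0}
qb = {"test": "HCR", "true": 8.0, "false": 3.0}
print(merge_max([qa, qb], ["HCR"]))   # -> {'test': 'HCR', 'true': 10.0, 'false': 3.0}
```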

Roadmap

- MDPs: Reminder
- Structured Representation for MDPs: Bayesian Nets, Decision Trees
- Algorithms for the Structured Representation
- Experimental Results
- Extensions

Experimental Results

Worst case: [figure]

Best case: [figure]

Roadmap

- MDPs: Reminder
- Structured Representation for MDPs: Bayesian Nets, Decision Trees
- Algorithms for the Structured Representation
- Experimental Results
- Extensions

Extensions

- Synchronic edges
- POMDPs
- Rewards
- Approximation

Questions?

Backup slides

Here be dragons.

Regression through a Policy

Improving Policies: Example

Maximization Step, Improved Policy
