Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn

Page 1: Efficient Solution Algorithms for Factored MDPs

Efficient Solution Algorithms for Factored MDPs

by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman

Presented by Arkady Epshteyn

Page 2: Efficient Solution Algorithms for Factored MDPs

Problem with MDPs

• Exponential number of states

• Example: Sysadmin Problem

• 4 computers: M1, M2, M3, M4

• Each machine is working or has failed.

• State space: $2^4 = 16$ states

• 8 actions: whether to reboot each machine or not

• Reward: depends on the number of working machines
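The growth of the state space can be checked directly. The sketch below is only an illustration: the function name `enumerate_states` and the larger machine counts are assumptions, not part of the slides.

```python
from itertools import product

def enumerate_states(num_machines):
    """Return every joint state; each machine is working (True) or failed (False)."""
    return list(product([False, True], repeat=num_machines))

states = enumerate_states(4)
print(len(states))  # 16 joint states for the 4-machine example

# The count doubles with every machine added, so explicit enumeration
# stops being practical long before the network gets large.
sizes = {n: 2 ** n for n in (4, 10, 20)}
print(sizes)
```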

Page 3: Efficient Solution Algorithms for Factored MDPs

Factored Representation

• Transition model: DBN

• Reward model:

$$R(\mathbf{x}) = \sum_{j=1}^{k} r_j(\mathbf{x})$$

Page 4: Efficient Solution Algorithms for Factored MDPs

Approximate Value Function

• Linear value function:

$$V(\mathbf{x}) = \sum_{j=1}^{k} w_j h_j(\mathbf{x})$$

• Basis functions:

$h_i(X_i = \text{true}) = 1$

$h_i(X_i = \text{false}) = 0$

$h_0 = 1$
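A minimal sketch of this linear value function for the 4-machine example, assuming one indicator basis function per machine plus the constant h0. The weight values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def make_indicator(i):
    """Basis function h_i: 1 if machine i is working, 0 if it has failed."""
    return lambda x: 1.0 if x[i] else 0.0

# h_0 is the constant basis function; the rest are per-machine indicators.
basis = [lambda x: 1.0] + [make_indicator(i) for i in range(4)]

def value(x, weights):
    """V(x) = sum_j w_j * h_j(x)."""
    return sum(w * h(x) for w, h in zip(weights, basis))

weights = [0.5, 1.0, 2.0, 3.0, 4.0]  # hypothetical weights w_0..w_4
x = (True, False, True, False)       # machines 1 and 3 are working
print(value(x, weights))             # 0.5 + 1.0 + 3.0
```

Only five weights describe a value for all 16 states, which is the point of the approximation.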

Page 5: Efficient Solution Algorithms for Factored MDPs

Markov Decision Processes

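For fixed policy $\pi$:

$$V_\pi(\mathbf{x}) = R_{\pi(\mathbf{x})}(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_{\pi(\mathbf{x})}(\mathbf{x}' \mid \mathbf{x})\, V_\pi(\mathbf{x}')$$

The optimal value function V*:

$$V^*(\mathbf{x}) = \max_a \Big[ R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, V^*(\mathbf{x}') \Big]$$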

Page 6: Efficient Solution Algorithms for Factored MDPs

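Solving MDP

Method 1: Policy Iteration

• Value determination

$$V_{\pi^{(t)}}(\mathbf{x}) = R_{\pi^{(t)}(\mathbf{x})}(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_{\pi^{(t)}(\mathbf{x})}(\mathbf{x}' \mid \mathbf{x})\, V_{\pi^{(t)}}(\mathbf{x}')$$

• Policy Improvement

$$\pi^{(t+1)}(\mathbf{x}) = \arg\max_a \Big[ R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, V_{\pi^{(t)}}(\mathbf{x}') \Big]$$

• Polynomial in the number of states N

• Exponential in the number of variables K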
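The two steps can be sketched on a tiny explicit MDP. Everything concrete here is an assumption made for illustration: the 2-state model (state 0 = working, state 1 = failed), the action names, the probabilities, and the discount value. Value determination is approximated by repeated backups rather than by solving the linear system exactly.

```python
GAMMA = 0.9  # assumed discount factor
STATES = [0, 1]

# Hypothetical model: P[a][x][x2] is the transition probability, R[a][x] the reward.
P = {
    "wait":   [[0.9, 0.1], [0.0, 1.0]],
    "reboot": [[1.0, 0.0], [1.0, 0.0]],
}
R = {"wait": [1.0, 0.0], "reboot": [0.0, 0.0]}

def q_value(x, a, V):
    """One-step lookahead: R_a(x) + gamma * sum_x' P_a(x'|x) V(x')."""
    return R[a][x] + GAMMA * sum(P[a][x][x2] * V[x2] for x2 in STATES)

def evaluate(policy, sweeps=2000):
    """Value determination by repeated backups under a fixed policy."""
    V = [0.0 for _ in STATES]
    for _ in range(sweeps):
        V = [q_value(x, policy[x], V) for x in STATES]
    return V

def improve(V):
    """Policy improvement: pick the greedy action in every state."""
    return [max(P, key=lambda a: q_value(x, a, V)) for x in STATES]

def policy_iteration():
    policy = ["wait" for _ in STATES]
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
print(policy, V)
```

Both loops run over `STATES`, which is why the cost is polynomial in N but exponential in the number of state variables.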

Page 7: Efficient Solution Algorithms for Factored MDPs

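Solving MDP

Method 2: Linear Programming

Variables: $V_1, \ldots, V_N$

Minimize: $\sum_i \alpha(\mathbf{x}_i)\, V_i$, with $\alpha(\mathbf{x}_i) > 0$ for all $i$

Subject to: $V_i \ge R_a(\mathbf{x}_i) + \gamma \sum_j P_a(\mathbf{x}_j \mid \mathbf{x}_i)\, V_j$ for all $\mathbf{x}_i, a$

Intuition: compare with the fixed point of V(x):

$$V^*(\mathbf{x}) = \max_a \Big[ R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, V^*(\mathbf{x}') \Big]$$

• Polynomial in the number of states N

• Exponential in the number of variables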
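The link between the LP and the fixed point can be checked numerically. The sketch below does not solve the LP. It computes V* by value iteration on a hypothetical 2-state MDP (all numbers and the discount value are assumptions) and then tests the LP constraints: V* satisfies all of them with at least one tight constraint per state, a uniformly larger V is still feasible but costs more, and a uniformly smaller V violates a constraint.

```python
GAMMA = 0.9  # assumed discount factor
STATES = [0, 1]

# Hypothetical model: P[a][x][x2] and R[a][x].
P = {
    "wait":   [[0.9, 0.1], [0.0, 1.0]],
    "reboot": [[1.0, 0.0], [1.0, 0.0]],
}
R = {"wait": [1.0, 0.0], "reboot": [0.0, 0.0]}

def backup(x, a, V):
    """Right-hand side of the LP constraint for state x and action a."""
    return R[a][x] + GAMMA * sum(P[a][x][x2] * V[x2] for x2 in STATES)

def value_iteration(sweeps=2000):
    V = [0.0 for _ in STATES]
    for _ in range(sweeps):
        V = [max(backup(x, a, V) for a in P) for x in STATES]
    return V

def slacks(V):
    """V_i minus the constraint's right-hand side, for every (state, action)."""
    return {(x, a): V[x] - backup(x, a, V) for x in STATES for a in P}

def is_feasible(V, tol=1e-9):
    return all(s >= -tol for s in slacks(V).values())

v_star = value_iteration()
shifted = [v + 1.0 for v in v_star]  # feasible, but a larger objective
lowered = [v - 1.0 for v in v_star]  # breaks the tight constraints

print(is_feasible(v_star), is_feasible(shifted), is_feasible(lowered))
```

Because every feasible V lies above V* and the weights α are positive, minimizing the weighted sum pushes V down onto the fixed point.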

Page 8: Efficient Solution Algorithms for Factored MDPs

Value Function Approximation

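Variables: $w_1, \ldots, w_k$

Minimize: $\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_{i=1}^{k} w_i h_i(\mathbf{x})$, with $\alpha(\mathbf{x}) > 0$ for all $\mathbf{x}$

Subject to: $\sum_i w_i h_i(\mathbf{x}) \ge R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}')$ for all $\mathbf{x}, a$

Exact LP, for comparison:

Variables: $V_1, \ldots, V_N$

Minimize: $\sum_i \alpha(\mathbf{x}_i)\, V_i$, with $\alpha(\mathbf{x}_i) > 0$ for all $i$

Subject to: $V_i \ge R_a(\mathbf{x}_i) + \gamma \sum_j P_a(\mathbf{x}_j \mid \mathbf{x}_i)\, V_j$ for all $\mathbf{x}_i, a$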

Page 9: Efficient Solution Algorithms for Factored MDPs

Objective function

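Variables: $w_1, \ldots, w_k$

Minimize: $\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_i w_i h_i(\mathbf{x})$, with $\alpha(\mathbf{x}) > 0$ for all $\mathbf{x}$

Subject to: $\sum_i w_i h_i(\mathbf{x}) \ge R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}')$ for all $\mathbf{x}, a$

• Objective function polynomial in the number of basis functions

$$\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_i w_i h_i(\mathbf{x}) = \sum_i w_i \sum_{\mathbf{x}} \alpha(\mathbf{x})\, h_i(\mathbf{x}) = \sum_i w_i \sum_{\mathbf{c}_i \in \mathbf{C}_i} \alpha(\mathbf{c}_i)\, h_i(\mathbf{c}_i),$$

where $\alpha(\mathbf{c}_i) = \sum_{\mathbf{x} \text{ consistent with } \mathbf{c}_i} \alpha(\mathbf{x})$ is the marginal of $\alpha$ over $\mathbf{C}_i$, the variables that $h_i$ depends on.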
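The identity behind the compact objective can be verified on a small case. In this sketch the uniform weights α, the scope `(0, 2)`, and the basis function `h` are all hypothetical. The marginal is computed by brute force here only to check the identity. For a uniform α it has the closed form 1 / 2^|C_i|, which is what makes the coefficient cheap to compute.

```python
from itertools import product

N = 4
states = list(product([0, 1], repeat=N))
alpha = {x: 1.0 / len(states) for x in states}  # assumed uniform state weights

scope = (0, 2)  # hypothetical scope C_i: h depends on variables 0 and 2 only

def h(c):
    """Hypothetical basis function over its scope: 1 when both variables are 1."""
    return 1.0 if c == (1, 1) else 0.0

def project(x):
    """Restrict a full state to the variables in the scope."""
    return tuple(x[v] for v in scope)

# Direct coefficient of w_i: a sum over all 2^N states.
direct = sum(alpha[x] * h(project(x)) for x in states)

def marginal(c):
    """alpha(c_i): total weight of the states consistent with c."""
    return sum(alpha[x] for x in states if project(x) == c)

# Factored coefficient: a sum over the 2^|C_i| assignments to the scope.
factored = sum(marginal(c) * h(c) for c in product([0, 1], repeat=len(scope)))

print(direct, factored)
```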

Page 10: Efficient Solution Algorithms for Factored MDPs

Each Constraint: Backprojection

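Variables: $w_1, \ldots, w_k$

Minimize: $\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_i w_i h_i(\mathbf{x})$, with $\alpha(\mathbf{x}) > 0$ for all $\mathbf{x}$

Subject to: $\sum_i w_i h_i(\mathbf{x}) \ge R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}')$ for all $\mathbf{x}, a$

$$\sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}') = \sum_i w_i \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}')$$

$$\sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') = E[h_i(\mathbf{x}') \mid \mathbf{x}] = E[h_i(\mathbf{c}_i') \mid \mathbf{x}] = E[h_i(\mathbf{c}_i') \mid pa(\mathbf{c}_i')]$$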
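A small check of the backprojection idea, for one fixed action. The 3-machine ring DBN, its conditional probabilities, and the indicator basis function are hypothetical. The brute-force version sums over every next state, while the factored version reads the answer off the conditional distribution of the single variable the basis function mentions.

```python
from itertools import product

N = 3  # machines in a hypothetical ring network

def parents(i):
    """Machine i's next status depends on itself and its left neighbour."""
    return (i, (i - 1) % N)

def p_working(i, x):
    """P(X_i' = 1 | parents of X_i' in x); the numbers are made up."""
    own, neighbour = (x[j] for j in parents(i))
    table = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.1, (0, 0): 0.05}
    return table[(own, neighbour)]

def transition(x, x2):
    """Full P(x' | x) as a product of per-variable DBN factors."""
    prob = 1.0
    for i in range(N):
        p = p_working(i, x)
        prob *= p if x2[i] else 1.0 - p
    return prob

def h(x2):
    """Basis function with scope {X_0}: 1 if machine 0 is working."""
    return float(x2[0])

def backproject_bruteforce(x):
    return sum(transition(x, x2) * h(x2) for x2 in product([0, 1], repeat=N))

def backproject_factored(x):
    # E[h(x') | x] depends only on the parents of X_0'.
    return p_working(0, x)

for x in product([0, 1], repeat=N):
    print(x, backproject_bruteforce(x), backproject_factored(x))
```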

Page 11: Efficient Solution Algorithms for Factored MDPs

Representing Exponentially Many Constraints

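Variables: $w_1, \ldots, w_k$

Minimize: $\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_i w_i h_i(\mathbf{x})$, with $\alpha(\mathbf{x}) > 0$ for all $\mathbf{x}$

Subject to: $\sum_i w_i h_i(\mathbf{x}) \ge R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}')$ for all $\mathbf{x}, a$

$$\sum_i w_i h_i(\mathbf{x}) \ge R_a(\mathbf{x}) + \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x}) \sum_i w_i h_i(\mathbf{x}') \quad \forall \mathbf{x}, a$$

$$0 \ge \sum_i w_i \Big[ \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') - h_i(\mathbf{x}) \Big] + R_a(\mathbf{x}) \quad \forall \mathbf{x}, a$$

$$0 \ge \max_{\mathbf{x}} \Big\{ \sum_i w_i \Big[ \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') - h_i(\mathbf{x}) \Big] + R_a(\mathbf{x}) \Big\} \quad \forall a$$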

Page 12: Efficient Solution Algorithms for Factored MDPs

Restricted Domain

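$$0 \ge \max_{\mathbf{x}} \Big\{ \sum_i w_i \Big[ \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') - h_i(\mathbf{x}) \Big] + R_a(\mathbf{x}) \Big\} \quad \forall a$$

$$= \max_{\mathbf{x}} \Big\{ \sum_i w_i \big[ \gamma\, \underbrace{g_i^a(\mathbf{x})}_{1} - \underbrace{h_i(\mathbf{x})}_{2} \big] + \underbrace{R_a(\mathbf{x})}_{3} \Big\}$$

$$= \max_{\mathbf{x}} \Big\{ \sum_i w_i f_i(\mathbf{x}) + \sum_j r_j(\mathbf{x}) \Big\}$$

1. Backprojection - depends on few variables

2. Basis function

3. Reward function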

Page 13: Efficient Solution Algorithms for Factored MDPs

Variable Elimination

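$$\max_{\mathbf{x}} \Big\{ \sum_i w_i f_i(\mathbf{x}) + \sum_j r_j(\mathbf{x}) \Big\}$$

$$= \max_{x_1, x_2, x_3, x_4} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + r_1(x_2, x_4) + r_2(x_3, x_4) \big]$$

$$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ] \big]$$

$$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + e_1(x_2, x_3) \big]$$

where $e_1(x_2, x_3) = \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ]$

- similar to Bayesian Networks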
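The elimination step can be checked against brute force on binary variables. The local functions and weights below are hypothetical stand-ins with the same scopes as on the slide (f1 over x1, x2; f2 over x1, x3; r1 over x2, x4; r2 over x3, x4).

```python
from itertools import product

BOOL = (0, 1)
w1, w2 = 1.5, -0.5  # hypothetical weights

# Hypothetical local functions; only their scopes matter for the technique.
def f1(x1, x2):
    return x1 + 2 * x2

def f2(x1, x3):
    return 3 * x1 - x3

def r1(x2, x4):
    return x2 * x4

def r2(x3, x4):
    return 1 - x3 + 2 * x4

def objective(x1, x2, x3, x4):
    return w1 * f1(x1, x2) + w2 * f2(x1, x3) + r1(x2, x4) + r2(x3, x4)

# Brute force: maximize over all 2^4 joint assignments.
brute_force = max(objective(*x) for x in product(BOOL, repeat=4))

# Eliminate x4: only r1 and r2 mention it, so maximize them out into e1(x2, x3).
e1 = {
    (x2, x3): max(r1(x2, x4) + r2(x3, x4) for x4 in BOOL)
    for x2, x3 in product(BOOL, repeat=2)
}

# The remaining maximization ranges over x1, x2, x3 only.
eliminated = max(
    w1 * f1(x1, x2) + w2 * f2(x1, x3) + e1[(x2, x3)]
    for x1, x2, x3 in product(BOOL, repeat=3)
)

print(brute_force, eliminated)
```

Repeating the step for x3, x2, and x1 would leave a single number, with each intermediate table sized by the scope of the functions involved rather than by the full state space.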

Page 14: Efficient Solution Algorithms for Factored MDPs

Maximization as Linear Constraints

$$e_1(x_2, x_3) = \max_{x_4} \big[ r_1(x_2, x_4) + r_2(x_3, x_4) \big]$$

Equivalent to constraints:

$$e_1(x_2, x_3) \ge r_1(x_2, x_4) + r_2(x_3, x_4) \quad \text{for each assignment to } (x_2, x_3, x_4)$$

• Exponential in the size of each function’s domain, not the number of states
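A sketch of how one maximization turns into a list of inequalities, using hypothetical binary functions r1 and r2. Each constraint is stored as a pair: the entry of e1 it bounds, and the lower bound. In the actual LP the entries of e1 would be decision variables. Here the smallest values that satisfy every constraint are computed directly to show they coincide with the max.

```python
from itertools import product

# Hypothetical local reward functions over binary variables.
def r1(x2, x4):
    return x2 * x4

def r2(x3, x4):
    return 1 - x3 + 2 * x4

# One constraint e1(x2, x3) >= r1(x2, x4) + r2(x3, x4)
# per assignment to (x2, x3, x4).
constraints = [
    ((x2, x3), r1(x2, x4) + r2(x3, x4))
    for x2, x3, x4 in product([0, 1], repeat=3)
]

# The smallest e1 satisfying every constraint is the largest bound per entry.
e1 = {}
for key, bound in constraints:
    e1[key] = max(e1.get(key, bound), bound)

print(len(constraints), e1)
```

The count is 2^3 = 8 because three variables appear in the constraint, regardless of how many other state variables the MDP has.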

Page 15: Efficient Solution Algorithms for Factored MDPs

Factored LP: Scaling

Page 16: Efficient Solution Algorithms for Factored MDPs

Rule-based Representation

Page 17: Efficient Solution Algorithms for Factored MDPs

Approximate Value Function

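$$V(\mathbf{x}) = \sum_{j=1}^{k} w_j h_j(\mathbf{x}) = \sum_{j=1}^{k} w_j h_j(x_1, x_2, x_3, x_4) = \sum_{j=1}^{k} w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$$

h1:

[Figure: decision tree for h1 with root x1; one branch is a leaf with value 0, the other branch tests x3 and ends in leaves with values 5 and 0.6.]

Rule 1: context on x1, value 0

Rule 2: context on x1 and x3, value 5

Rule 3: context on x1 and x3, value 0.6

Notice: compact representation (2/4 variables, 3/16 rules)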

Page 18: Efficient Solution Algorithms for Factored MDPs

Summing Over Rules

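$$V(\mathbf{x}) = \sum_{j=1}^{k} w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$$

[Figure: the rule tree h1(x), which tests x1 and x3 and has leaves u1, u2, u3, is added to the rule tree h2(x), which tests x2 and x1 and has leaves u4, u5, u6. The result is a single tree over x1, x2, and x3 whose leaves are sums of one leaf from each tree, such as u1+u4.]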
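Adding two rule-based functions can be sketched as follows. The representation (a rule is a pair of a context dictionary and a value), the branch polarities, and the numeric leaf values are assumptions made for illustration. Each function's rules are taken to be mutually exclusive and exhaustive, as they are when read off a decision tree.

```python
from itertools import product

def consistent(c1, c2):
    """Two contexts are consistent if they agree on every shared variable."""
    return all(c2.get(var, val) == val for var, val in c1.items())

def add_rule_functions(f, g):
    """Pair every rule of f with every consistent rule of g and add the values."""
    return [
        ({**c1, **c2}, v1 + v2)
        for (c1, v1), (c2, v2) in product(f, g)
        if consistent(c1, c2)
    ]

def evaluate(rules, x):
    """Sum the values of the rules whose context matches the state x."""
    return sum(v for c, v in rules if all(x[k] == val for k, val in c.items()))

# Hypothetical rule sets shaped like the trees on the slide.
h1 = [({"x1": 0}, 0.0), ({"x1": 1, "x3": 0}, 5.0), ({"x1": 1, "x3": 1}, 0.6)]
h2 = [({"x2": 0}, 1.0), ({"x2": 1, "x1": 0}, 2.0), ({"x2": 1, "x1": 1}, 3.0)]

total = add_rule_functions(h1, h2)
for context, val in total:
    print(context, val)
```

The sum has 6 rules rather than one entry per joint state, because inconsistent pairs are dropped and shared contexts are merged.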

Page 19: Efficient Solution Algorithms for Factored MDPs

Multiplying over Rules

• Analogous construction

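$$0 \ge \max_{\mathbf{x}} \Big\{ \sum_i w_i \Big[ \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') - h_i(\mathbf{x}) \Big] + R_a(\mathbf{x}) \Big\} \quad \forall a$$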

Page 20: Efficient Solution Algorithms for Factored MDPs

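Rule-based Maximization

$$0 \ge \max_{\mathbf{x}} \Big\{ \sum_i w_i \Big[ \gamma \sum_{\mathbf{x}'} P_a(\mathbf{x}' \mid \mathbf{x})\, h_i(\mathbf{x}') - h_i(\mathbf{x}) \Big] + R_a(\mathbf{x}) \Big\} \quad \forall a$$

[Figure: a rule tree that tests x1 (leaf u1), then x2 (leaf u2), then x3 (leaves u3 and u4).]

Eliminate x2

[Figure: the resulting tree tests x1 (leaf u1), then x3 (leaves max(u2,u3) and max(u2,u4)).]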
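The elimination shown in the figure can be reproduced on numeric leaves. The rule representation, the branch polarities, and the values of u1 to u4 are hypothetical. In the LP the leaves are linear expressions and each max becomes a set of constraints, whereas this sketch simply takes the numeric max. It assumes the rules are mutually exclusive and exhaustive.

```python
from itertools import product

def consistent(c1, c2):
    """Two contexts are consistent if they agree on every shared variable."""
    return all(c2.get(var, val) == val for var, val in c1.items())

def restrict(rules, var, val):
    """Rules compatible with var = val, with var dropped from their contexts."""
    return [
        ({k: v for k, v in c.items() if k != var}, u)
        for c, u in rules
        if c.get(var, val) == val
    ]

def max_out(rules, var):
    """Eliminate var by maximizing over its two values."""
    return [
        ({**c0, **c1}, max(u0, u1))
        for (c0, u0), (c1, u1) in product(restrict(rules, var, 0),
                                          restrict(rules, var, 1))
        if consistent(c0, c1)
    ]

def evaluate(rules, x):
    return sum(u for c, u in rules if all(x[k] == v for k, v in c.items()))

u1, u2, u3, u4 = 1.0, 4.0, 2.0, 7.0  # hypothetical leaf values
tree = [
    ({"x1": 1}, u1),
    ({"x1": 0, "x2": 1}, u2),
    ({"x1": 0, "x2": 0, "x3": 1}, u3),
    ({"x1": 0, "x2": 0, "x3": 0}, u4),
]

reduced = max_out(tree, "x2")
for context, value in reduced:
    print(context, value)
```

The reduced rule set has three rules: u1 where x1 holds, and max(u2, u3) or max(u2, u4) depending on x3, matching the tree after elimination.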

Page 21: Efficient Solution Algorithms for Factored MDPs

Rule-based Linear Program

• Backprojection, objective function – handled in a similar way

• All the operations (summation, multiplication, maximization) – keep rule representation intact

• $w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$ is a linear function

Page 22: Efficient Solution Algorithms for Factored MDPs

Conclusions

• Compact representation can be exploited to solve MDPs with exponentially many states efficiently.

• Still NP-complete in the worst case.

• Factored solution may increase the size of the LP when the number of states is small (but it scales better).

• Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.