Markov Decision Processes AIMA: 17.1, 17.2 (excluding 17.2.3), 17.3


From utility to optimal policy

The utility function U(s) allows the agent to select the action that maximizes the expected utility of the subsequent state:

$\pi^*(s) = \operatorname*{argmax}_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$
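As a concrete illustration (not from the slides), here is a minimal Python sketch of this action-selection rule. It assumes a hypothetical toy-MDP representation in which actions(s) returns the actions available in state s and transitions[s][a] is a list of (probability, next_state) pairs.

```python
def best_action(s, U, actions, transitions):
    """Pick the action that maximizes the expected utility of the next state.

    Hypothetical representation: actions(s) -> iterable of actions,
    transitions[s][a] -> list of (probability, next_state) pairs.
    """
    def expected_utility(a):
        return sum(p * U[s2] for p, s2 in transitions[s][a])
    return max(actions(s), key=expected_utility)
```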

The Bellman equation

Now, if the utility of a state is the expected sum of discounted rewards from that point onwards, then there is a direct relationship between the utility of a state and the utility of its neighbors:


The utility of a state is the immediate reward for that state plus the expected discounted utility of the next state, assuming that the agent chooses the optimal action

$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$   (the Bellman equation)


The value iteration algorithm

For a problem with n states, there are n Bellman equations in n unknowns; however, the equations are NOT linear, because of the max operator.


$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$

$U_{i+1}(s) \leftarrow R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U_i(s')$

Start with arbitrary initial utilities U(s) and apply the update iteratively; the iteration is guaranteed to converge to the unique solution of the Bellman equations.

Demo: http://people.cs.ubc.ca/~poole/demos/mdp/vi.html
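A minimal, self-contained Python sketch of value iteration, using the same hypothetical MDP representation as above (transitions[s][a] as (probability, next_state) pairs, R as a reward dict). The stopping test based on eps and gamma is one common choice, not something prescribed by the slides.

```python
def value_iteration(states, actions, transitions, R, gamma=0.9, eps=1e-6):
    """Apply the Bellman update to every state until the utilities stop changing."""
    U = {s: 0.0 for s in states}  # any initial values work; convergence is guaranteed
    while True:
        U_next, delta = {}, 0.0
        for s in states:
            # Bellman update: U_{i+1}(s) = R(s) + gamma * max_a sum_{s'} P(s'|s,a) U_i(s')
            best = max(sum(p * U[s2] for p, s2 in transitions[s][a]) for a in actions(s))
            U_next[s] = R[s] + gamma * best
            delta = max(delta, abs(U_next[s] - U[s]))
        U = U_next
        if delta <= eps * (1 - gamma) / gamma:  # error bound commonly used with discounting
            return U
```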

Policy iteration algorithm

It is possible to get an optimal policy even when the utility function estimate is inaccurate

If one action is clearly better than all others, then the exact magnitude of the utilities on the states involved need not be precise


Value iteration: compute utilities of states, then compute the optimal policy.

Policy iteration: alternate between computing utilities of states for a given policy and computing a policy for the given state utilities.

Policy iteration algorithm


Policy improvement:
$\pi^*(s) = \operatorname*{argmax}_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$

The Bellman equation (nonlinear, because of the max):
$U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$

Policy evaluation (with the policy $\pi_i$ fixed, the max disappears, leaving a linear equation):
$U(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi_i(s))\, U(s')$

For a problem with n states, policy evaluation gives n linear equations in n unknowns, which can be solved exactly in O(n³) time; an iterative update scheme can also be used.
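A short Python sketch of policy iteration in the same hypothetical representation as before. This version uses the iterative policy-evaluation scheme mentioned above (a fixed number of sweeps of the linear update) rather than solving the n-by-n linear system directly.

```python
import random

def policy_iteration(states, actions, transitions, R, gamma=0.9, eval_sweeps=50):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    pi = {s: random.choice(list(actions(s))) for s in states}  # arbitrary initial policy
    U = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: sweeps of U(s) = R(s) + gamma * sum_{s'} P(s'|s, pi(s)) U(s')
        for _ in range(eval_sweeps):
            U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in transitions[s][pi[s]])
                 for s in states}
        # Policy improvement: make pi greedy with respect to the current utilities
        changed = False
        for s in states:
            a_best = max(actions(s),
                         key=lambda a: sum(p * U[s2] for p, s2 in transitions[s][a]))
            if a_best != pi[s]:
                pi[s], changed = a_best, True
        if not changed:
            return pi, U
```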

Summary

Markov decision processes
Utility of state sequence
Utility of states
Value iteration algorithm
Policy iteration algorithm
