24
Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee

03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Chapter 3: Finite Markov Decision ProcessesSeungjae Ryan Lee

Page 2: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

● Simplified, flexible reinforcement learning problem● Consists of States , Actions , Rewards

Markov Decision Process (MDP)

StatesInfo available to agent

ActionsChoice made by agent

RewardsBasis for evaluating choices

Page 3: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Agent

The learnerTakes action

Everything outside the agentReturns state and reward

Environment

Page 4: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

● Anything the agent cannot arbitrarily change is part of the environment○ Agent might still know everything about the environment

● Different boundaries for different purposes

Agent-Environment Boundary

Machinery Sensors Battery“Brain”

Page 5: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

1. Agent observes a state2. Agent takes action3. Agent receives reward and new state4. Agent takes another action 5. Repeat

Agent-Environment Interactions

Page 6: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Transition Probability● Probability of reaching state and reward by taking action on state● Fully describes the dynamics of a finite MDP

● Can deduce other properties of the environment

Page 7: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Expected Rewards● Expected reward of taking action on state

● Expected reward of arriving in state by taking action on state

Page 8: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Recycling Robot Example● States: Battery status (high or low)● Actions

○ Search: High reward. Battery status can be lowered or depleted.○ Wait: Low reward. Battery status does not change.○ Recharge: No reward. Battery status changed to high.

● If battery is depleted, -3 reward and battery status changed to high.

Page 9: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Transition Graph● Graphical summary of MDP dynamics

Page 10: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Designing Rewards● Reward hypothesis

○ Goals and purposes can be represented by maximization of cumulative reward

● Tell what you want to achieve, not how

+1 for each boxProportional to forward action

Always -1

Page 11: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Episodic Tasks● Interactions can be broken into episodes● Episodes end in a special terminal state● Each episode is independent

Finished when the game ends Finished when the agent is out of the maze

Page 12: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Return for Episodic Tasks● Sum of rewards from time step ● Time of termination:

Page 13: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Continuing Tasks● Cannot be naturally broken into episodes● Goes on without limit

Stock Trading

Page 14: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Return for Continuing Tasks● Sum of rewards is almost always infinite● Need to discount future rewards by factor

○ If , the return only considers immediate reward (myopic)

Page 15: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Unified Notation for Return● Cumulative reward● can be a finite number or infinity● Future rewards can be discounted with factor

○ If , then must be less than 1.

Page 16: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Policy● Mapping from states to probabilities of selecting each possible action● : Probability of selecting action in state

Page 17: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

State-value function● Expected return from state and following policy

Page 18: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Action-value function● Expected return from taking action in state and following policy

Page 19: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Bellman Equation● Recursive relationship between and

Page 20: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Optimal Policies and Value Functions● For any policy , for all states● There can be multiple optimal policies● All optimal policies share same optimal value functions:

Page 21: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Bellman Optimality Equation● Bellman Equation for optimal policies

Page 22: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Solving Bellman Optimality Equation● Linear system: equations, unknowns● Possible to find the exact optimal policy● Impractical in most environments

○ Need to know the dynamics of the environment○ Need extreme computational power○ Need Markov property

→ In most cases, approximation is the best possible solution.

Page 23: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Approximation● Does not require complete knowledge of environment● Less memory and computational power needed● Can focus learning on frequently encountered states

Page 24: 03. Finite Markov Decision Processes · 2021. 1. 25. · Chapter 3: Finite Markov Decision Processes Seungjae Ryan Lee Simplified, flexible reinforcement learning problem Consists

Thank you!Original content from

● Reinforcement Learning: An Introduction by Sutton and Barto

You can find more content in

● github.com/seungjaeryanlee ● www.endtoend.ai