University Paderborn
16 January 2009
RG Knowledge Based Systems
Prof. Dr. Hans Kleine Büning

Reinforcement Learning



Outline

• Motivation
• Applications
• Markov Decision Processes
• Q-learning
• Examples


Reinforcement Learning: The Idea

• A way of programming agents by reward and punishment without specifying how the task is to be achieved


Learning to Ride a Bicycle

[Figure: agent-environment loop with the labels Environment, state, action]


• States:
– Angle of handle bars
– Angular velocity of handle bars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical



• Actions:
– Torque to be applied to the handle bars
– Displacement of the center of mass from the bicycle’s plane (in cm)



• Reward:
– Angle of bicycle to vertical is greater than 12°: Reward = -1
– otherwise: Reward = 0
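The reward rule for the bicycle task can be sketched in a few lines. This is only an illustration: the function name, the use of radians for the input, and the symmetric treatment of left and right tilt are assumptions, not details from the slides.

```python
import math

FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(tilt_rad: float) -> int:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0.

    Assumption: the tilt is given in radians and both tilt directions count.
    """
    return -1 if abs(math.degrees(tilt_rad)) > FALL_ANGLE_DEG else 0
```

The agent receives no positive signal at all; it only learns from the penalty that arrives when the bicycle tips too far.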



Reinforcement Learning: Applications

• Board Games
– TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
• Mobile Robot Controlling
– Learning to Drive a Bicycle
– Navigation
– Pole-balancing
– Acrobot
• Sequential Process Controlling
– Elevator Dispatching


Key Features of Reinforcement Learning

• Learner is not told which actions to take
• Trial and error search
• Possibility of delayed reward:
– Sacrifice of short-term gains for greater long-term gains
• Explore/Exploit trade-off
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
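One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The slides do not name a specific strategy, so the sketch below is an illustration only; the function name and the parameter `epsilon` are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the first action with the highest current estimate.
    return q_values.index(max(q_values))
```

With `epsilon = 0` the agent always exploits its current estimates; with `epsilon = 1` it acts purely at random.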


The Agent-Environment Interaction

• Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$
– Agent observes state at step $t$: $s_t \in S$
– produces action at step $t$: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
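The interaction protocol above can be sketched as a loop. The environment here is a hypothetical two-state toy (not from the lecture) that pays reward 1.0 when the action equals the current state; the class name `CoinFlipEnv` and the function `run` are illustrative assumptions.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment with states {0, 1} and actions {0, 1}."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0  # r_{t+1}
        self.state = self.rng.randrange(2)             # s_{t+1}
        return reward, self.state


def run(env, policy, steps):
    """Observe s_t, choose a_t, receive r_{t+1} and s_{t+1}; return total reward."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)
        reward, state = env.step(action)
        total += reward
    return total
```

For example, `run(CoinFlipEnv(), lambda s: s, 10)` runs ten steps with a policy that always matches the state.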


The Agent’s Goal:

• Coarsely, the agent’s goal is to get as much reward as it can over the long run

• A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$

• Reinforcement learning methods specify how the agent changes its policy as a result of experience


Deterministic Markov Decision Process


Example


Example: Corresponding MDP



Example: Policy


Value of Policy and Rewards


Value of Policy and Agent’s Task


Nondeterministic Markov Decision Process

[Figure: one action with three possible outcomes, with probabilities P = 0.8, P = 0.1 and P = 0.1]
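A nondeterministic transition with probabilities 0.8, 0.1 and 0.1 can be sketched as sampling from a small distribution. Which outcomes carry which probability is not recoverable from the slide; the sketch assumes the common grid-world convention that the intended move succeeds with 0.8 and the agent slips to either perpendicular direction with 0.1 each. All names are illustrative.

```python
import random

# Assumed interpretation: the two perpendicular directions for each intended move.
SLIPS = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def outcome_distribution(action):
    """Return {actual_move: probability} for an intended action."""
    first, second = SLIPS[action]
    return {action: 0.8, first: 0.1, second: 0.1}


def sample_move(action, rng=random):
    """Draw the move that actually happens when `action` is intended."""
    dist = outcome_distribution(action)
    moves = list(dist)
    return rng.choices(moves, weights=[dist[m] for m in moves], k=1)[0]
```

In a deterministic MDP the distribution would put probability 1 on a single outcome; here the agent must plan for the chance of slipping.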



Example with South-Eastern Wind



Methods

• Model (reward function and transition probabilities) is known
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
• Model (reward function or transition probabilities) is unknown
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning


Q-learning Algorithm
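A minimal sketch of tabular Q-learning in its commonly used form may help here. It assumes a hypothetical five-state corridor rather than the lecture's example, and the learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are illustrative choices. With `alpha = 1` the update reduces to the rule often used for deterministic MDPs, $Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$.

```python
import random

N_STATES = 5         # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic transition; reward 1 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties at random so early episodes are unbiased.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q
```

After training, acting greedily on the table (taking the action with the larger Q-value in each state) walks the agent toward the goal.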



Example


Example: Q-table Initialization


Example: Episode 1



Example: Q-table


Example: Episode 1



Example: Q-table


Example: Episode 2



Example: Q-table after Convergence


Example: Value Function after Convergence


Example: Optimal Policy



Q-learning


Convergence of Q-learning


Blackjack

• Standard rules of blackjack hold
• State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have usable ace (0/1)
• Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
• Actions:
– HIT
– STICK
• Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
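The state encoding above can be sketched as a small data structure. This is an illustration under assumptions: the type and field names are invented here, and the loss reward is taken to be -1 (the usual convention), with 0 for a draw and 1 for a win.

```python
from typing import NamedTuple


class BlackjackState(NamedTuple):
    player_sum: int   # element[0]: current value of player's hand, 4-21
    dealer_card: int  # element[1]: value of dealer's face-up card, 2-11
    ace_flag: int     # element[2]: 0 or 1, as defined in the state space


ACTIONS = ("HIT", "STICK")
REWARDS = {"loss": -1, "draw": 0, "win": 1}  # assumed sign for a loss


def is_valid(state: BlackjackState) -> bool:
    """Check that every component lies in its stated range."""
    return (
        4 <= state.player_sum <= 21
        and 2 <= state.dealer_card <= 11
        and state.ace_flag in (0, 1)
    )


def count_states() -> int:
    """Number of (player_sum, dealer_card, ace_flag) combinations."""
    return sum(1 for _ in range(4, 22) for _ in range(2, 12) for _ in (0, 1))
```

With two actions per state, a tabular Q-function over this encoding has at most `count_states() * len(ACTIONS)` entries, which is small enough to store explicitly.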


Blackjack: Optimal Policy


Reinforcement Learning: Example

• States
– Grid cells
• Actions
– Left
– Up
– Right
– Down
• Rewards
– Bonus 20
– Food 1
– Predator -10
– Empty grid -0.1
• Transition probabilities
– 0.80: agent goes where it intends to go
– 0.20: agent moves to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
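The grid-world dynamics can be sketched as a transition distribution on a wrap-around grid. The slide does not say how the remaining 0.20 is divided; the sketch assumes an even split of 0.05 over the three other moves and staying in place. The grid size, coordinate convention and all names are illustrative assumptions.

```python
import random

REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}
MOVES = {"left": (-1, 0), "up": (0, -1), "right": (1, 0), "down": (0, 1)}


def transition_distribution(pos, action, width, height):
    """Return {next_position: probability} for an intended action.

    Assumption: 0.80 for the intended move, 0.05 for each other move,
    and 0.05 for remaining in place.
    """
    x, y = pos
    dist = {}
    for name, (dx, dy) in MOVES.items():
        # Leaving the grid on one side re-enters on the opposite side.
        target = ((x + dx) % width, (y + dy) % height)
        dist[target] = dist.get(target, 0.0) + (0.80 if name == action else 0.05)
    dist[pos] = dist.get(pos, 0.0) + 0.05
    return dist


def sample_next(pos, action, width, height, rng=random):
    """Draw the next position according to the transition distribution."""
    dist = transition_distribution(pos, action, width, height)
    cells = list(dist)
    return rng.choices(cells, weights=[dist[c] for c in cells], k=1)[0]
```

For instance, on a 4 x 3 grid, moving left from the corner cell `(0, 0)` most likely lands in `(3, 0)` because of the wrap-around rule.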
