58
- 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning October 24, 2019

TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

1

TABULAR METHODS & VALUE APPROXIMATION REVIEWLecture 18

CSE4/510: Reinforcement Learning

October 24, 2019

Page 2: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

2

MDP

Page 3: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

3

MDP

Page 4: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

4

MDP

Page 5: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

5

MDP

Page 6: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

6

MDP

Page 7: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

7

MDP

Page 8: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

8

MDP

Page 9: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

9

MDP

1.

2.

3.

4.

5.

A

B

C

D

E

F

Page 10: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

10

MDP

1.

2.

3.

4.

5.

A

B

C

D

E

F

Page 11: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

11

Dynamic Programming

Page 12: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

12

Dynamic Programming

Page 13: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

13

Dynamic Programming

Page 14: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

14

1. Policy Evaluation

2. Policy Improvement

A

B

Dynamic Programming

Page 15: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

15

Page 16: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

16

Policy Model Pros Cons Applications

On

Policy

Model

Based

• It finds optimal

policies in

polynomial time for

most cases

• Guaranteed to find

optimal policy

• Requires the

knowledge of the

transition probability

this is an unrealistic

requirement for many

problems

Can be applied

to environment

for which the

state transition

probability is

known

Dynamic Programming

Page 17: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

17

Distribution vs Sample Model

1. Distribution model

2. Sample model B. List all possible outcomes and their

probabilities

A. Produce a single outcome taken

according to its probability of occurring

Page 18: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

18

Optimal Functions

B

A

C

1

2

3

Page 19: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

19

Optimal Functions

B

A

C

1

2

3

Page 20: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

20

Update Functions

1. Dynamic Programming

2. Monte Carlo

4. Temporal Difference

A.

B.

C.3. Q-learning

D.

Page 21: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

21

Update Functions

1. Dynamic Programming

2. Monte Carlo

4. Temporal Difference A.

B.

C.

3. Q-learning D.

Page 22: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

22

Overview

MC / DP / TD ?

Page 23: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

23

Monte Carlo

Page 24: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

24

Monte Carlo

Page 25: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

25

Monte Carlo

Page 26: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

26

Policy Model Pros Cons Applications

Monte Carlo

Page 27: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

27

Policy Model Pros Cons Applications

On

Policy

Model

Free

• Learn optimal

behavior directly

from interaction with

the environment

• Can be used to

focus on the region

of special interest

and be accurately

evaluated

• Must have the

terminal state

• Must wait until the

end of an episode

before return is

known. For problems

with very long

episodes this will

become too slow

It couldn’t be

used on

continues task,

should be

episodic

Monte Carlo

Page 28: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

28

SARSA vs Q-learning

Page 29: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

29

SARSA

Page 30: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

30

Policy Model Pros Cons Applications

SARSA

Page 31: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

31

Q-learning

Page 32: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

32

Q-Learning

Policy Model Pros Cons Applications

Page 33: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

33

Q-Learning

Policy Model Pros Cons Applications

Off

Policy

Model

Free

Easy to implement • Memory

requirement

increases with

number of states

• Does not

perform well in

stochastic

environment

Environment

with limited

number of

states and

discrete

action spaces

Page 34: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

34

Function Approximation

Page 35: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

35

Function Approximation

Page 36: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

36

Function Approximation

Page 37: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

37

Function Approximation

Page 38: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

38

Function Approximation

Page 39: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

39

Function Approximation

Page 40: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

40

Deep Q-network

Page 41: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

41

Deep Q-network

Page 42: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

42

Deep Q-network

Page 43: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

43

Deep Q-network (DQN)

Policy Model Pros Cons Applications

Page 44: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

44

Deep Q-network (DQN)

Policy Model Pros Cons Applications

Off

Policy

Model

Free

• Can generalize

to unseen states

• Input is just a

state

• It may over-

estimate value

• Cannot be

applicable to

continuous action

spaces

Environment

with limited

number of

states and

discrete

action spaces

Page 45: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

45

Double Deep Q-network

Page 46: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

46

Double Deep Q-network

Page 47: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

47

Double Deep Q-network

Page 48: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

48

Double Deep Q-network

Page 49: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

49

Double Deep Q-network (DDQN)

Policy Model Pros Cons Applications

Page 50: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

50

Double Deep Q-network (DDQN)

Policy Model Pros Cons Applications

Off

Policy

Model

Free

• Value estimation

is more accurate

comparing to

DQN

• Input is just a

state

• It may take longer

to train

Environment

with limited

number of

states and

discrete

action spaces

Page 51: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

51

Dueling Deep Q-network (Dueling DQN)

Page 52: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

52

Dueling Deep Q-network (Dueling DQN)

Page 53: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

53

Dueling Deep Q-network (Dueling DQN)

Page 54: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

54

Dueling Deep Q-network (Dueling DQN)

Page 55: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

55

Dueling Deep Q-network (Dueling DQN)

Page 56: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

56

Dueling Deep Q-network (Dueling DQN)

Policy Model Pros Cons Applications

Page 57: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

57

Prioritized Experience Replay (RER)

Page 58: TABULAR METHODS & VALUE APPROXIMATION REVIEWavereshc/rl_fall19/lecture_18_Tabular... · 1 TABULAR METHODS & VALUE APPROXIMATION REVIEW Lecture 18 CSE4/510: Reinforcement Learning

‘-

58

PER