TABULAR METHODS & VALUE APPROXIMATION REVIEW
Lecture 18
CSE4/510: Reinforcement Learning
October 24, 2019
MDP
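The figures on these MDP slides did not survive extraction. As a reminder of the ingredients, an MDP is specified by states, actions, transition probabilities, rewards, and a discount factor. A minimal sketch in Python; the two-state chain below is an invented example, not taken from the slides:

```python
# Minimal MDP specification: states, actions, transition model, rewards, discount.
# The two-state "chain" MDP below is a made-up example for illustration.
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9  # discount factor

# P[(s, a)] -> list of (next_state, probability)
P = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "go"):   [("s1", 0.8), ("s0", 0.2)],
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "go"):   [("s0", 1.0)],
}
# R[(s, a)] -> expected immediate reward
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

# Sanity check: transition probabilities sum to 1 for every (s, a)
for sa, outcomes in P.items():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```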
Dynamic Programming
1. Policy Evaluation
2. Policy Improvement
Policy: On-policy
Model: Model-based
Pros:
• Finds optimal policies in polynomial time for most cases
• Guaranteed to find the optimal policy
Cons:
• Requires knowledge of the transition probabilities, an unrealistic requirement for many problems
Applications: Environments for which the state transition probabilities are known
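The evaluation/improvement loop above can be sketched in Python. The tiny two-state MDP here is invented for illustration, not taken from the slides:

```python
# Policy iteration sketch on a tiny, made-up 2-state MDP.
gamma = 0.9
states = [0, 1]
actions = [0, 1]
# P[s][a] = list of (next_state, prob); R[s][a] = expected reward
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

def evaluate(policy, theta=1e-8):
    """Iterative policy evaluation: sweep until values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def improve(V):
    """Greedy policy improvement with respect to V."""
    return {s: max(actions,
                   key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
            for s in states}

policy = {s: 0 for s in states}      # start with an arbitrary policy
while True:
    V = evaluate(policy)              # 1. policy evaluation
    new_policy = improve(V)           # 2. policy improvement
    if new_policy == policy:          # policy stable -> optimal
        break
    policy = new_policy
```

The loop terminates when improvement leaves the policy unchanged, which is the standard stopping criterion for policy iteration.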
Distribution vs Sample Model
1. Distribution model: lists all possible outcomes and their probabilities.
2. Sample model: produces a single outcome drawn according to its probability of occurring.
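The distinction can be made concrete with a short sketch; the coin-flip outcome space is an invented example:

```python
import random

# Distribution model: enumerates every outcome with its probability.
def distribution_model():
    return [("heads", 0.5), ("tails", 0.5)]

# Sample model: returns a single outcome drawn according to those probabilities.
def sample_model():
    outcomes, probs = zip(*distribution_model())
    return random.choices(outcomes, weights=probs, k=1)[0]
```

Dynamic programming needs the distribution model; sample-based methods such as Monte Carlo only need the sample model.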
Optimal Functions
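The equations on these slides appear to have been images that did not survive extraction; for reference, the standard Bellman optimality equations for the optimal state-value and action-value functions are:

```latex
v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]

q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma \max_{a'} q_*(s', a')\bigr]
```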
Update Functions
1. Dynamic Programming
2. Monte Carlo
3. Q-learning
4. Temporal Difference
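The update rules being matched on these slides appear to have been lost in extraction; the standard forms are:

```latex
% 1. Dynamic Programming (expected update):
V(s) \leftarrow \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V(s')\bigr]
% 2. Monte Carlo:
V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[G_t - V(S_t)\bigr]
% 3. Q-learning:
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\,\bigl[R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\bigr]
% 4. Temporal Difference (TD(0)):
V(S_t) \leftarrow V(S_t) + \alpha\,\bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr]
```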
Overview
MC / DP / TD?
Monte Carlo
Policy: On-policy
Model: Model-free
Pros:
• Learns optimal behavior directly from interaction with the environment
• Can be used to focus on a region of special interest and evaluate it accurately
Cons:
• Requires a terminal state
• Must wait until the end of an episode before the return is known; for problems with very long episodes this becomes too slow
Applications: Episodic tasks only; cannot be applied to continuing tasks
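A first-visit Monte Carlo prediction sketch; the three-state episodic task is invented for illustration:

```python
# First-visit Monte Carlo prediction sketch; the 3-state episodic task is invented.
gamma = 1.0

def generate_episode():
    # (state, reward) pairs; the last state is terminal.
    return [(0, 1.0), (1, 1.0), (2, 0.0)]

V, counts = {}, {}
for _ in range(100):
    G, returns_at = 0.0, []
    for s, r in reversed(generate_episode()):
        G = r + gamma * G                   # return following state s
        returns_at.append((s, G))
    seen = set()
    for s, G_s in reversed(returns_at):     # forward in time: first visits only
        if s not in seen:
            seen.add(s)
            counts[s] = counts.get(s, 0) + 1
            # incremental running mean of observed returns
            V[s] = V.get(s, 0.0) + (G_s - V.get(s, 0.0)) / counts[s]
```

The backward pass accumulates returns; the forward pass keeps only each state's first visit, matching the Monte Carlo constraint from the table: nothing updates until the episode terminates.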
SARSA vs Q-learning

SARSA

Q-learning
Policy: Off-policy
Model: Model-free
Pros:
• Easy to implement
Cons:
• Memory requirement grows with the number of states
• Does not perform well in stochastic environments
Applications: Environments with a limited number of states and discrete action spaces
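The contrast between the two update targets can be sketched as follows; the two-state task and all constants are invented. SARSA bootstraps from the action actually taken next (on-policy), while Q-learning bootstraps from the greedy action (off-policy):

```python
import random

# SARSA vs Q-learning on a made-up 2-state, 2-action task.
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def eps_greedy(s):
    """Epsilon-greedy behavior policy, used by both algorithms to act."""
    if random.random() < eps:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: Q[(s, a)])

def sarsa_update(s, a, r, s2, a2):
    target = r + gamma * Q[(s2, a2)]        # a2 = next action actually chosen
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(s, a, r, s2):
    target = r + gamma * max(Q[(s2, b)] for b in (0, 1))  # greedy over next actions
    Q[(s, a)] += alpha * (target - Q[(s, a)])

a0 = eps_greedy(0)                  # both algorithms act with the behavior policy
q_learning_update(0, 1, 1.0, 1)     # one illustrative transition
sarsa_update(1, 0, 2.0, 0, 1)       # another, with the next action given explicitly
```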
Function Approximation
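A common concrete case is a linear approximator trained with semi-gradient TD(0); the one-hot feature map and the sample transition below are invented for illustration:

```python
# Semi-gradient TD(0) with a linear function approximator: v(s; w) = w . x(s).
alpha, gamma = 0.1, 0.9

def features(s):
    # One-hot features over 3 states (a stand-in for any feature vector).
    x = [0.0, 0.0, 0.0]
    x[s] = 1.0
    return x

w = [0.0, 0.0, 0.0]

def v(s):
    return sum(wi * xi for wi, xi in zip(w, features(s)))

def td0_update(s, r, s2):
    # w <- w + alpha * [r + gamma*v(s') - v(s)] * grad_w v(s)
    delta = r + gamma * v(s2) - v(s)
    for i, xi in enumerate(features(s)):
        w[i] += alpha * delta * xi

td0_update(0, 1.0, 1)   # observe transition s=0 -> s'=1 with reward 1
```

With one-hot features this reduces exactly to tabular TD(0), which is why function approximation is viewed as a generalization of the tabular methods above.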
Deep Q-network
Policy: Off-policy
Model: Model-free
Pros:
• Can generalize to unseen states
• Input is just a state
Cons:
• May overestimate values
• Not applicable to continuous action spaces
Applications: Environments with a limited number of states and discrete action spaces
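A sketch of two DQN ingredients, experience replay and a frozen target network; plain dicts stand in for the networks, and all values are invented:

```python
import random

# DQN target computation sketch: (1) sample minibatches from a replay buffer,
# (2) compute targets y = r + gamma * max_a Q_target(s', a) with a frozen
# target network that is only synced to the online network periodically.
gamma = 0.99
actions = [0, 1]
q_target = {(s, a): 0.0 for s in (0, 1) for a in actions}  # frozen copy
replay = [(0, 1, 1.0, 1, False), (1, 0, 0.5, 0, True)]     # (s, a, r, s', done)

def td_targets(batch):
    ys = []
    for s, a, r, s2, done in batch:
        bootstrap = 0.0 if done else max(q_target[(s2, b)] for b in actions)
        ys.append(r + gamma * bootstrap)
    return ys

batch = random.sample(replay, k=2)   # uniform sampling from replay
ys = td_targets(batch)
```

The `max` over next actions in the target is exactly the source of the overestimation listed in the cons above.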
Double Deep Q-network (DDQN)
Policy: Off-policy
Model: Model-free
Pros:
• Value estimation is more accurate compared to DQN
• Input is just a state
Cons:
• May take longer to train
Applications: Environments with a limited number of states and discrete action spaces
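The DDQN target can be sketched as follows; dicts stand in for the two networks and the values are invented. The online network selects the argmax action and the target network evaluates it, which is what reduces DQN's overestimation:

```python
# Double DQN target sketch; both "networks" are plain dicts with invented
# Q-values for the next state s' = 1.
gamma = 0.99
actions = [0, 1]
q_online = {(1, 0): 1.0, (1, 1): 2.0}
q_target = {(1, 0): 5.0, (1, 1): 0.5}

def double_dqn_target(r, s2):
    a_star = max(actions, key=lambda a: q_online[(s2, a)])  # select with online net
    return r + gamma * q_target[(s2, a_star)]               # evaluate with target net

def dqn_target(r, s2):
    return r + gamma * max(q_target[(s2, a)] for a in actions)  # plain DQN, for contrast

y_double = double_dqn_target(1.0, 1)   # bootstraps from q_target[(1, 1)] = 0.5
y_plain = dqn_target(1.0, 1)           # bootstraps from max = 5.0
```

Here the plain DQN target latches onto the target network's largest (possibly spurious) value, while the DDQN target does not.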
Dueling Deep Q-network (Dueling DQN)
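The dueling architecture's aggregation step can be sketched as follows; the V and A values are invented. The network outputs a state value V(s) and per-action advantages A(s, a), recombined with a mean-advantage baseline so the decomposition is identifiable:

```python
# Dueling DQN aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
V_s = 3.0                        # invented state-value head output
A = {0: 1.0, 1: -1.0, 2: 0.0}    # invented advantage head outputs

mean_A = sum(A.values()) / len(A)
Q = {a: V_s + A[a] - mean_A for a in A}
```

Subtracting the mean forces the advantages to carry only relative preference between actions, leaving V(s) to carry the shared value of the state.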
Prioritized Experience Replay (PER)
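A sketch of the proportional variant of PER; the TD errors and hyperparameters are invented. Transitions are sampled in proportion to priority (typically |TD error| plus a small epsilon), and importance-sampling weights correct the resulting bias:

```python
import random

# Proportional prioritized replay: P(i) = p_i^alpha / sum_j p_j^alpha,
# with importance-sampling weights w_i = (N * P(i))^(-beta), normalized by max.
alpha, beta, eps_p = 0.6, 0.4, 1e-3
td_errors = [2.0, 0.5, 0.1]        # invented |TD errors| for three transitions

priorities = [(abs(d) + eps_p) ** alpha for d in td_errors]
total = sum(priorities)
probs = [p / total for p in priorities]

N = len(td_errors)
weights = [(N * p) ** (-beta) for p in probs]
max_w = max(weights)
weights = [w / max_w for w in weights]   # normalize for update-size stability

i = random.choices(range(N), weights=probs, k=1)[0]  # sampled transition index
```

Transitions with large TD error are replayed more often, which is the motivation for preferring PER over the uniform replay used in plain DQN.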