Q&A Session
Reinforcement Learning for Mobile Computing
Mobile Computing
Sungkyunkwan University
Prepared by D-T. Le and H. Choo
Question 1
Reinforcement learning agents/models seek to
a) Explore an environment
b) Maximize a given reward
c) Interpret the state of an environment
d) Exhibit creativity
Question 2
In an MDP, the state and reward at time 𝑡 depend on which of the following?
a) Cumulative reward at time 𝑡
b) Agent dynamics
c) State-action pair for time (𝑡 − 1)
d) State-action pair for all time instances before 𝑡
[Figure: MDP of a robot trying to walk, with transitions labelled as slow or fast actions]
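For reference, a minimal Python sketch of one MDP step for the robot example above, with hypothetical transition and reward tables: the next state and the reward are sampled using only the current state-action pair, which is the Markov property the question is probing.

import random

# Hypothetical MDP tables for the walking robot: P[(s, a)] maps to {next state: probability}
P = {("standing", "slow"): {"moving": 0.9, "fallen": 0.1},
     ("standing", "fast"): {"moving": 0.6, "fallen": 0.4}}
R = {("standing", "slow"): 1.0, ("standing", "fast"): 2.0}   # expected reward per state-action pair

def step(state, action):
    """Sample the next state and reward; only the current state-action pair is used."""
    outcomes = P[(state, action)]
    next_state = random.choices(list(outcomes), weights=list(outcomes.values()), k=1)[0]
    return next_state, R[(state, action)]

print(step("standing", "slow"))   # e.g. ('moving', 1.0)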
Question 3
What are the main sources of randomness in RL?
a) Random action given state
b) Random reward given state and action
c) Randomness of the cumulative reward given the policy and MDP
d) Random next state given state and action
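The options above map onto a small sketch: a stochastic policy supplies randomness in the chosen action, while the environment supplies randomness in the next state and reward; the policy table below is hypothetical.

import random

pi = {"standing": {"slow": 0.7, "fast": 0.3}}   # stochastic policy pi(a | s), hypothetical numbers

def sample_action(state):
    """Randomness source: the action is drawn at random given the state."""
    actions, probs = zip(*pi[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

# The remaining randomness comes from the environment: given (state, action), the next
# state and reward are themselves sampled (see the step() sketch under Question 2),
# which in turn makes the cumulative reward under a fixed policy and MDP a random quantity.
print(sample_action("standing"))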
Question 4
How is a model-free RL algorithm different from a model-based one?
a) Model-free algorithms do not rely on knowing environment dynamics: 𝑃(𝑠′ | 𝑠, 𝑎)
b) Model-based algorithms rely on machine learning models
c) Model-based algorithms know rewards in advance for every state and action
d) Model-free algorithms know all environment states in advance
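A hedged sketch of the distinction, with hypothetical tables P, R, and V: a model-based backup averages over the known dynamics P(s′ | s, a), while a model-free (TD-style) backup uses only one sampled transition.

GAMMA = 0.9   # hypothetical discount factor

def model_based_backup(V, P, R, s, a):
    """Model-based: requires the dynamics P(s'|s, a) to average over possible next states."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())

def model_free_backup(V, s, a, r, s_next, alpha=0.1):
    """Model-free (TD-style): uses only one sampled transition (s, a, r, s'); no access to P."""
    return V[s] + alpha * (r + GAMMA * V[s_next] - V[s])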
Question 5
In RL for continuing tasks, how does the total reward become finite even though
the series of rewards is infinite?
a) The agent maximizes the total reward in the same manner as it does for a finite
series of rewards (episodic task)
b) The agent maximizes only the immediate reward
c) The agent maximizes the total reward only of the first 𝑛 rewards in the infinite
series where 𝑛 is an arbitrarily chosen number
d) The sum of the infinite series of discounted rewards converges to a number
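A small numeric sketch of the discounting argument, using a hypothetical constant reward stream: the partial sums of the discounted series approach the geometric-series limit r/(1 − γ).

gamma, r = 0.9, 1.0   # hypothetical discount factor and constant per-step reward

# Partial sums of the discounted return G = sum over t of gamma**t * r
partial_sums = [sum(gamma**t * r for t in range(n)) for n in (10, 100, 1000)]
print(partial_sums)        # the sums approach a finite limit
print(r / (1 - gamma))     # geometric-series limit: 10.0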
Question 6
What does reward discounting mean from the agent's point of view?
a) It reduces the bias of selecting an action by increasing the contribution of close
rewards
b) It focuses the agent's attention more on close rewards and reduces the value of
distant ones
c) It focuses the agent's attention more on distant rewards and reduces the value of
close ones
d) It reduces the variance of selecting an action by decreasing the contribution of
distant rewards
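For intuition, the discount weights γ^k applied to a reward k steps in the future decay with distance; the γ values below are hypothetical.

# Discount weights gamma**k for a reward k steps in the future (hypothetical gammas)
for gamma in (0.5, 0.9):
    print(gamma, [round(gamma**k, 3) for k in range(6)])
# A smaller gamma shrinks distant rewards faster, so nearer rewards dominate the return.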
Question 7
What is the difference between a value function and a Q-value function for an
agent following a given policy?
a) The value function tells us how good a given state is for the agent, whereas the
Q-value function tells us how good it is for the agent to take an action from a state.
b) Both functions are the same.
c) The value function tells us the state as well as the action, whereas the Q-value
function tells us only the state for any action.
d) The value function tells us the policy, whereas the Q-value function tells us the
action and state.
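A minimal sketch of the relationship, with hypothetical numbers: V(s) scores a state by averaging Q(s, a) over the policy's action choices, while Q(s, a) scores taking a particular action from that state.

def state_value(pi, Q, s):
    """V(s): how good state s is under policy pi, i.e. Q(s, a) averaged over pi's action choices."""
    return sum(prob * Q[(s, a)] for a, prob in pi[s].items())

# Hypothetical numbers
pi = {"s0": {"slow": 0.7, "fast": 0.3}}
Q = {("s0", "slow"): 1.0, ("s0", "fast"): 2.0}   # Q(s, a): value of taking action a in state s
print(state_value(pi, Q, "s0"))                  # 1.3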
Question 8
By setting the learning rate to 1 in the following equation, _______________.
Q(s_t, a_t) = (1 − α) Q(s_t, a_t) + α [ r_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) ]
a) the agent won't consider previous experiences in calculating the Q-value for a
given state-action pair
b) the updating process will be very slow as there is no previous experience used
c) the agent will always get low rewards due to the discount factor in the next steps
d) the new Q-value will always be the same
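The update above, written as a small Python function with a hypothetical discount factor GAMMA; the roles of the (1 − α) and α terms can be read off directly.

GAMMA = 0.9   # hypothetical discount factor

def q_update(Q, s, a, r, s_next, actions, alpha):
    """Tabular Q-learning update from the equation above."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    # With alpha = 1 the first term vanishes, so the previous estimate Q[(s, a)]
    # (the agent's past experience for this state-action pair) no longer contributes.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target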
Question 9
Which of the following is true?
a) Q-learning is a reinforcement learning technique
b) A DQN combines Q-learning and deep neural networks
c) Q-learning is an on-policy method
d) Training a DQN involves estimating Q-values by updating the weights of the
neural network
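A minimal DQN-flavoured sketch using PyTorch, assuming a hypothetical 4-dimensional state and 2 actions: the Q-table of tabular Q-learning is replaced by a neural network that maps a state to one Q-value estimate per action.

import torch
import torch.nn as nn

# Hypothetical sizes: a 4-dimensional state and 2 possible actions
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

state = torch.rand(1, 4)    # one state, presented as a batch of size 1
q_values = q_net(state)     # the network outputs one Q-value estimate per action
print(q_values.shape)       # torch.Size([1, 2])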
Question 10
What is the strategy that the DQN agent uses to select actions?
a) Gamma greedy strategy
b) Epsilon generous strategy
c) Epsilon greedy strategy
d) Gamma generous strategy
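A minimal sketch of the selection strategy being asked about, with hypothetical Q-values and epsilon: with probability epsilon a random action is explored, otherwise the greedy action is exploited.

import random

def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon, otherwise exploit the current best action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit

print(epsilon_greedy([0.2, 1.5, -0.3], epsilon=0.1))   # hypothetical Q-values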
Question 11
The only function in the Agent class is _______________.
a) take_step()
b) select_action()
c) sample()
d) choose_strategy()
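The lecture's Agent class is not reproduced here; the sketch below is a hypothetical reconstruction consistent with the question, exposing a single method.

import random

class Agent:
    """Hypothetical reconstruction: the class exposes a single method."""
    def __init__(self, num_actions, epsilon=0.1):
        self.num_actions = num_actions
        self.epsilon = epsilon

    def select_action(self, q_values):
        # Apply an epsilon-greedy choice to the policy network's Q-value estimates
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        return max(range(self.num_actions), key=lambda a: q_values[a])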
Question 12
In the DQN with experience replay approach (page #40), Q-values for current
states are computed using the _______________.
a) Target Neural Network
b) Policy Neural Network
c) Discount rate
d) Exploration rate
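A hedged sketch of the training step (assuming a standard DQN formulation rather than the exact code on page #40, with a hypothetical 4-dimensional state and 2 actions): the forward pass over the current states of a sampled batch is shown explicitly.

import torch
import torch.nn as nn

# Hypothetical networks; the target net is a periodically refreshed copy of the policy net
policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(policy_net.state_dict())

states = torch.rand(32, 4)               # current states sampled from the replay memory
actions = torch.randint(0, 2, (32, 1))   # actions that were taken in those states

# Q-values for the CURRENT states are computed with one of the two networks
q_current = policy_net(states).gather(1, actions).squeeze(1)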
Question 13
During DQN training (page #40), we make a forward pass through the Target
Neural Network. What is the purpose of the pass?
a) To calculate the Q-value for the current action
b) To calculate the target Q-value for the current action
c) To calculate the max Q-value for the next state across all possible next actions
d) To calculate the Q-value for the current state and action
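Continuing in the same hedged setting (hypothetical shapes, standard DQN formulation), this self-contained sketch shows what the target-network forward pass produces and how it feeds the TD target.

import torch
import torch.nn as nn

GAMMA = 0.99   # hypothetical discount factor
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))   # lagged copy of the policy net

rewards = torch.rand(32)          # rewards from the sampled replay batch
next_states = torch.rand(32, 4)   # next states from the same batch

# The target-net forward pass yields, for each next state, the max Q-value across all possible next actions
with torch.no_grad():
    q_next_max = target_net(next_states).max(dim=1).values

# That maximum feeds the TD target that the policy network's current-state Q-values are trained towards
td_target = rewards + GAMMA * q_next_max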