
Copyright 2000-2021 Networking Laboratory

Sungkyunkwan University

Q&A Session

Reinforcement Learning for Mobile Computing

Mobile Computing

Sungkyunkwan University

Prepared by D-T. Le and H. Choo

[email protected]


Question 1

Reinforcement learning agents/models seek to

a) Explore an environment

b) Maximize a given reward

c) Interpret the state of an environment

d) Exhibit creativity


Question 2

In an MDP, the state and reward at time 𝑡 depend on which of the following?

a) Cumulative reward at time 𝑡

b) Agent dynamics

c) State-action pair for time (𝑡 − 1)

d) State-action pair for all time instances before 𝑡

[Figure: MDP of a robot trying to walk, with slow and fast actions]
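As a refresher on the Markov property behind this question (standard notation, not from the original slides), the next state and reward depend only on the most recent state-action pair:

$\Pr(S_t, R_t \mid S_{t-1}, A_{t-1}, S_{t-2}, A_{t-2}, \dots, S_0, A_0) = \Pr(S_t, R_t \mid S_{t-1}, A_{t-1})$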


Question 3

What are the main sources of randomness in RL?

a) Random action given state

b) Random reward given state and action

c) Random cumulative reward given policy and MDP

d) Random next state given state and action
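For reference, the usual notation for these random components (a sketch of the standard setup, not from the original slides): the action is drawn from the policy, the next state from the transition dynamics, and the reward from the reward distribution:

$A_t \sim \pi(\cdot \mid S_t), \qquad S_{t+1} \sim p(\cdot \mid S_t, A_t), \qquad R_t \sim R(\cdot \mid S_t, A_t)$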


Question 4

How is a model-free RL algorithm different from a model-based one?

a) Model-free algorithms do not rely on knowing the environment dynamics P(s′|s, a)

b) Model-based algorithms rely on machine learning models

c) Model-based algorithms know rewards in advance for every state and action

d) Model-free algorithms know all environment states in advance
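As background (an informal contrast, not from the original slides): a model-based algorithm has, or learns, the transition model P(s′|s, a) and a reward model and can plan with them, whereas a model-free algorithm estimates value functions or a policy directly from sampled transitions (s, a, r, s′) without ever representing P(s′|s, a).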


Question 5

In RL for continuing tasks, how does the total reward become finite even though the series of rewards is infinite?

a) The agent maximizes the total reward in the same manner as it does for a finite series of rewards (episodic task)

b) The agent maximizes only the immediate reward

c) The agent maximizes the total reward of only the first 𝑛 rewards in the infinite series, where 𝑛 is an arbitrarily chosen number

d) The sum of the infinite series of discounted rewards converges to a number
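For reference, with a discount factor 0 ≤ γ < 1 and rewards bounded by some r_max, the infinite discounted return is a convergent geometric series (a standard result, not from the original slides):

$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\le\; r_{\max} \sum_{k=0}^{\infty} \gamma^{k} = \frac{r_{\max}}{1-\gamma}$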


Question 6

What does reward discounting mean from the agent's point of view?

a) It reduces the bias of selecting an action by increasing the contribution of close rewards

b) It focuses the agent’s attention more on close rewards and reduces the value of distant ones

c) It focuses the agent’s attention more on distant rewards and reduces the value of close ones

d) It reduces the variance of selecting an action by decreasing the contribution of distant rewards
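A quick numerical illustration (assumed values, not from the original slides): with γ = 0.9, a reward received 10 steps in the future is weighted by γ^10 ≈ 0.35 and one received 50 steps ahead by γ^50 ≈ 0.005, so distant rewards contribute far less to the return than nearby ones.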


Question 7

What is the difference between a value function and a Q-value function for an agent following a given policy?

a) The value function tells us how good a given state is for the agent, whereas the Q-value function tells us how good it is for the agent to take an action from a state.

b) Both functions are the same.

c) The value function tells us the state as well as the action, whereas the Q-value function tells us only the state for any action.

d) The value function tells us the policy, whereas the Q-value function tells us the action and state.
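For reference, the standard definitions under a policy π (standard notation, not from the original slides), where G_t is the discounted return from time t:

$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s \,\right], \qquad Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s, A_t = a \,\right]$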


Question 8

By setting the learning rate to 1 in the following equation, _______________.

$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[\, r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \,\right]$

a) the agent won't consider previous experiences in calculating the Q-value for a given state-action pair

b) the updating process will be very slow as there is no previous experience used

c) the agent will always get low rewards due to the discount factor in the next steps

d) the new Q-value will always be the same
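A minimal sketch of the tabular Q-learning update in Python (illustrative variable names and sizes, not the course's code). With alpha = 1.0 the (1 − α) term vanishes, so the previous Q-value for the state-action pair is discarded entirely:

    import numpy as np

    n_states, n_actions = 5, 2            # illustrative sizes
    Q = np.zeros((n_states, n_actions))   # tabular Q-values

    def q_update(Q, s, a, r, s_next, alpha, gamma=0.99):
        # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))
        target = r + gamma * Q[s_next].max()
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

    # With alpha = 1.0 the update reduces to Q(s,a) <- r + gamma * max_a' Q(s',a'),
    # i.e. previous experience stored in Q(s,a) is ignored
    q_update(Q, s=0, a=1, r=1.0, s_next=2, alpha=1.0)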


Question 9

Which of the following is true?

a) Q-learning is a reinforcement learning technique

b) A DQN combines Q-learning and deep neural networks

c) Q-learning is an on-policy method

d) Training a DQN involves estimating Q-values by updating the weights of the neural network


Question 10

What strategy does a DQN agent use to select actions?

a) Gamma greedy strategy

b) Epsilon generous strategy

c) Epsilon greedy strategy

d) Gamma generous strategy
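A minimal sketch of epsilon-greedy action selection in Python (illustrative names and values, not the course's Agent class):

    import random
    import numpy as np

    def select_action(q_values, epsilon):
        # Explore: with probability epsilon pick a random action
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Exploit: otherwise pick the action with the highest Q-value
        return int(np.argmax(q_values))

    # Example: exploit most of the time, explore 10% of the time
    action = select_action(np.array([0.2, 1.5, -0.3]), epsilon=0.1)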


Question 11

The only function in the Agent class is _______________.

a) take_step()

b) select_action()

c) sample()

d) choose_strategy()


Question 12

In the DQN with experience replay approach (page #40), Q-values for current states are computed using the _______________.

a) Target Neural Network

b) Policy Neural Network

c) Discount rate

d) Exploration rate


Question 13

During DQN training (page #40), we make a forward pass through the Target Neural Network. What is the purpose of the pass?

a) To calculate the Q-value for the current action

b) To calculate the target Q-value for the current action

c) To calculate the max Q-value for the next state across all possible next actions

d) To calculate the Q-value for the current state and action
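A minimal PyTorch-style sketch of how the current and target Q-values are typically computed in DQN with experience replay, assuming two networks called policy_net and target_net (an illustrative reconstruction, not the exact code from page #40):

    import torch
    import torch.nn.functional as F

    def dqn_loss(policy_net, target_net, batch, gamma=0.99):
        states, actions, rewards, next_states, dones = batch

        # Q-values for the current states/actions come from the policy network
        q_current = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Forward pass through the target network: max Q-value for the next
        # state across all possible next actions, used to build the target
        with torch.no_grad():
            q_next_max = target_net(next_states).max(dim=1).values
            q_target = rewards + gamma * q_next_max * (1 - dones)

        return F.mse_loss(q_current, q_target)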

Copyright 2000-2021 Networking Laboratory

Sungkyunkwan University

Thanks to contributors

Dr. Duc-Tai Le

Prof. Hyunseung Choo