
Copyright 2000-2021 Networking Laboratory

Sungkyunkwan University

Q&A Session

Reinforcement Learning for Mobile Computing

Mobile Computing

Sungkyunkwan University

Prepared by D-T. Le and H. Choo

[email protected]


Question 1

Reinforcement learning agents/models seek to

a) Explore an environment

b) Maximize a given reward

c) Interpret the state of an environment

d) Exhibit creativity


Question 2

In an MDP, the state and reward at time 𝑡 depend on which of the following?

a) Cumulative reward at time 𝑡

b) Agent dynamics

c) State-action pair for time (𝑡 − 1)

d) State-action pair for all time instances before 𝑡

[Figure: MDP of a robot trying to walk, with slow and fast actions]
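As a refresher on the Markov property behind this question (standard notation, not from the original slides), the next state and reward depend only on the most recent state-action pair:

$\Pr(S_t, R_t \mid S_{t-1}, A_{t-1}, S_{t-2}, A_{t-2}, \dots, S_0, A_0) = \Pr(S_t, R_t \mid S_{t-1}, A_{t-1})$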


Question 3

What are the main sources of randomness in RL?

a) Random action given state

b) Random reward given state and action

c) Random cumulative reward given policy and MDP

d) Random next state given state and action
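For reference, the usual notation for these random components (a sketch of the standard setup, not from the original slides): the action is drawn from the policy, the next state from the transition dynamics, and the reward from the reward distribution:

$A_t \sim \pi(\cdot \mid S_t), \qquad S_{t+1} \sim p(\cdot \mid S_t, A_t), \qquad R_t \sim R(\cdot \mid S_t, A_t)$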


Question 4

How is a model-free RL algorithm different from a model-based one?

a) Model-free algorithms do not rely on knowing the environment dynamics P(s′|s, a)

b) Model-based algorithms rely on machine learning models

c) Model-based algorithms know rewards in advance for every state and action

d) Model-free algorithms know all environment states in advance
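As background (an informal contrast, not from the original slides): a model-based algorithm has, or learns, the transition model P(s′|s, a) and a reward model and can plan with them, whereas a model-free algorithm estimates value functions or a policy directly from sampled transitions (s, a, r, s′) without ever representing P(s′|s, a).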


Question 5

In RL for continuing tasks, how does the total reward become finite even though the series of rewards is infinite?

a) The agent maximizes the total reward in the same manner as it does for a finite series of rewards (episodic task)

b) The agent maximizes only the immediate reward

c) The agent maximizes the total reward of only the first 𝑛 rewards in the infinite series, where 𝑛 is an arbitrarily chosen number

d) The sum of the infinite series of discounted rewards converges to a number
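For reference, with a discount factor 0 ≤ γ < 1 and rewards bounded by some r_max, the infinite discounted return is a convergent geometric series (a standard result, not from the original slides):

$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\le\; r_{\max} \sum_{k=0}^{\infty} \gamma^{k} = \frac{r_{\max}}{1-\gamma}$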


Question 6

What does reward discounting mean from the agent's point of view?

a) It reduces the bias of selecting an action by increasing the contribution of close rewards

b) It focuses the agent’s attention more on close rewards and reduces the value of distant ones

c) It focuses the agent’s attention more on distant rewards and reduces the value of close ones

d) It reduces the variance of selecting an action by decreasing the contribution of distant rewards
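A quick numerical illustration (assumed values, not from the original slides): with γ = 0.9, a reward received 10 steps in the future is weighted by γ^10 ≈ 0.35 and one received 50 steps ahead by γ^50 ≈ 0.005, so distant rewards contribute far less to the return than nearby ones.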


Question 7

What is the difference between a value function and a Q-value function for an agent following a given policy?

a) The value function tells us how good a given state is for the agent, whereas the Q-value function tells us how good it is for the agent to take an action from a state.

b) Both functions are the same.

c) The value function tells us the state as well as the action, whereas the Q-value function tells us only the state for any action.

d) The value function tells us the policy, whereas the Q-value function tells us the action and state.
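For reference, the standard definitions under a policy π (standard notation, not from the original slides), where G_t is the discounted return from time t:

$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s \,\right], \qquad Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\, G_t \mid S_t = s, A_t = a \,\right]$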


Question 8

By setting the learning rate to 1 in the following equation, _______________.

$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[\, r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \,\right]$

a) the agent won't consider previous experiences in calculating the Q-value for a given state-action pair

b) the updating process will be very slow as there is no previous experience used

c) the agent will always get low rewards due to the discount factor in the next steps

d) the new Q-value will always be the same
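A minimal sketch of the tabular Q-learning update in Python (illustrative variable names and sizes, not the course's code). With alpha = 1.0 the (1 − α) term vanishes, so the previous Q-value for the state-action pair is discarded entirely:

    import numpy as np

    n_states, n_actions = 5, 2            # illustrative sizes
    Q = np.zeros((n_states, n_actions))   # tabular Q-values

    def q_update(Q, s, a, r, s_next, alpha, gamma=0.99):
        # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))
        target = r + gamma * Q[s_next].max()
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

    # With alpha = 1.0 the update reduces to Q(s,a) <- r + gamma * max_a' Q(s',a'),
    # i.e. previous experience stored in Q(s,a) is ignored
    q_update(Q, s=0, a=1, r=1.0, s_next=2, alpha=1.0)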


Question 9

Which of the following is true?

a) Q-learning is a reinforcement learning technique

b) A DQN combines Q-learning and deep neural networks

c) Q-learning is an on-policy method

d) Training a DQN involves estimating Q-values by updating the weights of the neural network


Question 10

What strategy does a DQN agent use to select actions?

a) Gamma greedy strategy

b) Epsilon generous strategy

c) Epsilon greedy strategy

d) Gamma generous strategy
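A minimal sketch of epsilon-greedy action selection in Python (illustrative names and values, not the course's Agent class):

    import random
    import numpy as np

    def select_action(q_values, epsilon):
        # Explore: with probability epsilon pick a random action
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Exploit: otherwise pick the action with the highest Q-value
        return int(np.argmax(q_values))

    # Example: exploit most of the time, explore 10% of the time
    action = select_action(np.array([0.2, 1.5, -0.3]), epsilon=0.1)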


Question 11

The only function in the Agent class is _______________.

a) take_step()

b) select_action()

c) sample()

d) choose_strategy()


Question 12

In the DQN with experience replay approach (page #40), Q-values for current states are computed using the _______________.

a) Target Neural Network

b) Policy Neural Network

c) Discount rate

d) Exploration rate


Question 13

During DQN training (page #40), we make a forward pass through the Target Neural Network. What is the purpose of the pass?

a) To calculate the Q-value for the current action

b) To calculate the target Q-value for the current action

c) To calculate the max Q-value for the next state across all possible next actions

d) To calculate the Q-value for the current state and action
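A minimal PyTorch-style sketch of how the current and target Q-values are typically computed in DQN with experience replay, assuming two networks called policy_net and target_net (an illustrative reconstruction, not the exact code from page #40):

    import torch
    import torch.nn.functional as F

    def dqn_loss(policy_net, target_net, batch, gamma=0.99):
        states, actions, rewards, next_states, dones = batch

        # Q-values for the current states/actions come from the policy network
        q_current = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Forward pass through the target network: max Q-value for the next
        # state across all possible next actions, used to build the target
        with torch.no_grad():
            q_next_max = target_net(next_states).max(dim=1).values
            q_target = rewards + gamma * q_next_max * (1 - dones)

        return F.mse_loss(q_current, q_target)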

Copyright 2000-2021 Networking Laboratory

Sungkyunkwan University

Thanks to contributors

Dr. Duc-Tai Le

Prof. Hyunseung Choo