Reinforcement Learning 2 - courses.cit.cornell.edu · Reinforcement Learning 2 Pantelis P. Analytis...

Reinforcement Learning 2

Pantelis P. Analytis

March 24, 2018

1 Introduction

2 Temporal difference learning

3 Q-learning

4 Applications

5 Midterm revision

Different types of learning

Characteristics of reinforcement learning

Evaluative feedback.

Sequentiality, delayed rewards.

Need for trial and error, to explore as well as to exploit.

Non-stationary world.

Temporal difference learning

Broadly used to predict future rewards.

It appears to be how the brain's reward system works.

It learns a prediction from another, later, learned prediction.

The TD error is the difference between two predictions, the temporal difference.

$$V(s) \leftarrow V(s) + \alpha \Bigl( \underbrace{r + \gamma V(s')}_{\text{The TD target}} - V(s) \Bigr)$$

$r + \gamma V(s')$ is known as the TD target.
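The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the five-state random walk, the function name `td0_random_walk`, and the parameter values are all assumptions chosen for the example. Here α is the learning rate and γ the discount factor.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, n_states=5, seed=0):
    """Estimate V(s) with TD(0) on a hypothetical random walk.

    States 0..n_states-1 are non-terminal. Stepping off the left end ends
    the episode with reward 0; stepping off the right end ends it with
    reward 1. Every move is left or right with equal probability.
    """
    rng = random.Random(seed)
    V = [0.5] * n_states                 # initial value guesses
    for _ in range(episodes):
        s = n_states // 2                # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:               # left terminal
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:     # right terminal
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            td_target = r + gamma * v_next      # r + gamma * V(s')
            td_error = td_target - V[s]         # difference of two predictions
            V[s] += alpha * td_error            # V(s) <- V(s) + alpha * error
            if done:
                break
            s = s_next
    return V

if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so the estimates should settle near those numbers; with a constant step size they keep fluctuating slightly around them.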

Temporal difference learning in the brain (Schultz, Dayan, Montague, 1997)

Temporal difference learning in the brain

$$V(s) \leftarrow V(s) + \alpha \Bigl( \underbrace{r + \gamma V(s')}_{\text{The TD target}} - V(s) \Bigr)$$

$r + \gamma V(s')$ is known as the TD target.
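A toy simulation can show why the TD error is a plausible model of a reward prediction signal. The sketch below is an assumption-laden illustration, not the model from Schultz, Dayan and Montague (1997): a cue is followed a fixed number of steps later by a reward, the value of the pre-cue baseline is held at 0 (on the assumption that the cue arrives unpredictably), and the name `simulate_cue_reward` and all parameter values are hypothetical.

```python
def simulate_cue_reward(trials=500, steps=5, alpha=0.2, gamma=1.0):
    """Return the TD errors of every trial in a cue-then-reward task.

    Each trial's list holds steps + 1 errors: index 0 is the error when the
    cue appears, the last index is the error when the reward is delivered.
    """
    V = [0.0] * steps                    # one value per time step after the cue
    history = []
    for _ in range(trials):
        # Cue onset: transition from the baseline (value fixed at 0, an
        # assumption) into the first cue state, with no reward.
        deltas = [gamma * V[0] - 0.0]
        for t in range(steps):
            last = (t == steps - 1)
            r = 1.0 if last else 0.0             # reward only at the end
            v_next = 0.0 if last else V[t + 1]   # trial ends after the reward
            delta = r + gamma * v_next - V[t]    # TD error
            V[t] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return history

if __name__ == "__main__":
    h = simulate_cue_reward()
    print("first trial:", [round(d, 2) for d in h[0]])
    print("last trial: ", [round(d, 2) for d in h[-1]])
```

On the first trial the only non-zero error occurs at the reward. After training, the error at the reward vanishes and an error of similar size appears at the cue, which is the qualitative pattern the TD account attributes to the reward system.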

Temporal difference learning: example

Predicting the outcome of a game like chess or backgammon.

Long-term predictions by simulation are complex, and even small errors in one-step predictions might be amplified.

Q-learning

Q-learning converges to the optimal action values even if you are acting sub-optimally, provided every action keeps being tried in every state.
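The claim above can be checked on a small example. The sketch below uses the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s′, ·) − Q(s, a)), which is not shown on the slide. The chain environment, the function name `q_learning_chain`, and the parameter values are assumptions made for illustration. The agent acts uniformly at random throughout, which is clearly sub-optimal.

```python
import random

def q_learning_chain(episodes=2000, n_states=5, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical chain with a random behaviour policy.

    Actions: 0 = left, 1 = right. Moving right from the last state ends the
    episode with reward 1; every other move gives reward 0. Moving left from
    state 0 leaves the agent in state 0.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]    # Q[s][a]
    for _ in range(episodes):
        s = 0
        while True:
            a = rng.randrange(2)                 # act at random, never greedily
            if a == 1 and s == n_states - 1:
                r, done, s_next = 1.0, True, s   # reached the goal
            else:
                r, done = 0.0, False
                s_next = s + 1 if a == 1 else max(s - 1, 0)
            # Bootstrap from the best action in the next state, regardless of
            # which action the random policy will actually take there.
            best_next = 0.0 if done else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            if done:
                break
            s = s_next
    return Q

if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning_chain()):
        print(s, round(left, 3), round(right, 3))
```

Although the behaviour is random, the learned values favour moving right in every state, because the update bootstraps from the maximum over next actions rather than from the action actually taken.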

Model-based and model-free learning

Many situations involve conflict between a model-free system like TD-learning and a model-based system that plans ahead.

Samuel’s checkers program

Inspired by Shannon’s paper on chess-playing computers.

It achieved a good, but not expert, level of play.

Used a learning process that was similar to TD-learning.

Tesauro’s TD-Gammon

Developed in 1992 by Gerald Tesauro. After playing 300,000 games against itself, it performed approximately at the level of human world-class players.

Atari breakthrough

Google DeepMind trained an agent that learned 49 Atari games. It received the pixels of the screen as input and evaluated the rewards from different positions of the joystick. It played about half of the games at human level.

AlphaGo

AlphaGo searched (planned) much deeper in the game tree.

It uses reinforcement learning to evaluate which paths are worth searching.

Attention allocation in online interfaces

Music lab experiment

Learning from others

Clinical vs. actuarial decision making

Exploration-exploitation dilemma

Iowa gambling task
