Reinforcement Learning
Emilie Kaufmann ([email protected])
Ecole Centrale de Lille, 2019/2020
Emilie Kaufmann |CRIStAL - 1
This Reinforcement Learning (RL) class
▸ 8 lectures (2 hours each)
▸ 4 practical sessions (1 on bandits, 3 on RL)
▸ project presentation morning : January 27th, 2020 (4 hours)
Evaluation : one project (groups of 1 or 2)
▸ List of projects available on December 10th
Class jointly taught with :
▸ Olivier Pietquin (Google Brain)
  Deep Reinforcement Learning (4 hours)
▸ Omar Darwiche-Domingues (Inria SequeL)
  Practical Sessions of Reinforcement Learning (6 hours)
Useful References
▸ Some books
▸ many research papers (references in the slides)
▸ material from the first RL Summer School : https://rlss.inria.fr/program/
Reinforcement Learning :
Introduction
What is Reinforcement Learning ?
→ learning by “trial and error”
→ learning to behave in an unknown, stochastic environment by maximizing some real-valued reward signal
Example : learning to bike without a perfect knowledge of physics
Key RL concepts
A learning agent sequentially interacts with its environment by performing actions. Each action
▸ provides an instantaneous reward
▸ leads to an evolution of the agent’s state
Agent’s goal : act so as to maximize its total reward
source : Wikipedia
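The interaction loop described above can be sketched in a few lines of Python. This is a minimal illustration, not material from the class: the `CorridorEnv` environment, its `reset`/`step` interface, and both policies are hypothetical names chosen for the example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells, goal at the right end."""

    def __init__(self, n_cells=5):
        self.n_cells = n_cells
        self.state = 0

    def reset(self):
        # Every episode starts in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (walls clip the move).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_cells - 1)
        done = self.state == self.n_cells - 1
        reward = 1.0 if done else 0.0  # instantaneous reward
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal is reached or max_steps is exhausted."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent picks an action
        state, reward, done = env.step(action)  # the state evolves, a reward arrives
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    return 1


_rng = random.Random(0)


def random_policy(state):
    return _rng.choice([0, 1])


print(run_episode(CorridorEnv(), always_right))
print(run_episode(CorridorEnv(), random_policy))
```

The agent's goal, maximizing its total reward, corresponds here to choosing the `policy` argument that makes `run_episode` return the largest value.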
Key RL concepts
Keywords (high-level) :
▸ Reward : instantaneous feedback received after acting
▸ Value : total reward the agent can get in some state
▸ Policy : strategy to choose an action in a given state
Agent’s goal : find a policy that maximizes the value in each state
source : Wikipedia
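To make the three keywords concrete, here is a small hedged sketch. The discount factor `gamma`, the per-action value table and the state and action names are assumptions introduced for illustration; the slide only speaks of "total reward" and of a policy mapping states to actions.

```python
def total_reward(rewards, gamma=1.0):
    """Total reward along one trajectory; gamma < 1 down-weights later rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def greedy_policy(action_values):
    """Policy that picks, in each state, the action with the largest estimated value."""
    return {
        state: max(values, key=values.get)
        for state, values in action_values.items()
    }


# Hypothetical value estimates for a two-state problem (illustrative numbers only).
action_values = {
    "low_battery": {"recharge": 2.0, "search": -1.0},
    "high_battery": {"recharge": 0.0, "search": 3.0},
}

policy = greedy_policy(action_values)
print(total_reward([1, 0, 2]))
print(policy)
```

A policy is simply a mapping from states to actions, and comparing values is what allows one policy to be preferred over another.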
RL successes : Games (1/2)
From Backgammon...
1992, TD-gammon
... to Go
2015, AlphaGo
2017, AlphaGo Zero
→ RL agents learn new types of strategies
RL successes : Games (2/2)
▸ Learning to play from pixels (and rewards) : Atari Games
  2010+ : Deep Reinforcement Learning
▸ Recent challenges : multi-player / partial information games
  OpenAI Five (2019), Pluribus (2019)
RL successes : Content Optimization
▸ online advertisement
→ action : display an ad / reward : click
▸ (sequential) recommender systems
→ action : recommend a movie / reward : rating
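The advertisement example (action: display an ad, reward: click) can be simulated as a one-state problem. The sketch below uses epsilon-greedy, chosen here only because it is short; it is not the UCB or Thompson Sampling approach listed for Lecture 2. The click rates, the function name and its parameters are hypothetical.

```python
import random


def epsilon_greedy_ads(click_rates, n_rounds=1000, epsilon=0.1, seed=0):
    """Simulate ad display: each round, show one ad and observe click (1) or not (0)."""
    rng = random.Random(seed)
    n_ads = len(click_rates)
    counts = [0] * n_ads    # how many times each ad was displayed
    means = [0.0] * n_ads   # empirical click rate of each ad
    total_clicks = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(n_ads)                         # explore: random ad
        else:
            ad = max(range(n_ads), key=lambda a: means[a])    # exploit: best so far
        # The simulated user clicks with the (unknown to the agent) rate of that ad.
        click = 1 if rng.random() < click_rates[ad] else 0
        counts[ad] += 1
        means[ad] += (click - means[ad]) / counts[ad]         # incremental average
        total_clicks += click
    return counts, means, total_clicks


# Hypothetical click rates, unknown to the learning agent.
counts, means, total_clicks = epsilon_greedy_ads([0.02, 0.05, 0.10])
print(counts, means, total_clicks)
```

The agent only observes clicks for the ads it chose to display, which is why some exploration is needed before it can settle on the best ad.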
RL : Many potential applications
▸ Smart grid / microgrid management
source : ScienceDirect.com
Actions :
▸ charge or discharge storage systems
▸ turn renewable energy sources on or off
▸ buy energy from the market ...
Reward : −cost (the negative of the incurred cost)
RL : Many potential applications
▸ Autonomous robotics
▸ Self-driving cars ?
History of RL
• Learning to behave from rewards : an old idea from psychology
▸ 1900s : observation of animal behavior (e.g. Thorndike 1911, “Law of Effect”)
  “Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will [...] be more likely to recur.”
▸ 1920s : Pavlov’s work on conditional reflexes
  first occurrence of “reinforcement” in animal learning
source : Wikipedia
History of RL
• Learning to behave from rewards : does it happen in the brain ?
▸ Olds and Milner 1954 : first experiments on electrical brain stimulation for controlling rat behavior
→ hypothesis that dopamine broadcasts a reward signal to the brain
▸ Today’s RL Dopamine :)
https://github.com/google/dopamine
History of RL
• Some steps towards computational RL
▸ 1950s, Shannon’s machines : “Theseus”, a mouse finding its way out of a maze, a chess player, a Rubik’s cube solver
▸ 1957, Bellman : Dynamic Programming (control of dynamical systems)
▸ 1961, Minsky : “Steps Toward Artificial Intelligence”
▸ 1978, Sutton : Temporal Difference Learning (artificial intelligence)
▸ 1989, Watkins : Q-Learning algorithm
Nowadays, reinforcement learning is mostly formalized as learning an optimal policy in an incompletely-known Markov Decision Process.
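For orientation, one standard way of writing this formalization is given below. The notation and the discounted objective with factor γ are assumptions made here for illustration; Lecture 1 may use different conventions. "Incompletely-known" means that the agent does not know p and r in advance and must learn from the transitions it observes.

```latex
% A Markov Decision Process: states, actions, transition kernel, reward function.
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, p, r), \qquad
  s_{t+1} \sim p(\cdot \mid s_t, a_t), \qquad
  r_t = r(s_t, a_t)
\]
% Value of a policy \pi (discount factor \gamma \in [0,1) is an assumption)
% and definition of an optimal policy.
\[
  V^{\pi}(s) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s \right],
  \qquad
  \pi^{\star} \in \arg\max_{\pi} V^{\pi}(s) \quad \text{for all } s \in \mathcal{S}
\]
```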
RL ⊆ ML
RL is also viewed as a sub-field of Machine Learning
3 types of Machine Learning (ML) tasks :
Supervised Learning
Learn to make predictions, based on a large batch of data for which the target variable is observed
Unsupervised Learning
Find some latent structure in data (clusters, low-rank structure...)
Reinforcement Learning
Learn to make decisions / influence the data collection process
Outline of the class
• Lecture 1. Markov Decision Processes (MDP), a formalization for reinforcement learning problem(s)
• Lecture 2. One-state, several actions : solving multi-armed bandits
  UCB algorithms. Thompson Sampling
• Lecture 3. Solving an MDP with known parameters.
  Dynamic Programming, Value/Policy Iteration
• Lecture 4. First Reinforcement Learning algorithms.
  TD Learning, Q-Learning
• Lecture 5. Approximate Dynamic Programming
• Lecture 6. Deep Reinforcement Learning (O. Pietquin)
• Lecture 7. Policy Gradient Methods (O. Pietquin)
• Lecture 8. Bandit tools for RL
  Bandit-based exploration, Monte-Carlo Tree Search Methods