Overview of Decision Trees, Ensemble Methods and Reinforcement Learning CMSC 678 UMBC



Page 1: Decision Trees, Ensembles, Reinforcement Learning

Overview of Decision Trees, Ensemble Methods and Reinforcement Learning

CMSC 678

UMBC

Page 2

Outline

Decision Trees

Ensemble Methods

Bagging

Random Forests

Reinforcement Learning

Page 3

“20 Questions”: http://20q.net/

Goals:

1. Figure out what questions to ask

2. In what order

3. Determine how many questions are enough

4. What to predict at the end

Decision Trees

Adapted from Hamed Pirsiavash

Page 4

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Page 5

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Page 6

Questions are features

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Page 7

Questions are features

Responses are feature values

Rating is the label

Idea: Predict the label by forming a tree where each node branches on values of

particular features

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Page 8

Questions are features

Responses are feature values

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Easy?

Page 9

Questions are features

Responses are feature values

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Easy?

Easy: yes Easy: no

AI?

Page 10

Questions are features

Responses are feature values

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Easy?

Easy: yes Easy: no

AI? AI: yes AI: no

….

Page 11

Questions are features

Responses are feature values

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Easy?

Easy: yes Easy: no

AI? Sys? AI: yes AI: no

….

Page 12

Questions are features

Responses are feature values

Rating is the label

Example: Learning a decision tree (course ratings dataset)

Adapted from Hamed Pirsiavash

Easy?

Easy: yes Easy: no

AI? Sys? AI: yes AI: no Sys: yes Sys: no

…. ….

Page 13

CIML, Ch 1

Predicting with a Decision Tree is Done Easily and Recursively
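The recursive prediction procedure can be sketched in a few lines of Python (the dict-based tree layout and the feature/label names are illustrative assumptions, not from the slides):

```python
def predict(node, example):
    """Walk the tree from the root, branching on feature values,
    until a leaf (which stores a label) is reached."""
    if "label" in node:          # leaf: return the stored prediction
        return node["label"]
    value = example[node["feature"]]
    return predict(node["children"][value], example)

# Tiny tree for the course-ratings example: split on "easy", then "ai".
tree = {
    "feature": "easy",
    "children": {
        "yes": {"label": "like"},
        "no": {"feature": "ai",
               "children": {"yes": {"label": "like"},
                            "no": {"label": "dislike"}}},
    },
}

print(predict(tree, {"easy": "no", "ai": "yes"}))  # -> like
```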

Page 14

There Are Many Ways to Learn a Decision Tree

1. Greedy/count: what is the most accurate feature at each decision point? (See CIML Ch. 1 and the next slides)

2. Maximize information gain at each step (most popular approaches: ID3, C4.5)

3. Account for statistical significance (example: chi-square automatic interaction detection, CHAID)

4. Other task-specific ones (including clustering-based)
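A minimal sketch of the greedy/count strategy (option 1), in the spirit of CIML Ch. 1: at each node, pick the single feature whose majority-vote split classifies the most training examples correctly, then recurse. The dict-based tree layout is an assumption for illustration:

```python
from collections import Counter

def train(examples, features):
    """examples: list of (feature_dict, label) pairs."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return {"label": majority}           # base cases -> leaf

    def score(f):
        # How many examples does a majority vote within each value of f get right?
        correct = 0
        for v in {x[f] for x, _ in examples}:
            sub = [y for x, y in examples if x[f] == v]
            correct += Counter(sub).most_common(1)[0][1]
        return correct

    best = max(features, key=score)          # greedy choice by counting
    children = {}
    for v in {x[best] for x, _ in examples}:
        sub = [(x, y) for x, y in examples if x[best] == v]
        children[v] = train(sub, [f for f in features if f != best])
    return {"feature": best, "children": children}

data = [({"easy": "yes", "ai": "yes"}, "like"),
        ({"easy": "yes", "ai": "no"}, "like"),
        ({"easy": "no", "ai": "yes"}, "like"),
        ({"easy": "no", "ai": "no"}, "dislike")]
tree = train(data, ["easy", "ai"])
```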

Page 15

CIML, Ch 1

Page 16

CIML, Ch 1

counting

Page 17

CIML, Ch 1

counting

recursive

Page 18

CIML, Ch 1

counting

recursive

simple base cases

Page 19

Outline

Decision Trees

Ensemble Methods

Bagging

Random Forests

Reinforcement Learning

Page 20

Key Idea: “Wisdom of the crowd”

Groups of people can often make better decisions than individuals.

Apply this to ML

Learn multiple classifiers and combine their predictions

Ensembles

Page 21

Train several classifiers and take a majority vote over their predictions

Combining Multiple Classifiers by Voting

Courtesy Hamed Pirsiavash

Page 22

Train several classifiers and take a majority vote over their predictions

For regression, use the mean or median of the predictions

For ranking and collective classification, use some form of averaging

Combining Multiple Classifiers by Voting

Page 23

Train several classifiers and take a majority vote over their predictions

For regression, use the mean or median of the predictions

For ranking and collective classification, use some form of averaging

Combining Multiple Classifiers by Voting

A common family of approaches is called bagging
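A minimal sketch of the three combination rules above, using only the standard library:

```python
from collections import Counter
from statistics import mean, median

def vote(predictions):
    """Classification: majority vote over the classifiers' outputs."""
    return Counter(predictions).most_common(1)[0][0]

print(vote(["cat", "dog", "cat"]))   # -> cat
print(mean([2.0, 3.0, 10.0]))        # regression: mean of the predictions
print(median([2.0, 3.0, 10.0]))      # or median, more robust to outliers
```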

Page 24

Bagging: Split the Data

Option 1: Split the data into K pieces and train a classifier on each

Q: What can go wrong with option 1?

Page 25

Bagging: Split the Data

Option 1: Split the data into K pieces and train a classifier on each

Q: What can go wrong with option 1?

A: Small sample → poor performance

Page 26

Bagging: Split the Data

Option 1: Split the data into K pieces and train a classifier on each

Q: What can go wrong with option 1?

A: Small sample → poor performance

Option 2: Bootstrap aggregation (bagging): resampling

Page 27

Bagging: Split the Data

Option 1: Split the data into K pieces and train a classifier on each

Q: What can go wrong with option 1?

A: Small sample → poor performance

Option 2: Bootstrap aggregation (bagging): resampling. Obtain datasets D1, D2, …, DN using bootstrap resampling (sampling with replacement) from D: given a dataset D, get new datasets D̂ by random sampling with replacement from D

Courtesy Hamed Pirsiavash

Page 28

Bagging: Split the Data

Option 1: Split the data into K pieces and train a classifier on each

Q: What can go wrong with option 1?

A: Small sample → poor performance

Option 2: Bootstrap aggregation (bagging): resampling. Obtain datasets D1, D2, …, DN using bootstrap resampling (sampling with replacement) from D: given a dataset D, get new datasets D̂ by random sampling with replacement from D

Train classifiers on each dataset and average their predictions

Courtesy Hamed Pirsiavash
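Bootstrap resampling can be sketched directly; `bootstrap_datasets` is a hypothetical helper name, not from the slides:

```python
import random

def bootstrap_datasets(D, N, seed=0):
    """Build N datasets D1..DN, each drawn from D by sampling
    |D| items with replacement (bootstrap resampling)."""
    rng = random.Random(seed)
    return [[rng.choice(D) for _ in range(len(D))] for _ in range(N)]

D = [1, 2, 3, 4, 5]
datasets = bootstrap_datasets(D, N=3)
# Every resampled dataset has the same size as D, and items can repeat.
```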

Page 29

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

Page 30

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

y: observed data

f: generating line

𝑔𝑖: learned polynomial regression

Page 31

Averaging reduces the variance of estimators

Averaging is a form of regularization: each model can individually overfit but the average is able to overcome the overfitting

Why does averaging work?

50 samples

Courtesy Hamed Pirsiavash
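A quick simulation of the variance-reduction claim, assuming independent estimators with Gaussian noise (an illustrative setup, not the polynomial-regression figure from the slide):

```python
import random
from statistics import variance

random.seed(0)

def noisy_estimate():
    """A single high-variance estimator of the true value 1.0."""
    return 1.0 + random.gauss(0, 1)

singles = [noisy_estimate() for _ in range(2000)]
averages = [sum(noisy_estimate() for _ in range(10)) / 10 for _ in range(2000)]

# With 10 independent estimators, the variance of their average
# is roughly 1/10 the variance of a single estimator.
print(variance(singles) / variance(averages))
```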

Page 32

Bagging Decision Trees

How would it work?

Page 33

Bagging Decision Trees

How would it work?

Bootstrap-sample S datasets {(X1, Y1), …, (XS, YS)}

Train a tree ts on (Xs, Ys)

At test time: ŷ = avg(t1(x), …, tS(x))
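The three steps above can be sketched as follows; `train_tree` here is a toy stand-in that just predicts the mean label of its bootstrap sample (a real implementation would grow an actual tree):

```python
import random

def train_tree(sample):
    """Toy stand-in for a tree learner (assumed for illustration):
    predict the mean label of the bootstrap sample, ignoring x."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

def bag_trees(data, S, seed=0):
    """Bagging: bootstrap S samples, train one model ts per sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(S):
        sample = [rng.choice(data) for _ in range(len(data))]  # with replacement
        trees.append(train_tree(sample))
    return trees

def bagged_predict(trees, x):
    """At test time: y_hat = avg(t1(x), ..., tS(x))."""
    return sum(t(x) for t in trees) / len(trees)
```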

Page 34

Random Forests

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these

Train decision trees of depth d

Average results from multiple randomly trained trees

Q: What’s the difference between bagging decision trees and random forests?

Courtesy Hamed Pirsiavash

Page 35

Random Forests

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these

Train decision trees of depth d

Average results from multiple randomly trained trees

Q: What’s the difference between bagging decision trees and random forests?

A: Bagging → highly correlated trees (they reuse the same good features)

Courtesy Hamed Pirsiavash
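The one modification can be sketched in isolation; `choose_split_feature` and the toy feature scores are hypothetical names for illustration:

```python
import random

def choose_split_feature(features, k, score, rng):
    """Random-forest twist on tree growing: at each split point, score
    only a random size-k subset of the features and pick the best of
    those. (Plain bagging would use max(features, key=score) instead.)"""
    subset = rng.sample(features, k)
    return max(subset, key=score)

features = ["easy", "ai", "sys", "prof"]
score = {"easy": 3, "ai": 2, "sys": 1, "prof": 0}.get
picked = choose_split_feature(features, k=2, score=score, rng=random.Random(0))
# picked is the best feature *within the random subset*, which may not be
# the globally best one; this is what decorrelates the trees.
```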

Page 36

Random Forests: Human Pose Estimation (Shotton et al., CVPR 2011)

Training: 3 trees, 20 deep, 300k training images per tree, 2000 training example pixels per image, 2000 candidate features θ, and 50 candidate thresholds τ per feature (takes about 1 day on a 1000-core cluster)

Page 37

(Shotton et al., CVPR 2011)

Page 38

Outline

Decision Trees

Ensemble Methods

Bagging

Random Forests

Reinforcement Learning

Page 39

There’s an entire book: http://incompleteideas.net/book/the-book-2nd.html

Page 40

Reinforcement Learning

Robot image: openclipart.org
Landscape image: https://static.vecteezy.com/system/resources/previews/000/090/451/original/four-seasons-landscape-illustrations-vector.jpg

agent

environment

Page 41

Reinforcement Learning

take action


agent

environment

Page 42

Reinforcement Learning

take action


agent

get new state and/or reward

environment

Page 43

Reinforcement Learning

take action


agent

get new state and/or reward

environment

Page 44

Markov Decision Process: Formalizing Reinforcement Learning

take action


agent

get new state and/or reward

environment

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

Page 45

Markov Decision Process: Formalizing Reinforcement Learning

take action


agent

get new state and/or reward

environment

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions

Page 46

Markov Decision Process: Formalizing Reinforcement Learning

take action


agent

get new state and/or reward

environment

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs

Page 47

Markov Decision Process: Formalizing Reinforcement Learning

take action


agent

get new state and/or reward

environment

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution

Page 48

Markov Decision Process: Formalizing Reinforcement Learning

take action


agent

get new state and/or reward

environment

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Page 49

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0

Page 50

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡

Page 51

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)

Page 52

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)
    get reward 𝑟𝑡 = ℛ(𝑠𝑡, 𝑎𝑡)

Page 53

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)
    get reward 𝑟𝑡 = ℛ(𝑠𝑡, 𝑎𝑡)

objective: maximize time-discounted reward

Page 54

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)
    get reward 𝑟𝑡 = ℛ(𝑠𝑡, 𝑎𝑡)

objective: maximize discounted reward: max_𝜋 ∑_{t>0} 𝛾^t 𝑟𝑡

Page 55

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)
    get reward 𝑟𝑡 = ℛ(𝑠𝑡, 𝑎𝑡)

objective: maximize discounted reward: max_𝜋 ∑_{t>0} 𝛾^t 𝑟𝑡

“solution”: the policy 𝜋∗ that maximizes the expected (average) time-discounted reward

Page 56

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states
𝒜: set of possible actions
ℛ: reward of (state, action) pairs
𝑝: state-action transition distribution
𝛾: discount factor

Start in initial state 𝑠0
for t = 1 to …:
    choose action 𝑎𝑡
    “move” to next state 𝑠𝑡 ∼ 𝑝(⋅ ∣ 𝑠𝑡−1, 𝑎𝑡)
    get reward 𝑟𝑡 = ℛ(𝑠𝑡, 𝑎𝑡)

objective: maximize discounted reward: max_𝜋 ∑_{t>0} 𝛾^t 𝑟𝑡

“solution”: 𝜋∗ = argmax_𝜋 𝔼[∑_{t>0} 𝛾^t 𝑟𝑡 ; 𝜋]
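The time-discounted reward being maximized can be computed directly from a trajectory's rewards:

```python
def discounted_return(rewards, gamma):
    """The RL objective: sum over t > 0 of gamma^t * r_t,
    where rewards = [r1, r2, ...]."""
    return sum(gamma**t * r for t, r in enumerate(rewards, start=1))

# Rewards r1, r2, r3 with discount gamma = 0.9:
print(discounted_return([1.0, 0.0, 1.0], 0.9))  # 0.9 + 0.0 + 0.729 ≈ 1.629
```

Note how later rewards count for less: the third reward contributes 0.729 versus 0.9 for the first.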

Page 57

Designing Rewards is Highly Task Dependent

Rewards indicate what we want to accomplish, NOT how we want to accomplish it

Shaping: the positive reward is often very “far away”, so give rewards for achieving subgoals (domain knowledge); alternatively, adjust the initial policy or initial value function

Example: robot in a maze. Episodic task, not discounted: +1 when out, 0 for each step

Example: chess. GOOD: +1 for winning, −1 for losing. BAD: +0.25 for taking opponent’s pieces (can yield high reward even when losing)

Slide courtesy/adapted: Peter Bodík

Page 58

Overview: Learning Strategies

Dynamic Programming

Q-learning

Monte Carlo approaches

Page 59

Q-learning

𝑄: (𝑠, 𝑎) → ℝ

Goal: learn a function that computes a “goodness” score for taking a particular action 𝑎 in state 𝑠
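For a tabular state/action space, Q can be learned with the standard Q-learning update (the update rule is standard material, not spelled out on this slide):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)          # unseen (state, action) pairs score 0.0
q_update(Q, "s0", "right", 1.0, "s1", ["left", "right"])
print(Q[("s0", "right")])       # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

Repeating this update over many experienced (s, a, r, s') transitions drives Q toward the "goodness" scores the slide describes.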

Page 60

Deep/Neural Q-learning

𝑄(𝑠, 𝑎; 𝜃) ≈ 𝑄∗(𝑠, 𝑎)
(neural network on the left; desired optimal solution on the right)

Approach: Form (and learn) a neural network to model our optimal Q function

Page 61

Deep/Neural Q-learning

𝑄(𝑠, 𝑎; 𝜃) ≈ 𝑄∗(𝑠, 𝑎)
(neural network on the left; desired optimal solution on the right)

Approach: Form (and learn) a neural network to model our optimal Q function

Learn the weights (parameters) 𝜃 of our neural network
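A minimal sketch of learning θ, with a linear function approximator standing in for the neural network; in real Q-learning the regression target would be r + γ max_a' Q(s', a'; θ), while here it is a fixed number purely for illustration:

```python
def q_value(theta, phi):
    """Linear approximator: Q(s, a; theta) = theta . phi(s, a),
    where phi is a feature vector for the (state, action) pair."""
    return sum(t * p for t, p in zip(theta, phi))

def sgd_step(theta, phi, target, lr=0.1):
    """One gradient step on the squared error (Q(s,a;theta) - target)^2."""
    err = q_value(theta, phi) - target
    return [t - lr * 2 * err * p for t, p in zip(theta, phi)]

theta = [0.0, 0.0]
for _ in range(50):
    theta = sgd_step(theta, [1.0, 2.0], 5.0)  # hypothetical phi and target
# theta . phi is driven toward the target value 5.0
```

A deep Q-network replaces the dot product with a multi-layer network, but the learning signal (gradient of a squared error in the Q estimate) has the same shape.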

Page 62

Outline

Decision Trees

Ensemble Methods

Bagging

Random Forests

Reinforcement Learning