CSC 4510/9010 Spring 2015. Paula Matuszek
CSC 4510/9010: Applied Machine Learning
Reinforcement and Transfer Learning
Dr. Paula Matuszek
(610) 647-9789
What Is Machine Learning?
• “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” – Herbert Simon
  – In other words, the end result is a changed model of some kind; the focus is on the end product.
• “Learning is constructing or modifying representations of what is being experienced.” – Ryszard Michalski
  – The experiences perceived must be captured or represented in some way; learning modifies that representation. This definition focuses on the process, rather than the result.
So what is Machine Learning?
• We can consider that the “system” is a computer and its programs, or a statistical model with parameters.
• Another way of looking at machine learning is as a way to get a computer to do things without having to explicitly describe what steps to take, by giving it examples or feedback.
• The computer then looks for patterns which can explain or predict what happens.
• The computer is trained through the examples.
The Architecture of a ML System
• Every machine learning system has four parts:
  – a representation or model of what is being learned
  – an actor: the part that uses the representation and actually does something
  – a critic: the part that provides feedback
  – a learner: the part that modifies the representation or model, using the feedback
Based on Russell and Norvig, Artificial Intelligence: A Modern Approach, Third Edition, Prentice-Hall, 2009. http://aima.cs.berkeley.edu/
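The four parts above can be sketched as a minimal object skeleton (illustrative only; the class, method, and action names are invented for this sketch, not part of the lecture):

```python
class MLSystem:
    """Illustrative skeleton of the four parts of a machine learning system."""

    def __init__(self):
        self.model = {}  # the representation: here, just a dict of learned responses

    def act(self, situation):
        # the actor: uses the representation to actually do something
        return self.model.get(situation, "default-action")

    def critique(self, outcome, goal):
        # the critic: provides feedback by comparing the outcome to the goal
        return 1 if outcome == goal else -1

    def learn(self, situation, action, feedback):
        # the learner: modifies the representation using the feedback
        if feedback > 0:
            self.model[situation] = action
```

A single learning step ties the parts together: the actor acts, the critic scores the result, and the learner updates the representation.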
Representation
• A learning system must have a representation or model of what is being learned.
• This is the component that changes based on experience.
• In a machine learning system this may be a mathematical model or formula, a set of rules, a decision tree, or some other form of information.
General Model of Learning Agent
[Diagram: inside the Agent, the Performance Standard feeds the Critic, which sends feedback to the Learning Element; the Learning Element sends learning goals to the Problem Generator and changes/knowledge to the Performer with KB; the Agent senses the Environment through Sensors and acts on it through Effectors.]
Based on Russell and Norvig, Artificial Intelligence: A Modern Approach, Third Edition, Prentice-Hall, 2009. http://aima.cs.berkeley.edu/
copyright Paula Matuszek 2015
Animals Behind the Scene
[Decision tree after several passes:]
Is it a bird?
  Yes: Is it a penguin?
  No: Does it have fur?
    Yes: Is it a mouse?
    No: Is it a frog?

Computer: Is it a bird?
Human: No
Computer: Is it a frog?
Human: No
Computer: What is it?
Human: A mouse
Computer: Tell me a question that distinguishes a frog from a mouse.
Human: Does it have fur?
Computer: What is the answer for a frog?
Human: No
Animals Guessing Game Architecture
• The Animals Guessing Game has all of the parts of a Machine Learning Architecture:
  – The Representation is a sequence of questions and pairs of yes/no answers (called a binary decision tree).
  – The Actor “walks” the tree, interacting with a human; at each question it chooses whether to follow the “yes” branch or the “no” branch.
  – The Critic is the human player telling the game whether it has guessed correctly.
  – The Learner elicits new questions and adds questions, guesses and branches to the tree.
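The whole game fits in a few lines of code. The following is a hypothetical reconstruction for illustration, not the original program; the node structure and prompt wording are invented:

```python
# Binary decision tree for the Animals guessing game.
# A node is either a guess (leaf) or a question with yes/no branches.

class Node:
    def __init__(self, text, yes=None, no=None):
        self.text, self.yes, self.no = text, yes, no

    def is_leaf(self):
        return self.yes is None and self.no is None

def play(node, ask):
    """Walk the tree (the Actor). `ask` answers yes/no questions and
    supplies new knowledge (the Critic). Mutates the tree in place (the Learner)."""
    if node.is_leaf():
        if ask("Is it a %s?" % node.text) == "yes":
            return node  # guessed right; tree unchanged
        # The Learner: elicit a new animal and a distinguishing question.
        animal = ask("What is it?")
        question = ask("Tell me a question that distinguishes a %s from a %s."
                       % (node.text, animal))
        old_is_yes = ask("What is the answer for a %s?" % node.text) == "yes"
        old, new = Node(node.text), Node(animal)
        node.text = question
        node.yes, node.no = (old, new) if old_is_yes else (new, old)
        return node
    branch = node.yes if ask(node.text) == "yes" else node.no
    play(branch, ask)
    return node
```

Replaying the dialogue from the previous slide against a tree that only knows birds, penguins, and frogs grows exactly the fur branch shown there.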
Reinforcement Learning
• The Animals Game is a simple form of Reinforcement Learning.
• Very early concept in Artificial Intelligence!
• www-03.ibm.com/ibm/history/ibm100/us/en/icons/ibm700series/impacts/
based on http://www.csee.umbc.edu/courses/671/fall05/slides/c28_rl.ppt, taken in turn from Jean-Claude Latombe and Lise Getoor
Reinforcement Learning
• Supervised (inductive) learning is the simplest and most studied type of learning.
• How can an agent learn behaviors when it doesn’t have a teacher to tell it how to perform? What’s the critic?
• One solution is unsupervised learning.
• For a more complex problem:
  ■ The agent has a task to perform
  ■ It takes some actions in the world
  ■ At some later point, it gets feedback telling it how well it did on performing the task
  ■ The agent performs the same task over and over again
• This problem is called reinforcement learning:
  ■ The agent gets positive reinforcement for tasks done well
  ■ The agent gets negative reinforcement for tasks done poorly
Reinforcement Learning (cont.)
• The goal is to get the agent to act in the world so as to maximize its rewards.
• The agent has to figure out what it did that made it get the reward/punishment.
  ■ This is known as the credit assignment problem.
• Reinforcement learning approaches can be used to train computers to do many tasks:
  ■ backgammon and chess playing
  ■ job shop scheduling
  ■ controlling robot limbs
Formalization
Given:
  ■ a state space S
  ■ a set of actions a1, …, ak
  ■ a reward value at the end of each trial (may be positive or negative)
Output:
  ■ a mapping from states to actions
Example: ALVINN (driving agent)
  state: configuration of the car
  learn a steering action for each state
Reactive Agent Algorithm (accessible or observable state)
Repeat:
  ⬥ s ← sensed state
  ⬥ If s is terminal then exit
  ⬥ a ← choose action (given s)
  ⬥ Perform a
Policy (Reactive/Closed-Loop Strategy)
• A policy Π is a complete mapping from states to actions
[Figure: a 3×4 grid world with rows 1–3 and columns 1–4; one terminal cell is labeled +1 and another −1.]
Reactive Agent Algorithm
Repeat:
  ⬥ s ← sensed state
  ⬥ If s is terminal then exit
  ⬥ a ← Π(s)
  ⬥ Perform a
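The loop above, as code. The tiny corridor environment and policy here are invented for illustration:

```python
def run_reactive_agent(policy, step, start, terminal):
    """Repeat: sense the state; exit if terminal; act according to policy Pi."""
    s = start
    trajectory = [s]
    while s not in terminal:
        a = policy[s]        # a <- Pi(s)
        s = step(s, a)       # perform a, then sense the new state
        trajectory.append(s)
    return trajectory

# A 4-state corridor: moving "right" from state i goes to state i+1.
policy = {0: "right", 1: "right", 2: "right"}
step = lambda s, a: s + 1 if a == "right" else s - 1
```

Running the agent from state 0 with terminal state 3 walks the corridor end to end.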
Approaches
• Learn policy directly – a function mapping from states to actions
• Learn utility values for states (i.e., the value function)
Value Function
• The agent knows what state it is in.
• The agent has a number of actions it can perform in each state.
• Initially, it doesn't know the value of any of the states.
• If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states:
  ■ U(oldstate) = reward + U(newstate)
• The agent learns the utility values of states as it works its way through the state space.
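The deterministic update rule can be sketched directly; the toy state names and the convention that rewards attach to transitions are invented for this sketch:

```python
# Deterministic utility update: U(oldstate) = reward + U(newstate),
# applied backward along the path the agent actually took.

def update_utilities(U, path, rewards):
    """U maps states to utility estimates; `path` is the visited state
    sequence; rewards[i] is the reward on the transition path[i] -> path[i+1]."""
    U.setdefault(path[-1], 0)  # terminal state anchors the backup
    for i in reversed(range(len(path) - 1)):
        U[path[i]] = rewards[i] + U[path[i + 1]]
    return U
```

Walking backward means each state picks up the value of its successor, so the reward at the end of a trial propagates to every state along the way.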
Learning States and Actions
• A typical approach is:
  – At state S choose some action A, taking us to new state S1.
  – If S1 has a positive value, increase the value of A at S.
  – If S1 has a negative value, decrease the value of A at S.
  – If S1 is new, its initial value is unknown; perhaps 0.
• Repeat until? One complete learning pass eventually gets to a deterministic state (win or lose).
Exploration
• The agent may occasionally choose to explore suboptimal moves in the hope of finding better outcomes.
  ■ Only by visiting all the states frequently enough can we guarantee learning the true values of all the states.
• A discount factor is often introduced to prevent utility values from diverging and to promote the use of shorter (more efficient) sequences of actions to attain rewards.
• The update equation using a discount factor γ is:
  ■ U(oldstate) = reward + γ * U(newstate)
• Normally, γ is set between 0 and 1.
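As a sketch, the discounted update applied backward along one visited path; the state names and γ = 0.9 are arbitrary choices for illustration:

```python
GAMMA = 0.9  # discount factor, between 0 and 1

def discounted_update(U, path, rewards, gamma=GAMMA):
    """U(oldstate) = reward + gamma * U(newstate), applied backward along
    the visited path, so states closer to the reward get larger utilities."""
    U.setdefault(path[-1], 0)
    for i in reversed(range(len(path) - 1)):
        U[path[i]] = rewards[i] + gamma * U[path[i + 1]]
    return U
```

Because each extra step multiplies the backed-up value by γ < 1, shorter routes to the same reward end up with higher utilities, which is exactly the efficiency pressure the slide describes.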
Selecting an Action
• Simply choose the action with the highest (current) expected utility?
• Problem: each action has two effects
  ■ it yields a reward (or penalty) on the current sequence
  ■ information is received and used in learning for future sequences
• Trade-off: immediate good for long-term well-being
• Try a shortcut – you might get lost; you might learn a new, quicker route!
Exploration policy
• Wacky approach (exploration): act randomly in hopes of eventually exploring the entire environment.
• Greedy approach (exploitation): act to maximize utility using the current estimate.
• Reasonable balance: act more wacky (exploratory) when the agent has little idea of the environment; more greedy when the model is close to correct.
• Example: n-armed bandits…
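The n-armed bandit balance is often sketched with an ε-greedy rule: explore a random arm with small probability ε, otherwise exploit the best estimate so far. The arm payouts, noise level, and ε below are invented for illustration:

```python
import random

def epsilon_greedy_bandit(payouts, steps=1000, eps=0.1, seed=0):
    """Pull one of len(payouts) arms per step: explore with probability eps,
    otherwise exploit the arm with the best estimated value. Returns the
    index of the arm the agent ends up believing is best."""
    rng = random.Random(seed)
    n_arms = len(payouts)
    counts = [0] * n_arms
    values = [0.0] * n_arms   # running average reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)       # explore ("wacky")
        else:
            arm = values.index(max(values))   # exploit ("greedy")
        reward = payouts[arm] + rng.gauss(0, 0.1)  # noisy payout
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # update average
    return values.index(max(values))
```

With enough pulls, the occasional exploration visits every arm often enough for the running averages to single out the truly best one.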
RL Summary
• Active area of research
• Approaches from both OR and AI
• There are many more sophisticated algorithms that we have not discussed
• Applicable to game-playing, robot controllers, others
Reinforcement Learning
• Reinforcement learning systems learn a series of actions or decisions, rather than a single decision, based on feedback given at the end of the series.
• A reinforcement learner has a goal, and carries out trial-and-error search to find the best paths toward that goal.
Reinforcement Learning
• A typical reinforcement learning system is an active agent, interacting with its environment.
• It must balance
  – exploration: trying different actions and sequences of actions to discover which ones work best
  – achievement: using sequences which have worked well so far
• It must also learn successful sequences of actions in an uncertain environment.
• Typical current applications are in artificial intelligence and in engineering.
Transfer Learning
• Slides based on presentation from Haitham Bou Ammar, Maastricht University
Transfer Learning
• Data used in training a classifier must be properly chosen to be representative.
• If not? Accuracy will be worse than expected.
• But suppose we want to apply a classifier to a new or shifting domain? Retrain!
  – But that’s expensive.
• Can we somehow use our existing classifier as a starting point to give us a shortcut?
• This is Transfer Learning.
Based on https://project.dke.maastrichtuniversity.nl/datamining/2013-Slides/transfer-01.ppt
Motivation
• Training data: pairs (x, y) are used to build a Model.
• Test data: given a new x, the Model predicts y.
• Assumptions:
  1. Training and Test are from the same distribution
  2. Training and Test are in the same feature space
• Not true in many real-world applications!
Examples: Web-document Classification
• A Model is trained to classify web documents into categories such as Physics, Machine Learning, and Life Science.
• A new document arrives: which category?
• Content changes over time, so the trained Model may no longer fit: the assumption is violated!
• One option: learn a new model.
Learn new Model:
1. Collect new labeled data
2. Build new model
Alternative: Reuse & adapt the already learned model!
Examples: Image Classification
• Task One: from a set of training images, extract Features (Task One) and build Model One.
• Task Two: classify images of Cars vs. Motorcycles; extract Features (Task Two) and build Model Two.
• Transfer: Reuse the features learned for Task One when building Model Two.
Traditional Machine Learning vs. Transfer
• Traditional Machine Learning: different tasks each get their own separate Learning System.
• Transfer Learning: knowledge extracted from a Source Task is passed to the Learning System for the Target Task.
Transfer Learning Definition
• Given a source domain and source learning task, a target domain and a target learning task, transfer learning aims to help improve the learning of the target predictive function using the source knowledge, where the domains differ or the tasks differ.
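In the notation standard for this definition (following Pan & Yang's transfer learning survey), a domain is a feature space plus a marginal distribution and a task is a label space plus a predictive function, so the condition can be written as:

```latex
\begin{align*}
&\text{Domain: } \mathcal{D} = \{\mathcal{X}, P(X)\}, \qquad
 \text{Task: } \mathcal{T} = \{\mathcal{Y}, f(\cdot)\} \\
&\text{Transfer learning: improve the target } f_T(\cdot)
 \text{ using knowledge of } \mathcal{D}_S \text{ and } \mathcal{T}_S, \\
&\text{where } \mathcal{D}_S \neq \mathcal{D}_T
 \quad \text{or} \quad \mathcal{T}_S \neq \mathcal{T}_T .
\end{align*}
```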
Transfer Definition
● Therefore, transfer applies if either:
  ○ the domains differ (Domain Differences), or
  ○ the tasks differ (Task Differences)
Questions to answer when transferring
• What to Transfer?
  – Instances?
  – Features?
  – Model?
• How to Transfer?
  – Weight instances?
  – Unify features?
  – Map model?
• When to Transfer?
  – In which situations?
Different Distributions
• Example: classify documents from the web into important or not important.
  – Documents in different domains have the same feature space: bag of words (frequency of each term).
  – However, the words have different frequencies in the different domains.
  – The distribution of features is different.
• So modify instances.
Algorithms: TrAdaBoost
● Assumptions:
  ○ Source and Target task have the same feature space.
  ○ Marginal distributions are different.
● Not all source data might be helpful!
Algorithm: TrAdaBoost
● Idea:
  ○ Iteratively reweight source samples such that:
    ÷ the effect of “bad” source instances is reduced
    ÷ the effect of “good” source instances is encouraged
● Requires:
  ○ Source task labeled data set
  ○ Very small Target task labeled data set
  ○ Unlabeled Target data set
  ○ Base Learner
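A highly simplified sketch of the reweighting idea, for one boosting round on binary labels. This is not the full TrAdaBoost algorithm (which also fits the base learner each round and derives β from the error rate); the function name, toy data, and fixed β are invented for this sketch:

```python
def reweight_source(weights, predictions, labels, beta=0.5):
    """Down-weight source instances the current hypothesis gets wrong,
    so 'bad' source data loses influence in later boosting rounds."""
    new_w = []
    for w, pred, y in zip(weights, predictions, labels):
        err = abs(pred - y)            # 0 if correct, 1 if wrong (0/1 labels)
        new_w.append(w * beta ** err)  # misclassified weight shrinks by beta < 1
    total = sum(new_w)
    return [w / total for w in new_w]  # renormalize to a distribution
```

After a few rounds, source instances that keep disagreeing with the target concept have negligible weight, while the helpful ones keep contributing to the base learner.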
Different Features
• Example: classify images into cars and motorcycles.
• We already have a classifier that classifies images into trucks and buses.
• Features won’t be the same
  – but some of them will (driver enclosed?)
  – and some of them will be similar but on different dimensions (big or small?)
Transferring Features
• Many methods:
  – Supervised feature construction; self-taught learning
  – Unsupervised feature construction
  – TAMAR
CSC 4510/9010 Spring 2015. Paula Matuszek

An overview of various settings of transfer learning
(slide from http://www1.i2r.a-star.edu.sg/~jspan/publications/A%20Survey%20on%20Transfer%20Learning.ppt)

Transfer Learning
• Inductive Transfer Learning: labeled data are available in the target domain
  – Case 1: no labeled data in the source domain → Self-taught Learning
  – Case 2: labeled data are available in the source domain → Multi-task Learning (source and target tasks are learnt simultaneously)
• Transductive Transfer Learning: labeled data are available only in the source domain
  – Domain Adaptation (assumption: different domains but a single task)
  – Sample Selection Bias / Covariate Shift (assumption: single domain and single task)
• Unsupervised Transfer Learning: no labeled data in either the source or the target domain
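The branching in that taxonomy can be written down as a small decision function. This is a simplification of my own (the full taxonomy also distinguishes same-vs-different tasks and domains), offered only to make the label-availability distinctions concrete:

```python
def transfer_setting(target_labeled: bool, source_labeled: bool) -> str:
    """Map label availability to a transfer-learning setting, following
    the Pan & Yang taxonomy (simplified to label availability only)."""
    if target_labeled and not source_labeled:
        return "inductive: self-taught learning"       # Case 1
    if target_labeled and source_labeled:
        return "inductive: multi-task learning"        # Case 2
    if source_labeled:
        return "transductive: domain adaptation / sample selection bias"
    return "unsupervised transfer learning"

print(transfer_setting(True, False))   # inductive: self-taught learning
print(transfer_setting(False, True))   # transductive: domain adaptation / sample selection bias
```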
Based on https://project.dke.maastrichtuniversity.nl/datamining/2013-Slides/transfer-01.ppt
Conclusions
● Transfer learning re-uses knowledge from a source learner to help a target learner
● Transfer learning is not the same as generalization
● Self-taught learning transfers features constructed from unlabeled data
Summary 1
• Goal of transfer learning: to reuse knowledge from a previous learner to help develop a new learner.
• The new learner can be required for:
  – new instances
    • different features
    • a different distribution of the same features
  – a new task
44
Summary 2
• We can transfer knowledge from:
  – instances
  – features
  – models
• It's not always worth transferring; there must still be some relationship between the knowledge behind the two learners.
• Transfer learning is a complex and growing field.
45
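The first of those three options, instance transfer, has a very simple baseline: pool source and target examples but down-weight the source ones, trusting target data more. The sketch below uses invented data and a fixed weight; boosting-based methods such as TrAdaBoost adapt the weights automatically instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Many source instances from a related task, a few labeled target instances.
X_src = rng.normal(size=(100, 4))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(size=(10, 4))
y_tgt = (X_tgt[:, 0] > 0).astype(int)

# Instance transfer: pool the data, but give source examples lower weight
# (0.2 is an arbitrary choice here) so the target data dominates the fit.
X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
w = np.concatenate([np.full(len(X_src), 0.2), np.ones(len(X_tgt))])

clf = LogisticRegression().fit(X, y, sample_weight=w)
print(clf.predict(X_tgt).shape)  # (10,)
```

Whether this helps depends on the relationship between the two tasks, which is exactly the caveat in Summary 2: pooling unrelated source instances can hurt rather than help.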