
Machine Learning via Advice Taking

Jude Shavlik

Thanks To ...

Rich Maclin, Lisa Torrey, Trevor Walker

Prof. Olvi Mangasarian, Glenn Fung, Ted Wild

DARPA

Quote (2002) from DARPA

Sometimes an assistant will merely watch you and draw conclusions.

Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'

It's a combination of learning by example and by being guided.

Widening the “Communication Pipeline” between Humans and Machine Learners

[Diagram: a human teacher communicating with the pupil, a machine learner.]

Our Approach to Building Better Machine Learners

• Human partner expresses advice “naturally” and w/o knowledge of ML agent’s internals

• Agent incorporates advice directly into the function it is learning

• Additional feedback (rewards, I/O pairs, inferred labels, more advice) used to refine learner continually

“Standard” Machine Learning vs. Theory Refinement

• Positive Examples (“should see doctor”)
  temp = 102.1, age = 21, sex = F, …
  temp = 101.7, age = 37, sex = M, …

• Negative Examples (“take two aspirins”)
  temp = 99.1, age = 43, sex = M, …
  temp = 99.6, age = 24, sex = F, …

• Approximate Domain Knowledge
  if temp = high and age = young … then negative example

Related work by labs of Mooney, Pazzani, Cohen, Giles, etc

Rich Maclin’s PhD (1995)

IF   a Bee is (Near and West)
AND  an Ice is (Near and North)
THEN BEGIN
  Move East
  Move North
END

Sample Results

[Figure: reinforcement on the test set vs. number of training episodes (0 to 4000), comparing learning with advice and without advice.]

Our Motto

Give advice, rather than commands, to your computer.

Outline

• Prior Knowledge and Support Vector Machines
  • Intro to SVM’s
  • Linear Separation
  • Non-Linear Separation
  • Function Fitting (“Regression”)
• Advice-Taking Reinforcement Learning
• Transfer Learning via Advice Taking

Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: classes A+ and A− separated by the bounding planes x’w = γ + 1 and x’w = γ − 1; the margin between the planes is 2 / ||w||₂, and the points lying on them are the support vectors.]

Linear Algebra for SVM’s

• Given p points in n-dimensional space
• Represent them by the p-by-n matrix A of reals
• Each point Aᵢ is in class +1 or −1, recorded as Dᵢᵢ = ±1
• Separate by two bounding planes:
  Aᵢw ≥ γ + 1, for Dᵢᵢ = +1
  Aᵢw ≤ γ − 1, for Dᵢᵢ = −1
• More succinctly: D(Aw − eγ) ≥ e, where e is a vector of ones

“Slack” Variables: Dealing with Data that is not Linearly Separable

[Figure: overlapping classes A+ and A−; points on the wrong side of their bounding plane receive a positive slack y, and the support vectors are marked.]

Support Vector Machines: Quadratic Programming Formulation

• Solve this quadratic program

  min (over w, γ, y)   ν e’y + ½ ||w||₂²
  s.t.   D(Aw − eγ) + y ≥ e,   y ≥ 0

• Maximize the margin by minimizing ½ ||w||₂²
• Minimize the sum of the slack variables, e’y, weighted by ν

Support Vector Machines: Linear Programming Formulation

Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03)

  min (over w, γ, y)   ν e’y + ||w||₁
  s.t.   D(Aw − eγ) + y ≥ e,   y ≥ 0
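To make the linear program concrete, here is a minimal sketch that solves the 1-norm SVM LP above with numpy and scipy.optimize.linprog; the toy data, the value of ν, and all variable names are illustrative assumptions, not part of the original slides.

# Minimal sketch of the 1-norm SVM linear program:
#   min (over w, gamma, y)   nu * e'y + ||w||_1
#   s.t.  D(Aw - e*gamma) + y >= e,  y >= 0
# The absolute values |w_j| are handled with auxiliary variables a_j >= |w_j|.
import numpy as np
from scipy.optimize import linprog

def linear_svm_1norm(A, d, nu=1.0):
    """A: p-by-n data matrix, d: labels in {+1, -1}, nu: slack weight."""
    p, n = A.shape
    # Variable vector z = [w (n), a (n), gamma (1), y (p)]
    c = np.concatenate([np.zeros(n), np.ones(n), [0.0], nu * np.ones(p)])
    I_n, I_p = np.eye(n), np.eye(p)
    DA = d[:, None] * A                        # D A, with D = diag(d)
    # a >= |w|  <=>  w - a <= 0  and  -w - a <= 0
    abs1 = np.hstack([ I_n, -I_n, np.zeros((n, 1)), np.zeros((n, p))])
    abs2 = np.hstack([-I_n, -I_n, np.zeros((n, 1)), np.zeros((n, p))])
    # D(Aw - e*gamma) + y >= e  <=>  -DA w + d*gamma - y <= -e
    margin = np.hstack([-DA, np.zeros((p, n)), d[:, None], -I_p])
    A_ub = np.vstack([abs1, abs2, margin])
    b_ub = np.concatenate([np.zeros(2 * n), -np.ones(p)])
    bounds = ([(None, None)] * n + [(0, None)] * n +    # w free, a >= 0
              [(None, None)] + [(0, None)] * p)         # gamma free, y >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[2 * n]
    return w, gamma

# Toy usage: two roughly separable clusters; classify with sign(x'w - gamma)
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(+2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
d = np.concatenate([np.ones(20), -np.ones(20)])
w, gamma = linear_svm_1norm(A, d)
print("training accuracy:", np.mean(np.sign(A @ w - gamma) == d))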

Knowledge-Based SVM’s: Generalizing “Example” from POINT to REGION

[Figure: classes A+ and A−, now including polyhedral knowledge regions in addition to the individual training points.]

Incorporating “Knowledge Sets” Into the SVM Linear Program

• Suppose that the knowledge set { x | Bx ≤ b } belongs to class A+
• Hence it must lie in the half space { x | x’w ≥ γ + 1 }
• We therefore have the implication

  Bx ≤ b  ⇒  x’w ≥ γ + 1

• This implication is equivalent to a set of linear constraints (proof in the NIPS ’02 paper)
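For intuition, here is a sketch of why the implication reduces to linear constraints, via standard LP duality (the precise statement and proof are in the NIPS ’02 paper). The implication says the minimum of x’w over the knowledge set is at least γ + 1; dualizing that minimization gives the equivalent finite system

\[
\exists\, u \ge 0 \;\text{ such that }\; B^{\top}u + w = 0 \;\text{ and }\; b^{\top}u + \gamma + 1 \le 0 .
\]

These conditions are linear in (u, w, γ), so they can be added directly to the SVM linear program (and slacked, in the version with slack variables).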

Resulting LP for KBSVM’s

We get a linear program (LP) whose knowledge constraints range over the # of advice regions.

KBSVM with Slack Variables

[Figure: the same LP with the knowledge constraints relaxed; the right-hand sides that were 0 are replaced by slack variables.]

SVMs and Non-Linear Separating Surfaces

[Figure: data that is not linearly separable in the original features (f1, f2) becomes linearly separable after the non-linear mapping to h(f1, f2) and g(f1, f2).]

Non-linearly map to new space

Linearly separate in new space (using kernels)

Result is non-linear separator in original space

Fung et al. (2003) presents knowledge-based non-linear SVMs.

Support Vector Regression (aka Kernel Regression)

Linearly approximate a function, given the array A of inputs and the vector y of (numeric) outputs:

  f(x) ≈ x’w + b

Find weights such that

  Aw + be ≈ y

In dual space, w = A’α, so we get

  (A A’)α + be ≈ y

“Kernel-izing” (to get a non-linear approximation):

  K(A, A’)α + be ≈ y

[Figure: a fitted curve through sample (x, y) points.]

What to Optimize?

Linear program to optimize:

• The 1st term is the “regularizer” that minimizes model complexity
• The 2nd term is the approximation error, weighted by the parameter C
• This becomes the classical “least squares” fit if the quadratic version is used and the first term is ignored

Predicting Y for New X

  y = K(x’, A’)α + b

• Use the kernel to compute a “distance” to each training point (i.e., each row in A)
• Weight by αᵢ (hopefully many αᵢ are zero) and sum
• Add b (a scalar)
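Putting the two preceding slides together, here is a minimal sketch of the fit and the prediction step: a Gaussian kernel and the 1-norm linear program sketched above, solved with scipy.optimize.linprog. The kernel choice, parameter values, and all names are illustrative assumptions rather than the papers’ exact formulation.

# Sketch: 1-norm kernel regression
#   minimize  ||alpha||_1 + C ||s||_1
#   s.t.      -s <= K(A, A') alpha + b e - y <= s
# solved as a linear program, then prediction y_new = K(x', A') alpha + b.
import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(X, Z, sigma=1.0):
    """K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_kernel_regression(A, y, C=10.0, sigma=1.0):
    p = A.shape[0]
    K = gaussian_kernel(A, A, sigma)
    # Variable vector z = [alpha (p), a (p, bounds on |alpha|), b (1), s (p)]
    c = np.concatenate([np.zeros(p), np.ones(p), [0.0], C * np.ones(p)])
    I, Z, e = np.eye(p), np.zeros((p, p)), np.ones((p, 1))
    A_ub = np.vstack([
        np.hstack([ I, -I, np.zeros((p, 1)), Z]),   #  alpha - a <= 0
        np.hstack([-I, -I, np.zeros((p, 1)), Z]),   # -alpha - a <= 0
        np.hstack([ K,  Z,  e, -I]),                #  K alpha + b e - s <= y
        np.hstack([-K,  Z, -e, -I]),                # -K alpha - b e - s <= -y
    ])
    b_ub = np.concatenate([np.zeros(2 * p), y, -y])
    bounds = ([(None, None)] * p + [(0, None)] * p +
              [(None, None)] + [(0, None)] * p)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    alpha, b = res.x[:p], res.x[2 * p]
    return alpha, b

def predict(A_train, alpha, b, X_new, sigma=1.0):
    # y = K(x', A') alpha + b: kernel "distances" weighted by alpha, plus b
    return gaussian_kernel(X_new, A_train, sigma) @ alpha + b

# Toy usage: fit y = sin(x) on 40 points, then predict at x = 1.5
A = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(A).ravel()
alpha, b = fit_kernel_regression(A, y)
print(predict(A, alpha, b, np.array([[1.5]])), "vs", np.sin(1.5))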

Knowledge-Based SVR
Mangasarian, Shavlik, & Wild, JMLR ’04

Add soft constraints to the linear program (so the learner need only follow the advice approximately):

  minimize   ||w||₁ + C ||s||₁ + penalty for violating advice
  such that  y − s ≤ Aw + be ≤ y + s,   plus a “slacked” match to the advice

[Figure: example advice “In this region, y should exceed 4”, drawn as a region that the fitted curve is pushed above.]

Testbeds: Subtasks of RoboCup

• Mobile KeepAway: keep the ball from the opponents [Stone & Sutton, ICML 2001]
• BreakAway: score a goal [Maclin et al., AAAI 2005]

Reinforcement Learning Overview

• The agent repeatedly receives a state (described by a set of features), takes an action, and receives a reward
• Use the rewards to estimate the Q-values of actions in states
• Policy: choose the action with the highest Q-value in the current state
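For readers who want the mechanics spelled out, here is a generic tabular Q-learning sketch (not the RoboCup agent, which approximates Q-values with kernel regression); the environment interface and parameter values are assumptions for illustration.

# Generic tabular Q-learning sketch: estimate Q-values from rewards and act
# greedily (with epsilon-exploration) on the current estimates.
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """env is assumed to provide reset() -> state, step(a) -> (state, reward, done),
    and a list env.actions; states must be hashable."""
    Q = defaultdict(float)                     # Q[(state, action)] -> value

    def policy(state):
        if random.random() < epsilon:          # explore occasionally
            return random.choice(env.actions)
        # otherwise choose the action with the highest Q-value in this state
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # temporal-difference update toward reward + discounted future value
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q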

Incorporating Advice in KBKR

Advice format:   Bx ≤ d   ⇒   f(x) ≥ hx + β

Example:
  If distanceToGoal ≤ 10 and shotAngle ≥ 30
  Then Q(shoot) ≥ 0.9

For instance, with the feature vector x = (distanceToTeammate, distanceToGoal, shotAngle)’, this becomes

  B = [ 0  1  0 ;  0  0  −1 ],   d = (10, −30)’,   h = 0,   β = 0.9
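A small sketch of how such a rule can be held as matrices and tested against a state; the feature ordering and names follow the example above and are illustrative. In KBKR itself the (B, d, h, β) description is turned into slacked linear constraints on the learned Q-function rather than checked at run time.

# Represent "If distanceToGoal <= 10 and shotAngle >= 30 then Q(shoot) >= 0.9"
# in the KBKR advice format  Bx <= d  =>  f(x) >= hx + beta.
import numpy as np

# Feature vector x = (distanceToTeammate, distanceToGoal, shotAngle)
B = np.array([[0.0, 1.0,  0.0],    # distanceToGoal <= 10
              [0.0, 0.0, -1.0]])   # -shotAngle <= -30, i.e. shotAngle >= 30
d = np.array([10.0, -30.0])
h = np.zeros(3)                    # conclusion's right-hand side: 0*x + 0.9
beta = 0.9

def advice_applies(x):
    """True when the state x lies in the advice region Bx <= d."""
    return np.all(B @ x <= d)

def conclusion_lower_bound(x):
    """Lower bound the advice asserts for Q(shoot) when it applies."""
    return h @ x + beta

x = np.array([15.0, 8.0, 40.0])    # hypothetical state: near goal, wide angle
if advice_applies(x):
    print("advice says Q(shoot) >=", conclusion_lower_bound(x))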

Giving Advice About Relative Values of Multiple Functions

Maclin et al., AAAI ’05

When the input satisfies

preconditions(input)

Then

f1(input) > f2(input)

Sample Advice-Taking Results

if distanceToGoal ≤ 10 and shotAngle ≥ 30
then prefer shoot over all other actions
(as constraints: Q(shoot) > Q(pass), Q(shoot) > Q(move))

[Figure: Prob(Score Goal) vs. games played (0 to 25,000) in 2-on-1 BreakAway with rewards +1/−1, comparing the advice learner to standard RL.]

Transfer Learning

• Agent learns Task A (the source task)
• Agent encounters a related Task B (the target task)
• Agent uses knowledge from Task A to learn Task B faster
• Agent discovers how the tasks are related (here, we use a user mapping to tell the agent this)

Transfer Learning: The Goal for the Target Task

[Figure: performance vs. training for the target task, with and without transfer; the goals are a better start, a faster rise, and a better asymptote.]

Our Transfer Algorithm

1. Observe source-task games and use ILP to learn skills
2. Translate the learned skills into transfer advice for the target task
3. If there is user advice, add it in
4. Learn the target task with KBKR

Learning Skills By Observation

• Source-task games are sequences of (state, action) pairs
• Learning skills is like learning to classify states by their correct actions
• ILP = Inductive Logic Programming

Example observed state:
  State 1:
    distBetween(me, teammate2) = 15
    distBetween(me, teammate1) = 10
    distBetween(me, opponent1) = 5
    ...
    action = pass(teammate2)
    outcome = caught(teammate2)
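Below is a minimal sketch of the “classify states by their correct actions” view: turning observed (state, action, outcome) records into positive and negative examples for one skill. The record format, and the rule that only successful passes count as positives, are assumptions made for illustration.

# Sketch: build training examples for the "pass" skill from observed games.
# A game is a list of records: (state_features: dict, action: str, outcome: str).
def label_pass_examples(games):
    positives, negatives = [], []
    for game in games:
        for state, action, outcome in game:
            if action.startswith("pass(") and outcome.startswith("caught("):
                positives.append(state)        # a successful pass: positive example
            else:
                negatives.append(state)        # other actions: negative examples
    return positives, negatives

# Toy usage with one observed step (features as in the example state above)
game = [({"distBetween(me,teammate2)": 15,
          "distBetween(me,teammate1)": 10,
          "distBetween(me,opponent1)": 5},
         "pass(teammate2)", "caught(teammate2)")]
pos, neg = label_pass_examples([game])
print(len(pos), "positive,", len(neg), "negative examples")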

ILP: Searching for First-Order Rules

Search a space of increasingly specific clauses:

  P :- true
  P :- Q      P :- R      P :- S
  P :- R, Q      P :- R, S
  ...
  P :- R, S, V, W, X

We also use a random-sampling approach.

Advantages of ILP

• Can produce first-order rules for skills
  • Captures only the essential aspects of the skill
  • We expect these aspects to transfer better
• Can incorporate background knowledge

  pass(Teammate)   vs.   pass(teammate1), …, pass(teammateN)

Example of a Skill Learned by ILP from KeepAway

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.

We also gave “human” advice about shooting, since that is a new skill in BreakAway.
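To make the first-order flavor of the learned rule concrete, here is a small sketch that grounds the logical variables Teammate and Opponent against a single state; the state representation and names are assumptions for illustration.

# Sketch: evaluate the learned pass(Teammate) rule on one state by trying
# each possible binding of the logical variable Teammate.
def pass_rule_fires(state, teammate, opponents):
    """state maps ('distBetween', obj) and ('passAngle', obj) to numbers."""
    return (state[("distBetween", teammate)] > 14 and
            30 < state[("passAngle", teammate)] < 150 and
            any(state[("distBetween", opp)] < 7 for opp in opponents))

def teammates_to_pass_to(state, teammates, opponents):
    # The variable Teammate ranges over all teammates (one first-order rule),
    # rather than needing a separate propositional rule per teammate.
    return [t for t in teammates if pass_rule_fires(state, t, opponents)]

state = {("distBetween", "teammate1"): 10, ("passAngle", "teammate1"): 60,
         ("distBetween", "teammate2"): 20, ("passAngle", "teammate2"): 45,
         ("distBetween", "opponent1"): 5}
print(teammates_to_pass_to(state, ["teammate1", "teammate2"], ["opponent1"]))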

TL Level 7: KA to BA Raw Curves

TL Level 7: KA to BA Averaged Curves

TL Level 7: Statistics (TL metrics computed on average reward)

Type  Metric                                 KA to BA            MD to BA
                                             Score    P Value    Score    P Value
I     Jump start                             0.05     0.0312     0.08     0.0086
      Jump start smoothed                    0.08     0.0002     0.06     0.0014
II    Transfer ratio                         1.82     0.0034     1.86     0.0004
      Transfer ratio (truncated)             1.82     0.0032     1.86     0.0004
      Average relative reduction (narrow)    0.58     0.0042     0.54     0.0004
      Average relative reduction (wide)      0.70     0.0018     0.71     0.0008
      Ratio (of area under the curves)       1.37     0.0056     1.41     0.0012
      Transfer difference                    503.57   0.0046     561.27   0.0008
      Transfer difference (scaled)           1017.00  0.0040     1091.2   0.0016
III   Asymptotic advantage                   0.09     0.0086     0.11     0.0040
      Asymptotic advantage smoothed          0.08     0.0116     0.10     0.0030

Boldface indicates a significant difference was found

Conclusion

• Can use much more than I/O pairs in ML

• Give advice to computers; they automatically refine it based on feedback from the user or the environment

• Advice is an appealing mechanism for transferring learned knowledge computer-to-computer

Some Papers (on-line, use Google :-)

Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996

Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002

Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003

Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004

Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005

Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006

Backups

Breakdown of Results

[Figure: probability of scoring a goal vs. games played (0 to 5000), comparing four learners: all advice, transfer advice only, user advice only, and no advice.]

What if User Advice is Bad?

[Figure: probability of scoring a goal vs. games played (0 to 5000), comparing transfer with good advice, transfer with bad advice, bad advice only, and no advice.]

Related Work on Transfer

• Q-function transfer in RoboCup
  • Taylor & Stone (AAMAS 2005, AAAI 2005)
• Transfer via policy reuse
  • Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
  • Madden & Howley (AI Review 2004)
  • Torrey et al. (ECML 2005)
• Transfer via relational RL
  • Driessens et al. (ICML workshop 2006)
