Learning From Data Lecture 1 The Learning Problem Introduction Motivation Credit Default - A Running Example Summary of the Learning Problem M. Magdon-Ismail CSCI 4100/6100


Learning From Data

Lecture 1

The Learning Problem

Introduction

Motivation

Credit Default - A Running Example

Summary of the Learning Problem

M. Magdon-Ismail

CSCI 4100/6100


Resources

1. Web Page: www.cs.rpi.edu/~magdon/courses/learn.php

– course info: www.cs.rpi.edu/~magdon/courses/learn/info.pdf

– slides: www.cs.rpi.edu/~magdon/courses/learn/slides.html

– assignments: www.cs.rpi.edu/~magdon/courses/learn/assign.html

2. Text Book: Learning From Data

Abu-Mostafa, Magdon-Ismail, Lin

3. Book Forum: book.caltech.edu/bookforum

– discussion about any material in book including problems and exercises.

– additional material

4. TA.

5. Professor.

6. Prerequisites? assignment #0

© AML Creator: Malik Magdon-Ismail The Learning Problem: 2 /16 The storyline −→


The Storyline

1. What is Learning?

2. Can We do it?

3. How to do it?

4. How to do it well?

5. General principles?

6. Advanced techniques.

7. Other Learning Paradigms.

concepts, theory, practice

our language will be mathematics . . .

. . . our sword will be computer algorithms


Let’s Define a Tree?


A brown trunk moving upwards and branching with leaves . . .


Are These Trees?


Learning “What are Trees” is ‘Easy’


Defining is Hard; Recognizing is Easy

Hard to give a complete mathematical definition of a tree.

Even a 3-year-old can tell a tree from a non-tree.

The 3-year-old has learned from data. (Other tasks like graphics or GAN?)


Learning to Rate Movies

• Can we predict how a viewer would rate a movie?

• Why? So that Netflix can make better movie recommendations, and get more rentals.

• $1 million prize for a mere 10% improvement in their recommendation system.


Previous Ratings Reflect Future Ratings

[Figure: a viewer is described by taste factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) and a movie by content factors (comedy content, action content, blockbuster? Tom Cruise in it?). Match corresponding factors, then add their contributions to get the predicted rating.]

• Viewer taste & movie content imply viewer rating.

• No magical formula to predict viewer rating.

• Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.

Class Motto:

A pattern exists. We don’t know it. We have data to learn it.
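The "match corresponding factors, then add their contributions" idea can be sketched in a few lines. This is only an illustration, not the method Netflix or the lecture actually uses: the factor names, the numeric values, and the function `predicted_rating` are all assumptions made up for the example. In a learning setting these numbers would be estimated from past ratings rather than written by hand.

```python
# Hypothetical factor names, invented for this illustration.
FACTORS = ["comedy", "action", "blockbuster", "tom_cruise"]


def predicted_rating(viewer, movie):
    """Match corresponding factors, then add their contributions."""
    return sum(viewer[k] * movie[k] for k in FACTORS)


# Viewer taste: how much this viewer likes each factor (made-up values).
viewer = {"comedy": 0.5, "action": 1.0, "blockbuster": 0.0, "tom_cruise": 2.0}

# Movie content: how much of each factor the movie has (made-up values).
movie = {"comedy": 1.0, "action": 2.0, "blockbuster": 1.0, "tom_cruise": 1.0}

rating = predicted_rating(viewer, movie)
print(rating)
```

Each factor contributes the product of the viewer's taste and the movie's content, so a factor the viewer does not care about (here, blockbuster status) adds nothing to the prediction.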


Credit Approval

Let’s use a conceptual example to crystallize the issues.

age             32 years
gender          male
salary          40,000
debt            26,000
years in job    1 year
years at home   3 years
. . .           . . .

Approve for credit?


• Using salary, debt, years in residence, etc., approve for credit or not.

• No magic credit approval formula.

• Banks have lots of data.

– customer information: salary, debt, etc.

– whether or not they defaulted on their credit.


A pattern exists. We don’t know it. We have data to learn it.


The Key Players

• Salary, debt, years in residence, . . . : input $x \in \mathbb{R}^d = \mathcal{X}$.

• Approve credit or not: output $y \in \{-1, +1\} = \mathcal{Y}$.

• True relationship between x and y: target function $f : \mathcal{X} \mapsto \mathcal{Y}$.

(The target f is unknown.)

• Data on customers: data set $\mathcal{D} = (x_1, y_1), \ldots, (x_N, y_N)$.

($y_n = f(x_n)$.)

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem;

The target f is fixed but unknown.

We learn the function f from the data $\mathcal{D}$.
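To make the key players concrete, here is a minimal sketch of how the pieces relate. Everything specific in it is an assumption for illustration: the two features (salary and debt, in thousands), the sample size, and above all the rule inside `f`. In a real problem f is unknown and only the data set is available; the function is written out here solely so the example can generate labels with y_n = f(x_n).

```python
import random


def f(x):
    """Stand-in for the unknown target function.

    The rule below is invented for this example; in practice we never
    get to see f, only the labels it produces.
    """
    salary, debt = x
    return +1 if salary - 1.5 * debt > 0 else -1


rng = random.Random(0)  # fixed seed so repeated runs give the same data
N = 5

# Inputs x_n in R^d with d = 2: (salary, debt), both in thousands.
inputs = [(rng.uniform(20, 100), rng.uniform(0, 60)) for _ in range(N)]

# Data set D = (x_1, y_1), ..., (x_N, y_N) with y_n = f(x_n).
D = [(x, f(x)) for x in inputs]

for x, y in D:
    print(x, y)
```

The learner is handed only `D`; the body of `f` plays the role of the bank's true (hidden) relationship between customer information and credit behavior.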


Learning

• Start with a set of candidate hypotheses $\mathcal{H}$ which you think are likely to represent f.

$\mathcal{H} = \{h_1, h_2, \ldots\}$

is called the hypothesis set or model.

• Select a hypothesis g from $\mathcal{H}$. The way we do this is called a learning algorithm.

• Use g for new customers. We hope $g \approx f$.

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem;

The target f is fixed but unknown.

We choose $\mathcal{H}$ and the learning algorithm.

This is a very general setup (e.g., choose $\mathcal{H}$ to be all possible hypotheses).
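The three steps above can be sketched with a small, finite hypothesis set. This is one possible learning algorithm, chosen for simplicity, and not one the lecture prescribes: it picks the hypothesis with the fewest mistakes on the data. The data values, the threshold rule in `make_hypothesis`, and the helper names are all hypothetical.

```python
# Toy data set D: x = (salary, debt) in thousands, y = +1 (approve) or -1 (deny).
# All values are made up for illustration.
D = [
    ((40, 26), -1),
    ((90, 10), +1),
    ((55, 5), +1),
    ((30, 20), -1),
    ((70, 45), -1),
    ((65, 15), +1),
]


def make_hypothesis(threshold):
    """Return h with h(x) = +1 if salary - debt exceeds threshold, else -1."""
    def h(x):
        salary, debt = x
        return +1 if salary - debt > threshold else -1
    return h


# Hypothesis set H = {h_1, h_2, ...}: one candidate rule per threshold.
thresholds = [0, 10, 20, 30, 40, 50]
H = [make_hypothesis(t) for t in thresholds]


def training_error(h, data):
    """Fraction of examples in data that h classifies incorrectly."""
    return sum(h(x) != y for x, y in data) / len(data)


def learn(hypotheses, data):
    """Learning algorithm: select the hypothesis with the lowest error on data."""
    return min(hypotheses, key=lambda h: training_error(h, data))


g = learn(H, D)                 # final hypothesis
print(training_error(g, D))     # error of g on the data it was chosen from
print(g((48, 12)))              # use g for a new customer
```

Agreeing with the data does not by itself show that g ≈ f on new customers; whether that hope is justified is the "Can we do it?" question in the storyline.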


Summary of the Learning Setup

UNKNOWN TARGET FUNCTION $f : \mathcal{X} \mapsto \mathcal{Y}$ (ideal credit approval formula)

TRAINING EXAMPLES $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, with $y_n = f(x_n)$ (historical records of credit customers)

HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)

LEARNING ALGORITHM $\mathcal{A}$

FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)
