Supervised Learning & Classification, part I
Reading: DH&S, Ch. 1

Page 1

Supervised Learning & Classification, part I
Reading: DH&S, Ch. 1

Page 2

Administrivia...

• Pretest answers back today
• Today's lecture notes online after class
  • Apple Keynote, PDF, PowerPoint
  • PDF & PPT are auto-converted; may be flaky

Page 3

Your place in history

• Yesterday:
  • Course administrivia
  • Fun & fluffy philosophy
• Today:
  • The basic ML problem
  • Branches of ML: the 20,000-foot view
  • Intro to supervised learning
  • Definitions and stuff

Page 4

Pretest results: trends

• Courses dominated by math & stat; followed by algorithms; then CS530; then AI & CS500
• Proficiencies: probability > algorithms > linear algebra
• μ = 56%
• σ = 28%

Page 5

The basic ML problem

[Diagram: the World feeds an observation into a function f(⋅), which outputs the label "Emphysema"; the labels come from a supervisor.]

Page 6

The basic ML problem

• Our job: reconstruct f(⋅) from observations
• Knowing f(⋅) tells us:
  • Can recognize new (previously unseen) instances
  • Classification or discrimination

[Diagram: an unknown f(⋅) maps a new instance to the label "Hashimoto-Pritzker".]

Page 7

The basic ML problem

• Our job: reconstruct f(⋅) from observations
• Knowing f(⋅) tells us:
  • Can synthesize new data (e.g., speech or images)
  • Generation

[Diagram: a random source drives f(⋅), which generates an "Emphysema" instance.]

Page 8

The basic ML problem

• Our job: reconstruct f(⋅) from observations
• Knowing f(⋅) tells us:
  • Can help us understand the process that generated the data
  • Description or analysis
  • Can tell us/find things we never knew
  • Discovery or data mining

How many clusters ("blobs") are there? Taxonomy of the data? Networks of relationships? Unusual/unexpected things? Most important characteristics?

Page 9

The basic ML problem

• Our job: reconstruct f(⋅) from observations
• Knowing f(⋅) tells us:
  • Can help us act or perform better
  • Control

Turn left? Turn right? Accelerate? Brake? Don't ride in the rain?

Page 10

A brief taxonomy (highly abbreviated)

All ML divides into:
• Supervised: have "inputs"; have "outputs"; find "best" f(⋅)
• Unsupervised: have "inputs"; no "outputs"; find "best" f(⋅)
• Reinforcement Learning: have "inputs"; have "controls"; have "reward"; find "best" f(⋅)

Page 11

A brief taxonomy (highly abbreviated), cont'd

Within the Supervised branch (alongside Unsupervised and Reinforcement Learning), two sub-branches (see the sketch below):
• Classification: discrete outputs
• Regression: continuous outputs
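Not from the slides, but a minimal Python sketch of the output-type distinction; the toy rules inside are invented placeholders for a learned f(⋅):

```python
from typing import Sequence

# Classification: f maps a feature vector to one of a discrete set of labels.
def classify(x: Sequence[float]) -> str:
    return "7" if sum(x) > 500 else "8"  # toy stand-in for a learned classifier

# Regression: f maps a feature vector to a continuous value.
def regress(x: Sequence[float]) -> float:
    return 0.1 * sum(x)  # toy stand-in for a learned regressor
```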

Page 12

A classic example: digits

The post office wants to be able to auto-scan envelopes, recognize addresses, etc.

[Image: a handwritten ZIP code, 87131, awaiting recognition.]

Page 13

Digits to bits

Digitize (sensors) → feature vector:
255, 255, 127, 35, 0, 0 ...
255, 0, 93, 11, 45, 6 ...
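A minimal sketch of this digitization step, assuming 8-bit grayscale scans held as NumPy arrays (the image contents here are random placeholders):

```python
import numpy as np

# Hypothetical 8x8 grayscale scan of a digit; pixel intensities in 0..255.
image = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)

# "Digitize": flatten the 2-D pixel grid into a d-dimensional feature vector.
feature_vector = image.flatten()

print(feature_vector.shape)  # (64,): d = 64 features, one per pixel
```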

Page 14

Measurements & features

• The collection of numbers from the sensors:
  255, 0, 93, 11, 45, 6 ...
• ... is called a feature vector, a.k.a.
  • attribute vector
  • measurement vector
  • instance

Page 15

Measurements & features

• Written x = ⟨x₁, x₂, …, x_d⟩
• where
  • d is the dimension of the vector
  • each xᵢ is drawn from some range Xᵢ
  • e.g., ℝ, or {0, …, 255}, or a finite categorical set

Page 16

More on features

• Features (attributes, independent variables) can come in different flavors (see the sketch below):
  • Continuous
  • Discrete
  • Categorical or nominal
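A small sketch of the three flavors, using a hypothetical patient record (the field names are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Patient:
    temperature: float  # continuous: any value in (a subrange of) the reals
    num_visits: int     # discrete: countable, ordered values 0, 1, 2, ...
    blood_type: str     # categorical/nominal: one of {"A", "B", "AB", "O"}, unordered

x = Patient(temperature=37.2, num_visits=3, blood_type="AB")
```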

Page 17

More on features

• We (almost always) assume that the set of features is fixed & of finite dimension, d
  • Sometimes quite large, though (d ≥ 100,000 is not uncommon)
• The set of all possible instances is the instance space or feature space, X = X₁ × X₂ × … × X_d
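When every feature is discrete, the instance space really is a finite Cartesian product; a tiny sketch with three invented binary features:

```python
from itertools import product

# Instance space X = X1 x X2 x ... x Xd for d discrete features.
ranges = [(0, 1)] * 3                    # three binary features, d = 3
instance_space = list(product(*ranges))  # all 2**3 = 8 possible instances

print(len(instance_space))  # 8
```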


Page 19

Classes

• Every example comes w/ a class
  • a.k.a. label, prediction, dependent variable, etc.
• For classification problems, the class label is categorical
• For regression problems, it's continuous
  • usually called the dependent or regressed variable
• We'll write y for the class of an instance
• E.g.:
  255, 255, 127, 35, 0, 0 ... → "7"
  255, 0, 93, 11, 45, 6 ... → "8"
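A quick sketch of labeled examples as (feature vector, class) pairs, reusing the digit vectors above:

```python
# Each example pairs a feature vector x with its class label y.
labeled_examples = [
    ([255, 255, 127, 35, 0, 0], "7"),
    ([255, 0, 93, 11, 45, 6], "8"),
]

for x, y in labeled_examples:
    print(x, "->", y)
```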

Page 20

Classes, cont'd

• The set of possible values of the class variable is called the class set, class space, or range
• Book writes individual classes as ωᵢ
• Presumably the whole class set is: Ω = {ω₁, ω₂, …, ω_c}
• So y ∈ Ω

Page 21

A very simple example

[Images: I. setosa, I. versicolor, I. virginica]

Features: sepal length, sepal width, petal length, petal width
Feature space, X = ℝ⁴

Page 22

A very simple example, cont'd

[Images: I. setosa, I. versicolor, I. virginica]

Class space, Ω = {I. setosa, I. versicolor, I. virginica}
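This iris data ships with scikit-learn, so the feature space and class space are easy to inspect; a quick sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4): 150 instances in a 4-D feature space
print(iris.feature_names)  # sepal/petal length & width, in cm
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']: the class set
```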

Page 23

Training data

• The set of all available data for learning == the training data
  • a.k.a. parameterization set, fitting set, etc.
• Denoted D
• Can write it as a matrix, w/ a corresponding class vector (sketch below):
  X = [x₁; x₂; …; x_n] (one instance per row), y = ⟨y₁, y₂, …, y_n⟩
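A sketch of that layout in NumPy, reusing the two digit examples (row i of X pairs with entry i of y):

```python
import numpy as np

# One training instance per row of X; one class label per entry of y.
X = np.array([
    [255, 255, 127, 35, 0, 0],
    [255, 0, 93, 11, 45, 6],
])
y = np.array(["7", "8"])

assert X.shape[0] == y.shape[0]  # n rows in X, n labels in y
```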

Page 24

Finally, goals

• Now that we have our instances and their labels, we have a (mostly) well-defined job:

The supervised learning problem:
Find the function f̂ : X → Ω that most closely approximates the "true" function f

Page 25

Goals?

• Key questions:
  • What candidate functions do we consider?
  • What does "most closely approximates" mean?
  • How do you find the one you're looking for?
  • How do you know you've found the "right" one?
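To make these questions concrete, here is one minimal answer (an illustrative sketch, not the course's method): a 1-nearest-neighbor classifier, where the candidate functions are "return the label of the closest training instance" and closeness means Euclidean distance in feature space:

```python
import numpy as np

def nearest_neighbor_classify(X_train, y_train, x_new):
    """Predict the class of x_new as the label of the closest training row."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to each row
    return y_train[np.argmin(dists)]

X_train = np.array([[255, 255, 127, 35, 0, 0],
                    [255, 0, 93, 11, 45, 6]], dtype=float)
y_train = np.array(["7", "8"])

print(nearest_neighbor_classify(X_train, y_train,
                                np.array([250.0, 10.0, 90.0, 12.0, 40.0, 5.0])))  # -> "8"
```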