Overview of Today's Lecture
• Last Time: course introduction
  • Reading assignment posted to class webpage
  • Don't get discouraged
• Today: introduction to "Supervised Machine Learning"
  • Our first ML algorithm: K-nearest neighbor
• HW 0 out online
  • Create a dataset of "fixed-length feature vectors"
  • Due next Tuesday Sept 19 (4 PM)
  • Instructions for handing in HW0 coming soon
Supervised Learning: Overview
Real World --(select features)--> Digital Representation (feature space)
  --(construct classifier)--> classification rules
  e.g., If feature 2 = X then APPLY BRAKE = TRUE

(In HW 0, humans select the features; in HW 1-2, the machine constructs the classifier)
Supervised Learning: Task Definition
• Given
  • A collection of positive examples of some concept/class/category
    (i.e., members of the class) and, possibly, a collection of the
    negative examples (i.e., non-members)
• Produce
  • A description that covers (includes) all (most) of the positive
    examples and none (few) of the negative examples
    (which, hopefully, properly categorizes most future examples!)
The Key Point!
Note: one can easily extend this definition to handle more than two classes
Example
(Positive Examples | Negative Examples)
How does this symbol classify?
• Concept
  • Solid red circle in a regular polygon
• What about?
  • Figure with red solid circles not in a larger red circle
  • Figures on the left side of the page, etc.
HW0 – Your "Personal Concept"
• Step 1: Choose a Boolean (true/false) concept
  • Subjective judgment (can't articulate)
    • Books I like/dislike
    • Movies I like/dislike
    • WWW pages I like/dislike
  • "Time will tell" concepts
    • Stocks to buy
    • Medical treatment (at time t, predict outcome at time t + ∆t)
  • Sensory interpretation
    • Face recognition (see text)
    • Handwritten digit recognition
    • Sound recognition
  • Hard-to-program functions
HW0 – Your "Personal Concept"
• Step 2: Choose a feature space
  • We will use fixed-length feature vectors
    • Choose N features
    • Each feature has Vi possible values
    • Each example is represented by a vector of N feature values
      (i.e., is a point in the feature space), e.g.:
          <red,   50,     round>
          color   weight  shape
  • Feature types (this choice defines the space)
    • Boolean
    • Nominal
    • Ordered
    • Hierarchical (we will not use hierarchical features)
• Step 3: Collect examples ("I/O" pairs)
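As a minimal sketch (not part of the assignment spec), the fixed-length feature-vector representation above can be written directly in Python; the feature names and values here are the slide's hypothetical example, and the variable names are my own:

```python
# Each example is an N-tuple of feature values plus a Boolean label.
# Features and values follow the slide's <red, 50, round> example.

FEATURES = ("color", "weight", "shape")  # N = 3 chosen features

examples = [
    # (feature vector, label) -- "I/O" pairs from Step 3
    (("red", 50, "round"), True),
    (("blue", 12, "square"), False),
]

# Fixed length: every example is a point in the same N-dimensional space.
for vec, label in examples:
    assert len(vec) == len(FEATURES)
```

Because every vector has the same length, each example is a point in the same N-dimensional feature space, which is what lets later algorithms (like K-NN) measure distances between examples.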
Standard Feature Types
for representing training examples – a source of "domain knowledge"
• Nominal (Boolean is a special case)
  • No relationship among possible values
    e.g., color є {red, blue, green} (vs. color = 1000 Hertz)
• Linear (or Ordered)
  • Possible values of the feature are totally ordered
    e.g., size є {small, medium, large}  ← discrete
          weight є [0…500]               ← continuous
• Hierarchical
  • Possible values are partially ordered in an ISA hierarchy,
    e.g., for shape:
        closed
        ├─ polygon: triangle, square
        └─ continuous: circle, ellipse
Example Hierarchy (KDD* Journal, Vol 5, No. 1-2, 2001, page 17)

Product → 99 Product Classes → 2302 Product Subclasses → ~30k Products
  e.g., Product → Pet Foods → Canned Cat Food → Friskies Liver, 250g
  (other classes include Tea; other subclasses include Dried Cat Food)

• This is the structure of just one feature!
• "the need to be able to incorporate hierarchical (knowledge about data
  types) is shown in every paper."
  – From the editors' introduction to the special issue (on applications)
    of the KDD journal, Vol 5, 2001

* Officially, "Data Mining and Knowledge Discovery", Kluwer Publishers
Some Famous Examples
• Car Steering (Pomerleau)
    digitized camera image → learned function → steering angle
• Medical Diagnosis (Quinlan)
    medical record (e.g., age = 13, sex = M, wgt = 18)
    → learned function → ill vs. healthy
• DNA Categorization
• TV-pilot rating
• Chemical-plant control
• Backgammon playing
• WWW page scoring
• Credit application scoring
HW0: Creating Your Dataset
1. Choose a dataset
   • based on interest/familiarity
   • meets basic requirements
     • >1000 examples
     • category (function) learned should be binary valued
     • ~500 "true" and ~500 "false" examples
   → Internet Movie Database (IMDb)
Example Database: IMDb

Entities and their attributes:
• Studio: Name, Country, Movies
• Movie: Title, Genre, Year, Opening Weekend, BO receipts,
  List of actors/actresses, Release season
• Director/Producer: Name, Year of birth, Movies
• Actor: Name, Year of birth, Gender, Oscars, Movies

Relations: Studio –Made→ Movie; Director/Producer –Directed→ Movie;
Director/Producer –Produced→ Movie; Actor –Acted in→ Movie
HW0: Creating Your Dataset
Choose a Boolean target function (category)
• Some examples:
  • Opening weekend box office receipts > $2 million
  • Movie is drama? (action, sci-fi, …)
  • Movies I like/dislike (e.g., TiVo)
HW0: Creating Your Dataset
Create your feature space:
• Movie
  • Average age of actors
  • Number of producers
  • Percent female actors
• Studio
  • Number of movies made
  • Average movie gross
  • Percent movies released in US
• Director/Producer
  • Years of experience
  • Most prevalent genre
  • Number of award-winning movies
  • Average movie gross
• Actor
  • Gender
  • Has previous Oscar award or nominations
  • Most prevalent genre
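To make the feature ideas above concrete, here is a hypothetical sketch of turning one movie record into a fixed-length feature vector; the record layout and the `movie_features` helper are my assumptions, not a real IMDb API:

```python
def movie_features(movie):
    """Map a movie record (a plain dict, assumed layout) to the vector
    <average actor age, number of producers, percent female actors>."""
    actors = movie["actors"]
    avg_age = sum(a["age"] for a in actors) / len(actors)
    pct_female = 100.0 * sum(a["gender"] == "F" for a in actors) / len(actors)
    return (avg_age, movie["num_producers"], pct_female)

# A made-up record for illustration:
movie = {
    "actors": [{"age": 30, "gender": "F"}, {"age": 50, "gender": "M"}],
    "num_producers": 3,
}
print(movie_features(movie))  # (40.0, 3, 50.0)
```

The point of the exercise: however rich the underlying database is, each example must end up as the same fixed-length vector of feature values.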
HW0: Creating Your Dataset
David Jensen's group at UMass used Naïve Bayes (NB) to predict the
following, based on attributes they selected and a novel way of sampling
from the data:
• Opening weekend box office receipts > $2 million
  • 25 attributes
  • Accuracy = 83.3%
  • Default accuracy = 56%
• Movie is drama?
  • 12 attributes
  • Accuracy = 71.9%
  • Default accuracy = 51%
• http://kdl.cs.umass.edu/proximity/about.html
Back to Supervised Learning
One way learning systems differ is in how they represent concepts:

Training Examples →
• Backpropagation → Neural Net
• C4.5, CART → Decision Tree
• AQ, FOIL → Rules (e.g., Φ ← X ∧ Y; Φ ← Z)
• SVMs → e.g., If 5x1 + 9x2 – 3x3 > 12 Then +
Feature Space
If examples are described in terms of values of features, they can be
plotted as points in an N-dimensional space.

(Figure: axes Size, Color, Weight; a query point ? at <Big, Gray, 2500>)

A "concept" is then a (possibly disjoint) volume in this space.
Supervised Learning = Learning from Labeled Examples
• Most common & successful form of ML

(Venn diagram: + and – examples scattered in feature space)

• Examples – points in a multi-dimensional "feature space"
• Concepts – a "function" that labels points in feature space
  (as +, –, and possibly ?)
Brief Review
• Conjunctive Concept ("and")
  • Color(?obj1, red) ∧ Size(?obj1, large)
• Disjunctive Concept ("or")
  • Color(?obj2, blue) ∨ Size(?obj2, small)

(Figure: example instances "A" matching each concept)
Empirical Learning and Venn Diagrams

(Venn diagram in feature space: regions A and B; + examples fall inside
A ∪ B, – examples fall outside)

Concept = A or B (a disjunctive concept)
Examples = labeled points in feature space
Concept = a label for a set of points
Aspects of an ML System
• "Language" for representing examples  ← HW 0
• "Language" for representing "concepts"
• Technique for producing a concept "consistent" with the training examples
• Technique for classifying new instances  ← other HWs
Each of these limits the expressiveness/efficiency of the supervised
learning algorithm.
Nearest-Neighbor Algorithms
(aka exemplar models, instance-based learning (IBL), case-based learning)
• Learning ≈ memorize training examples
• Problem solving = find the most similar example in memory; output its
  category

(Venn diagram: + and – training points with a query point ?; the induced
decision boundaries are "Voronoi diagrams", pg 233)
Sample Experimental Results

                     Testset Correctness
Testbed              IBL     D-Trees   Neural Nets
Wisconsin Cancer     98%     95%       96%
Heart Disease        78%     76%       ?
Tumor                37%     38%       ?
Appendicitis         83%     85%       86%

Simple algorithm works quite well!
Simple Example – 1-NN
(1-NN ≡ one nearest neighbor)

Training Set
1. a=0, b=0, c=1  +
2. a=0, b=1, c=0  –
3. a=1, b=1, c=1  –
Test Example
• a=0, b=1, c=0  ?

"Hamming Distance" (number of differing feature values)
• To Ex 1 = 2
• To Ex 2 = 0  (the test example matches Ex 2 exactly)
• To Ex 3 = 2
So output – (the nearest neighbor, Ex 2, is labeled –)
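The 1-NN computation above can be sketched in a few lines of Python; the function names are my own, but the data is exactly the training set from the slide:

```python
def hamming(x, y):
    """Hamming distance: the number of feature positions that differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

# Training set from the slide, as (feature vector (a, b, c), label) pairs.
train = [((0, 0, 1), "+"), ((0, 1, 0), "-"), ((1, 1, 1), "-")]
test = (0, 1, 0)

# Distance from the test example to each training example.
print([hamming(test, x) for x, _ in train])  # [2, 0, 2]

# 1-NN: output the label of the single nearest training example.
nearest = min(train, key=lambda ex: hamming(test, ex[0]))
print(nearest[1])  # "-"
```

Note that the test example is identical to Ex 2, so its distance is 0 and 1-NN outputs Ex 2's label.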
K-NN Algorithm
Collect the K nearest neighbors, and select the majority classification
(or somehow combine their classes)
• What should K be?
  • It is problem dependent
  • Can use tuning sets (later) to select a good setting for K

(Figure: tuning-set error rate vs. K = 1, 2, 3, 4, 5; shouldn't really
"connect the dots" – why?)
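A minimal sketch of K-NN with majority vote, under the assumption that ties are broken arbitrarily (the slide leaves combination schemes open); the helper names and toy data are mine:

```python
from collections import Counter

def hamming(x, y):
    # Distance metric is a placeholder; any metric could be swapped in.
    return sum(a != b for a, b in zip(x, y))

def knn_classify(train, query, k, dist):
    """Return the majority label among the k nearest training examples."""
    neighbors = sorted(train, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0, 0, 1), "+"), ((0, 1, 0), "-"), ((1, 1, 1), "-")]
print(knn_classify(train, (0, 1, 0), k=3, dist=hamming))  # "-"
```

With k=3 the vote here is 2 "-" to 1 "+", so the output is "-"; a tuning set would be used to pick k in practice.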
Some Common Jargon
• Classification
  • Learning a discrete-valued function
• Regression
  • Learning a real-valued function
IBL is easily extended to regression tasks (and to multi-category
classification)
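One way IBL extends to regression, as the slide claims, is to output the mean of the k nearest neighbors' real-valued labels (distance-weighted averages are another common choice); this sketch and its toy data are my own:

```python
def knn_regress(train, query, k):
    """k-NN regression over 1-D inputs: average the k nearest outputs."""
    neighbors = sorted(train, key=lambda ex: abs(query - ex[0]))[:k]
    return sum(y for _, y in neighbors) / k

# (input, real-valued output) pairs -- made-up data for illustration.
train = [(1.0, 2.0), (2.0, 4.0), (10.0, 20.0)]
print(knn_regress(train, 1.5, k=2))  # mean of 2.0 and 4.0 -> 3.0
```

The only change from classification is the combination step: averaging instead of majority vote.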
Variations on a Theme
(From Aha, Kibler and Albert in the ML Journal)
• IB1 – keep all examples
• IB2 – keep the next instance if it is incorrectly classified by using
  the previous instances
  • Uses less storage
  • Order dependent
  • Sensitive to noisy data
Variations on a Theme (cont.)
• IB3 – extends IB2 to more intelligently decide which examples to keep
  (see article)
  • Better handling of noisy data
• Another idea – cluster groups, keep "examples" from each
  (median/centroid)
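The "cluster groups, keep examples from each" idea can be sketched crudely by collapsing each class to a single centroid (real implementations would run proper clustering such as k-means within each class; the names and data below are mine):

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

train = [((0.0, 0.0), "+"), ((2.0, 0.0), "+"), ((10.0, 10.0), "-")]

# Group training examples by class label.
by_class = {}
for x, label in train:
    by_class.setdefault(label, []).append(x)

# Keep one representative (the centroid) per class instead of all examples.
condensed = [(centroid(xs), label) for label, xs in by_class.items()]
print(condensed)  # [((1.0, 0.0), '+'), ((10.0, 10.0), '-')]
```

Nearest-neighbor lookups then run against the much smaller condensed set, trading some accuracy for storage and speed.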