Slide 1
Learning?
What can we learn from here?
If Sky = Sunny and Air Temperature = Warm then Enjoy Sport = Yes
If Sky = Sunny then Enjoy Sport = Yes
If Air Temperature = Warm then Enjoy Sport = Yes
If Sky = Sunny and Air Temperature = Warm and Wind = Strong then Enjoy Sport = Yes ??
Example  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
1        Sunny  Warm      Normal    Strong  Warm   Same      Yes
2        Sunny  Warm      High      Strong  Warm   Same      Yes
3        Rainy  Cold      High      Strong  Warm   Change    No
4        Sunny  Warm      High      Strong  Cold   Change    Yes
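To make the candidate rules concrete, here is a small Python sketch (ours, not from the slides) that checks each rule against the four training examples:

```python
# The training data from the table above, one tuple per example.
data = [
    # (Sky, AirTemp, Humidity, Wind, Water, Forecast, EnjoySport)
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same",   "Yes"),
    ("Sunny", "Warm", "High",   "Strong", "Warm", "Same",   "Yes"),
    ("Rainy", "Cold", "High",   "Strong", "Warm", "Change", "No"),
    ("Sunny", "Warm", "High",   "Strong", "Cold", "Change", "Yes"),
]

def rule_holds(condition):
    """True if every example satisfying `condition` has EnjoySport = Yes."""
    return all(ex[-1] == "Yes" for ex in data if condition(ex))

print(rule_holds(lambda ex: ex[0] == "Sunny"))                      # Sky = Sunny
print(rule_holds(lambda ex: ex[1] == "Warm"))                       # Air Temp = Warm
print(rule_holds(lambda ex: ex[0] == "Sunny" and ex[1] == "Warm"))  # both
```

All three rules are consistent with the data; which one to prefer is exactly the question the slide raises.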
What is machine learning?
"Any process by which a system improves performance" (H. Simon)
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (T. Mitchell)
• Machine Learning has to do with designing computer programs that improve their performance through experience
Related areas
Artificial intelligence
Probability and statistics
Computational complexity theory
Information theory
Human language technology
Applications of ML
Learning to recognize spoken words: SPHINX (Lee 1989)
Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
Learning to classify celestial objects (Fayyad et al. 1995)
Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
Learning to translate between languages
Learning to classify texts into categories: Web directories
Main directions in ML
Data mining
  Finding patterns in data
  Using "historical" data to make a decision
  E.g. predict the weather based on current conditions
Self-customization
  Automatic feedback integration
  Adapting to user "behaviour"
  E.g. recommender systems
Writing applications that cannot be programmed by hand
  In particular because they involve huge amounts of data
  Speech recognition
  Handwriting recognition
  Text understanding
Terminology
Learning is performed from EXAMPLES (or INSTANCES).
An example contains ATTRIBUTES or FEATURES, e.g. Sky, Air Temperature, Water.
In concept learning, we want to learn the value of the TARGET ATTRIBUTE.
Classification problems: in the binary case, +/- (positive/negative).
Attributes have VALUES:
  a single value (e.g. Warm)
  ? indicates that any value is possible for this attribute
  ∅ indicates that no value is acceptable
All the features in an example together are sometimes referred to as the FEATURE VECTOR.
Terminology
Feature vector for our learning problem: (Sky, Air Temp, Humidity, Wind, Water, Forecast), and the target attribute is EnjoySport.
How do we represent "Aldo enjoys sports only on cold days with high humidity"?
  (?, Cold, High, ?, ?, ?)
How about "Emma enjoys sports regardless of the weather"?
A hypothesis is such a vector of constraints; it stands for the entire set of examples that it covers.
Most general hypothesis: (?, ?, ?, ?, ?, ?)
Most specific hypothesis: (∅, ∅, ∅, ∅, ∅, ∅)
How many hypotheses can be generated for our feature vector?
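The counting question can be answered directly. A sketch, assuming (as in Mitchell's textbook treatment of this example) that Sky takes 3 values and each of the other five attributes takes 2:

```python
# Counting hypotheses for the EnjoySport feature vector.
# Assumption: Sky has 3 values; the other five attributes have 2 each.
n_values = [3, 2, 2, 2, 2, 2]

# Syntactically distinct hypotheses: each slot can hold any of its
# values, or "?", or the empty symbol, giving |values| + 2 choices.
syntactic = 1
for n in n_values:
    syntactic *= n + 2

# Semantically distinct hypotheses: any vector containing the empty
# symbol classifies every instance as negative, so all such vectors
# collapse into a single hypothesis.
semantic = 1
for n in n_values:
    semantic *= n + 1
semantic += 1

print(syntactic)  # 5120
print(semantic)   # 973
```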
Task in machine learning
Given:
  a set of examples X
  a set of hypotheses H
  a target concept c
Determine:
  a hypothesis h in H such that h(x) = c(x) for all x in X
Practically, we want to determine those hypotheses that best fit our examples:
  (Sunny, ?, ?, ?, ?, ?) Yes
  (?, Warm, ?, ?, ?, ?) Yes
  (Sunny, Warm, ?, ?, ?, ?) Yes
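One classic way to find such a hypothesis is to search from specific to general, as in Mitchell's Find-S algorithm: start with the first positive example and generalize an attribute to ? only when a later positive example disagrees. A minimal sketch on the three positive EnjoySport examples (variable names are ours):

```python
# The three positive examples from the EnjoySport table.
POSITIVES = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Cold", "Change"),
]

def find_s(positives):
    h = list(positives[0])          # start from the first positive example
    for example in positives[1:]:
        for i, value in enumerate(example):
            if h[i] != value:       # attribute disagrees: generalize to "?"
                h[i] = "?"
    return tuple(h)

print(find_s(POSITIVES))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The result is the most specific hypothesis consistent with all positive examples.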
Machine learning applications
Until now: a toy example, deciding whether X enjoys sport given the current conditions and the forecast.
Practical problems:
  Part-of-speech tagging. How?
  Word sense disambiguation
  Text categorization
  Chunking
  ...
Any problem that can be modeled through examples should support learning.
Machine learning algorithms
Concept learning via search over the general-to-specific ordering of hypotheses
Decision tree learning
Instance-based learning
Rule-based learning
Neural networks
Bayesian learning
Genetic algorithms
Basic elements of information theory
How to determine which attribute is the best classifier? Measure the information gain of each attribute.
Entropy characterizes the (im)purity of an arbitrary collection of examples. Given a collection S with a proportion p of positive and q of negative examples:
  Entropy(S) = - p log2 p - q log2 q
Entropy is at its maximum (1) when p = q = 1/2.
Entropy is at its minimum (0) when p = 1 and q = 0.
Example: S contains 14 examples, 9 positive and 5 negative.
  Entropy(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
(By convention, 0 log 0 = 0.)
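As a sanity check, the entropy values above can be reproduced with a few lines of Python (the function name is ours):

```python
import math

def entropy(p, q):
    """Binary entropy; by convention 0 * log 0 = 0."""
    terms = (x * math.log2(x) for x in (p, q) if x > 0)
    return 0.0 - sum(terms)   # "0.0 - ..." avoids returning -0.0

print(round(entropy(9/14, 5/14), 2))  # 0.94
print(entropy(0.5, 0.5))              # 1.0  (maximum)
print(entropy(1.0, 0.0))              # 0.0  (minimum)
```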
Basic elements of information theory
Information gain measures the expected reduction in entropy obtained by splitting on an attribute.
Many learning algorithms make decisions based on information gain.
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of examples in S for which attribute A has value v.
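A sketch of this formula in Python, applied to the windy attribute of the weather data from the last slide (function names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# The "windy" column and class labels (P = Play, DP = Don't Play)
# taken from the 14-example weather data slide.
windy  = ["f","t","f","f","f","t","t","f","f","f","t","t","f","t"]
labels = ["DP","DP","P","P","P","DP","P","DP","P","P","P","P","P","DP"]
print(round(info_gain(windy, labels), 3))  # 0.048
```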
Decision trees
Decision trees have the capability of generating rules:
  IF outlook = sunny AND temperature = hot THEN play tennis = no
Powerful! It would be very hard for a human to write such rules by hand.
Implementations: ID3, C4.5 (Quinlan); integral part of MLC++; integral part of Weka (in Java).
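A hypothetical sketch of the ID3-style criterion these tools use: among the nominal attributes of the weather data, pick as the root of the tree the one with the highest information gain (function and variable names are ours):

```python
import math
from collections import Counter

# (outlook, windy, class) triples from the weather data slide;
# "P" = Play, "DP" = Don't Play.
rows = [("sunny","f","DP"), ("sunny","t","DP"), ("overcast","f","P"),
        ("rain","f","P"), ("rain","f","P"), ("rain","t","DP"),
        ("overcast","t","P"), ("sunny","f","DP"), ("sunny","f","P"),
        ("rain","f","P"), ("sunny","t","P"), ("overcast","t","P"),
        ("overcast","f","P"), ("rain","t","DP")]

def H(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(col):
    labels = [r[2] for r in rows]
    g = H(labels)
    for v in set(r[col] for r in rows):
        sub = [r[2] for r in rows if r[col] == v]
        g -= len(sub) / len(rows) * H(sub)
    return g

best = max([(gain(0), "outlook"), (gain(1), "windy")])
print(best[1])  # prints "outlook" (gain ~0.247 vs ~0.048 for windy)
```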
Instance based algorithms
Distance between examples. Remember the WSD algorithm?
k-nearest neighbour: given a set of examples X, each represented as (a1(x), a2(x), ..., an(x)), classify a new instance based on its distance to all the examples in the training set.
d(x_i, x_j) = sqrt( Σ_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 )
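The distance above, combined with majority voting, gives a minimal k-nearest-neighbour classifier. A sketch on made-up numeric data (names and data are ours):

```python
import math
from collections import Counter

def distance(xi, xj):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(train, query, k=3):
    """Label a query by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy numeric data: (temperature, humidity) -> play?
train = [((85, 85), "no"), ((80, 90), "no"), ((68, 80), "yes"),
         ((69, 70), "yes"), ((75, 80), "yes")]
print(knn_classify(train, (70, 75), k=3))  # prints "yes"
```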
Instance based algorithms
Instance-based algorithms take into account every single example. Advantage? Disadvantage?
"Do not forget exceptions."
Very good for NLP tasks:
  WSD
  POS tagging
Measure learning performance
Error on test data:
  Sample error: wrong cases / total cases
  True error (generalization error): an error range estimated starting from the sample error
Cross-validation schemes: for more accurate evaluations.
10-fold cross-validation scheme:
  Divide the training data into 10 sets
  Use one set for testing and the other 9 sets for training
  Repeat 10 times; measure the average accuracy
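The 10-fold scheme can be sketched as follows; `train_and_eval` is a hypothetical stand-in for a real learner that returns its accuracy on the test fold:

```python
def k_fold_accuracy(data, train_and_eval, k=10):
    """Average test accuracy over k disjoint folds (k = 10 in the scheme above)."""
    folds = [data[i::k] for i in range(k)]      # split into k disjoint folds
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        accuracies.append(train_and_eval(train, test))
    return sum(accuracies) / k

# Usage with a dummy learner that always predicts the majority training label:
def majority_learner(train, test):
    labels = [lab for _, lab in train]
    guess = max(set(labels), key=labels.count)
    return sum(lab == guess for _, lab in test) / len(test)

data = [((i,), "yes" if i % 3 else "no") for i in range(30)]
print(k_fold_accuracy(data, majority_learner, k=10))
```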
Practical issues – Using Weka
Weka: freeware, a Java implementation of many learning algorithms
  + boosting
  + capability of handling very large data sets
  + automatic cross-validation
To run an experiment:
  file.arff [test file optional; if not present, Weka will evaluate through cross-validation]
Specify the feature types
Specify the feature types:
  Discrete: value drawn from a set of nominal values
  Continuous: numeric value
Example: the golf data
  Play, Don't Play.                  | the target attribute
  outlook: sunny, overcast, rain.    | features
  temperature: real.
  humidity: real.
  windy: true, false.
Weather Data
sunny, 85, 85, false, Don't Play
sunny, 80, 90, true, Don't Play
overcast, 83, 78, false, Play
rain, 70, 96, false, Play
rain, 68, 80, false, Play
rain, 65, 70, true, Don't Play
overcast, 64, 65, true, Play
sunny, 72, 95, false, Don't Play
sunny, 69, 70, false, Play
rain, 75, 80, false, Play
sunny, 75, 70, true, Play
overcast, 72, 90, true, Play
overcast, 81, 75, false, Play
rain, 71, 80, true, Don't Play
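For reference, the same data could also be written in Weka's native ARFF format. A hand-written sketch, not from the slides (the class labels are renamed to yes/no here to sidestep ARFF quoting of the apostrophe in "Don't Play"):

```
@relation weather

@attribute outlook {sunny, overcast, rain}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {no, yes}

@data
sunny,85,85,false,no
sunny,80,90,true,no
overcast,83,78,false,yes
rain,70,96,false,yes
% ... remaining rows follow the same pattern
```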