Introduction to Pattern Recognition & Machine Learning
Dr Khurram Khurshid
Pattern Recognition
What is a Pattern?
“A pattern is the opposite of a chaos; it is an entity, vaguely defined, that could be given a name.”
Machine Perception and Pattern Recognition
Machine Perception - Build a machine that can recognize patterns
Pattern Recognition (PR)
– Theory, algorithms, and systems to put patterns into categories
– Classification of noisy or complex data
– Relating a perceived pattern to previously perceived patterns
What is Machine Learning?
Make the machine ‘learn’ something
Evaluate how well the machine has ‘learned’
Machine Learning
• Samuel programmed a checkers game
• The computer played thousands of games against itself
• Over time, it started to learn the patterns/board positions that lead to wins and those that lead to losses
• The program learned to play better than Samuel himself
This was the first convincing example that computers can do things other than what they have been explicitly programmed to do!
Learning Problems – Examples
• Learning = improving with experience over some task
– Improve over task T,
– With respect to performance measure P,
– Based on experience E.
• Example
– T = play checkers
– P = % of games won in a tournament
– E = opportunity to play against itself
Machine Learning
• Learning = improving with experience over some task
A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance at task T, as measured by P, improves with experience E.
Learning Problems – Examples
• Handwriting recognition learning problem
– Task T: recognizing handwritten words within images
– Performance measure P: percent of words correctly recognized
– Training experience E: a database of handwritten words with given classifications
Learning Problems – Examples • A robot driving learning problem
– Task T: driving on public four-lane highways using vision sensors
– Performance measure P: average distance traveled before an error (as judged by human overseer)
– Training experience E: a sequence of images and steering commands recorded while observing a human driver
Machine Learning
• Nicolas learns about trucks and combines
Machine learning
• But will he recognize others?
So learning involves the ability to generalize from labeled examples
Machine Learning
• There is no need to “learn” to calculate payroll
• Learning is used in:
– Data mining programs that learn to detect fraudulent credit card transactions
– Programs that learn to filter spam email
– Programs that learn to play checkers/chess
– Autonomous vehicles that learn to drive on public highways
– And many more…
Machine learning
Learning associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | coke) = 0.7
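A minimal sketch of estimating such a conditional probability from transaction data (the baskets below are invented for illustration):

```python
# Estimating P(Y | X) for basket analysis from raw transactions.
# These five baskets are made-up illustrative data.
transactions = [
    {"coke", "chips"},
    {"coke", "chips", "bread"},
    {"coke", "milk"},
    {"chips", "bread"},
    {"coke", "chips", "milk"},
]

def conditional_prob(y, x, baskets):
    """P(y | x) = (# baskets containing both x and y) / (# baskets containing x)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

print(conditional_prob("chips", "coke", transactions))  # 3 of the 4 coke baskets also have chips -> 0.75
```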
Credit Scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
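The discriminant above can be written directly as code; the threshold values standing in for θ1 and θ2 below are arbitrary placeholders, not values from the slides:

```python
# The slide's rule: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk.
THETA1 = 30_000  # income threshold (hypothetical placeholder)
THETA2 = 10_000  # savings threshold (hypothetical placeholder)

def credit_risk(income, savings):
    # Both conditions must hold for a customer to be classified low-risk.
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk: savings below θ2
```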
Applications – Pattern Recognition
• Speech Recognition
• Face Recognition
• Handwriting & Character Recognition
• Autonomous Driving
Autonomous driving
• ALVINN – drives at 70 mph on highways
Face recognition
Training examples of a person
Test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
OCR & Handwriting recognition
Speech recognition
Features
• Features are the individual measurable properties of the signal being observed.
• The set of features used for learning/recognition is called the feature vector.
• The number of features used is the dimensionality of the feature vector.
• n-dimensional feature vectors can be represented as points in n-dimensional feature space.
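Because samples are points in feature space, their similarity can be measured geometrically. A small sketch using height/weight features like those in the next slide (the values are invented):

```python
# Two 2-D feature vectors: (height in cm, weight in kg). Invented values.
import math

sample_a = (170.0, 65.0)
sample_b = (180.0, 80.0)

# Treating feature vectors as points lets us compare samples with a
# geometric distance, here the Euclidean distance between the two points.
dist = math.dist(sample_a, sample_b)
print(round(dist, 2))  # 18.03
```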
Features
[Figure: two-dimensional feature space with axes x1 (height) and x2 (weight); samples from Class 1 and Class 2 form separate clusters of points.]
Feature Extraction
• Feature extraction aims to create discriminative features good for learning
• Good features:
– Objects from the same class have similar feature values.
– Objects from different classes have different feature values.
“Good” features “Bad” features
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
CLASSIFICATION
Supervised learning - Classification
• Objective
– Make Nicolas recognize what is an apple and what is an orange
Classification
Apples Oranges
Classification
What is this?
It’s an apple!
• You had some training examples or ‘training data’
• The examples were ‘labeled’
• You used those examples to make the kid ‘learn’ the difference between an apple and an orange
Classification
Given: training images and their categories What are the categories of these test images?
Apple
Pear
Tomato
Cow
Dog
Horse
Classification
• Cancer Diagnosis – Generally more than one variable
[Figure: scatter plot of Age vs Tumor Size, with malignant and benign cases marked.]
Why supervised – The algorithm is given a number of patients with the RIGHT ANSWER and we want the algorithm to learn to predict for new patients
Classification
• Cancer Diagnosis – Generally more than one variable
[Figure: the same Age vs Tumor Size scatter plot, now with a learned separating line between malignant and benign cases.]
We want the algorithm to learn the separation line. Once a new patient arrives with a given age and tumor size – Predict as Malignant or Benign
Predict for this patient
Supervised Learning - Example
Cancer diagnosis – Many more features
Use this training set to learn how to classify patients where diagnosis is not known:
Input Data Classification
Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
1          | 5           | 20       | 118         | Malignant
2          | 3           | 15       | 130         | Benign
3          | 7           | 10       | 52          | Benign
4          | 2           | 30       | 100         | Malignant

Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
101        | 4           | 16       | 95          | ?
102        | 9           | 22       | 125         | ?
103        | 1           | 14       | 80          | ?
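The slides do not prescribe a particular classifier for this table; one minimal sketch is a 1-nearest-neighbour rule on the three features (feature scaling is skipped for brevity, though real use would normalize):

```python
# Training set from the slide: (num_tumors, avg_area, avg_density) -> diagnosis.
train = [
    ((5, 20, 118), "Malignant"),
    ((3, 15, 130), "Benign"),
    ((7, 10, 52),  "Benign"),
    ((2, 30, 100), "Malignant"),
]

def predict(x):
    # 1-nearest neighbour: label of the closest training patient
    # under squared Euclidean distance.
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda pair: sqdist(pair[0], x))[1]

# The three undiagnosed patients from the slide.
for patient in [(4, 16, 95), (9, 22, 125), (1, 14, 80)]:
    print(patient, predict(patient))
```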
REGRESSION
Regression
CLASSIFICATION: the variable we are trying to predict is DISCRETE
REGRESSION: the variable we are trying to predict is CONTINUOUS
Regression
• Dataset giving the living areas and prices of 50 houses
Regression
• We can plot this data
Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?
Regression
• The “input” variables – x(i) (living area in this example)
• The “output” or target variable that we are trying to predict – y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, . . . , m} is called a training set
• X denotes the space of input values, and Y the space of output values
Regression
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
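One concrete choice of h is a straight line fit by least squares. The four (living area, price) pairs below are invented stand-ins for the 50-house dataset mentioned earlier:

```python
# Linear hypothesis h(x) = theta0 + theta1 * x, fit by ordinary least squares.
# The (sq ft, price in $1000s) pairs are invented illustrative data.
data = [(1000, 200), (1500, 280), (2000, 370), (2500, 440)]

m = len(data)
mean_x = sum(x for x, _ in data) / m
mean_y = sum(y for _, y in data) / m

# Closed-form least-squares slope and intercept for one input variable.
theta1 = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
         sum((x - mean_x) ** 2 for x, _ in data)
theta0 = mean_y - theta1 * mean_x

def h(x):
    """Hypothesis: predicted price (in $1000s) for living area x (sq ft)."""
    return theta0 + theta1 * x

print(round(h(1800)))  # predicted price for an unseen 1800 sq ft house
```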
Regression
Regression
• Example: price of a used car
• x: car attributes; y: price
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
CLUSTERING
UNSUPERVISED LEARNING
• CLUSTERING
There are two types of fruit in the basket, separate them into two ‘groups’
UNSUPERVISED LEARNING
• CLUSTERING
The data was not ‘labeled’ – you did not tell Nicolas which are apples and which are oranges
Maybe the kid used the idea that things in the same group should be more similar to one another than to things in the other group
Groups - Clusters
Separate groups or clusters
Clustering
[Figure: Age vs Tumor Size scatter plot with unlabeled points; two clusters are visible.]
We have the data for patients but NOT the RIGHT ANSWERS. The objective is to find interesting structures in data (in this case two clusters)
Unsupervised Learning – Cocktail Party Effect
• Speakers recorded speaking simultaneously
Unsupervised Learning – Cocktail Party Effect
• Source separation
• The data can be explained by two different speakers speaking – ICA algorithm
Source: http://cnl.salk.edu/~tewon/Blind/blind_audio.html
Classification vs Clustering
• Challenges– Intra-class variability– Inter-class similarity
Intra class variability
The letter “T” in different typefaces
Same face under different expression, pose, illumination
Inter class similarity
Characters that look similar
Identical twins
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
Reinforcement Learning
• In RL, the computer is simply given a goal to achieve.
• The computer then learns how to achieve that goal by trial-and-error interactions with its environment
System learns from success and failure, reward and punishment
Reinforcement Learning
• Objective: fly the helicopter
• Need to make a sequence of good decisions to make it fly
• Similar to training a pet dog:
– Every time the dog does something good, you pat him and say ‘good dog’
– Every time the dog does something bad, you scold him, saying ‘bad dog’
– Over time the dog will learn to do good things
Task…
• Separation of different coins using a robotic arm
A Fancy problem
Sorting incoming fish on a conveyor according to species (salmon or sea bass) using optical sensing
Salmon or sea bass? (2 categories or classes)
It is a classification problem. How to solve it?
Approach
Data collection: take some images using an optical sensor
Approach
• Data collection – How to use it?
• Preprocessing: use a segmentation operation to isolate fishes from one another and from the background – Image processing?
• Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features – But which features to extract?
• The features are passed to a classifier that evaluates the evidence and then takes a final decision – How to design and realize a classifier?
Approach
• Set up a camera and take some sample images to extract features
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth, etc.
• This is the set of all suggested features to explore for use in our classifier!
• Challenges:
– Variations in images – lighting, occlusion, camera view angle
– Position of the fish on the conveyor belt, etc.
How data is collected & used
• Data can be raw signals (e.g. images) or features extracted from images – data is usually expensive
• The data is divided into three parts (the exact percentage of each portion depends, in part, on the sample size)
• Training data: used to build a prediction model or learner (classifier)
• Validation data: used to estimate the prediction error (classification error) and adjust the learner’s parameters
• Test data: used to estimate the classification error of the chosen learner on unseen data, called the generalization error. The test set must be kept inside a ‘vault’ and be brought out only at the end of the data analysis
Train | Validation | Test
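A minimal sketch of such a three-way split; the 60/20/20 proportions and the fixed seed are illustrative choices, not prescribed by the slides:

```python
# Shuffle once with a fixed seed, then carve the data into three parts.
import random

def split_data(samples, train_frac=0.6, val_frac=0.2, seed=0):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)    # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # kept "in the vault" until the end
    return train, val, test

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```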
Preprocessing
• If the data is an image then apply image processing
• What is an image?
– A grayscale image z = f(x, y) is composed of pixels, where x & y are the location of a pixel and z is its intensity
– An image can be considered just a matrix of certain dimensions
A = [ a11 … a1n ]
    [  ⋮  ⋱  ⋮  ]
    [ am1 … amn ]
Divided into 8x8 blocks
Preprocessing
• Examples of image processing operations:
• Filtering: used for enhancing the image or removing noise from it
• Thresholding: segments the object from the background
• and many more…
Filtering
Thresholding
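A minimal sketch of these two operations on a toy grayscale matrix (a real system would use an image-processing library such as OpenCV; the pixel values below are invented):

```python
# Toy 3x4 grayscale "image" as a list-of-lists matrix of intensities 0..255.
image = [
    [ 10,  12, 200, 210],
    [ 11,  13, 205, 220],
    [  9, 180, 190, 215],
]

# Filtering: a 1x3 horizontal mean filter (kept 1-D for brevity) that
# smooths noise by averaging each interior pixel with its neighbours.
def mean_filter_row(row):
    out = row[:]
    for j in range(1, len(row) - 1):
        out[j] = (row[j - 1] + row[j] + row[j + 1]) // 3
    return out

filtered = [mean_filter_row(r) for r in image]

# Thresholding: pixels brighter than T become object (1), the rest background (0).
T = 128
binary = [[1 if z > T else 0 for z in row] for row in image]
print(binary)
```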
Feature extraction
• Feature extraction: use domain knowledge
– The sea bass is generally longer than a salmon
– The average lightness of sea bass scales is greater than that of salmon
• We will use the training data to learn a classification rule based on these features (length of a fish and average lightness)
• Length and average lightness may not be sufficient features, i.e. they may not guarantee 100% classification results
Classification – Option 1
• Select the length of the fish as a possible feature for discrimination between the two classes
Decision Boundary
Histograms for the length feature for the two categories
Cost of Taking a Decision
• A fish-packaging industry uses the system to pack fish in cans.
• Two facts:
– People do not want to find sea bass in cans labeled salmon
– People occasionally accept finding salmon in cans labeled sea bass
• So the cost of deciding in favor of sea bass when the truth is salmon is not the same as the cost of the converse
Evaluation of a classifier
• How to evaluate a certain classifier?
• Classification error: the percentage of patterns (e.g. fish) that are assigned to the wrong category
– Choose a classifier that gives the minimum classification error
• Risk: the total expected cost of decisions
– Choose a classifier that minimizes the risk
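The difference between the two criteria can be sketched as follows; the cost values and the five example decisions are invented for illustration, with mislabelling sea bass as salmon assumed costlier than the converse, as the previous slide suggests:

```python
# cost[(true, predicted)]: classification error treats all mistakes as 1;
# risk weights each kind of mistake by its (invented) cost.
cost = {
    ("salmon", "salmon"): 0.0,   ("salmon", "sea bass"): 1.0,
    ("sea bass", "sea bass"): 0.0, ("sea bass", "salmon"): 5.0,
}

truths      = ["salmon", "salmon", "sea bass", "sea bass", "sea bass"]
predictions = ["salmon", "sea bass", "sea bass", "salmon", "sea bass"]

# Classification error: fraction of wrong decisions (both mistakes count equally).
error = sum(t != p for t, p in zip(truths, predictions)) / len(truths)

# Risk: average cost per decision (the sea-bass-in-a-salmon-can mistake dominates).
risk = sum(cost[(t, p)] for t, p in zip(truths, predictions)) / len(truths)

print(error, risk)  # 0.4 1.2
```

Two classifiers with the same classification error can thus have very different risk, which is why the slide recommends minimizing risk when mistakes have unequal costs.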
Classification – Option 2
• Select the average lightness of the fish as a possible feature for discrimination between the two classes
Histograms for the average lightness feature for the two categories
Classification – Option 3
• Use both the length and average lightness features for classification. Use a simple line to discriminate
Decision Boundary
The two features of lightness and width for sea bass and salmon. The dark line might serve as a decision boundary of our classifier.
Classification – Option 3
• Use both the length and average lightness features for classification. Use a complex model to discriminate
Overly complex models for the fish will lead to decision boundaries that are complicated. While such a boundary may lead to perfect classification (zero classification error) on our training samples, it would lead to poor performance on future patterns (poor generalization) – overfitting.
Comments
• Model selection
– A complex model does not seem to be the correct one. It is learning the training data by heart.
– So how to choose the correct model? (a difficult question)
– “Simpler models should be preferred over complex ones”
• Generalization error
– Minimizing the classification error on the training database does not guarantee minimizing the classification error on the test database (the generalization error)
Classification – Option 3
• Decision boundary with good generalization
The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier.
Components of a typical pattern recognition system
• Sensing
– Use of a sensor (camera or microphone)
• Segmentation
– Patterns should be well separated and should not overlap
• Feature extraction
– Discriminative features
– Features invariant with respect to translation, rotation, and scale
– Challenges:
• Occlusions
• Deformations
Components of a typical pattern recognition system
• Classification
– Use the feature vector provided by the feature extractor to assign the object to a category
– The classifier recommends actions (e.g. put this fish in this bucket, put that fish in that bucket)
– This stage may employ single or multiple classifiers
• Post-processing
– The post-processor uses the output of the classifier to decide on the recommended action
– Exploit context: input-dependent information other than the target pattern itself, to improve performance