Introduction to Pattern Recognition & Machine Learning
Dr Khurram Khurshid
Pattern Recognition


Page 1: 01 pattern recognition

Introduction to Pattern Recognition & Machine Learning

Dr Khurram Khurshid

Pattern Recognition

Page 2: 01 pattern recognition

What is a Pattern?

“A pattern is the opposite of a chaos; it is an entity, vaguely defined, that could be given a name.”

Page 3: 01 pattern recognition

Machine Perception and Pattern Recognition

Machine Perception - Build a machine that can recognize patterns

Pattern Recognition (PR)
– Theory, Algorithms, Systems to Put Patterns into Categories
– Classification of Noisy or Complex Data
– Relate Perceived Pattern to Previously Perceived Patterns

Machine perception is the capability of a computer system to interpret data in a manner that is similar to the way humans use their senses to relate to the world around them
Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning. Pattern recognition systems are in many cases trained from labeled "training" data (supervised learning), but when no labeled data are available other algorithms can be used to discover previously unknown patterns (unsupervised learning).
Page 4: 01 pattern recognition

What is Machine Learning?

Make the machine ‘learn’ something

Evaluate how good the machine has ‘learned’

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data
Page 5: 01 pattern recognition

Machine Learning
• Samuel programmed a checkers game
• The computer played thousands of games against itself
• Over time, it started to learn the patterns/board positions that lead to wins and those that lead to losses
• The program learned to play better than Samuel himself

First convincing example that computers can do things other than what they have been programmed to!

Page 6: 01 pattern recognition

Learning Problems – Examples
• Learning = improving with experience over some task
– Improve over task T,
– with respect to performance measure P,
– based on experience E.

• Example
– T = play checkers
– P = % of games won in a tournament
– E = opportunity to play against itself

Page 7: 01 pattern recognition

Machine Learning

• Learning = improving with experience over some task

A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance at task T, as measured by P, improves with experience E.

Page 8: 01 pattern recognition

Learning Problems – Examples
• Handwriting recognition learning problem
– Task T: recognizing handwritten words within images
– Performance measure P: percent of words correctly recognized
– Training experience E: a database of handwritten words with given classifications

Page 9: 01 pattern recognition

Learning Problems – Examples
• A robot driving learning problem

– Task T: driving on public four-lane highways using vision sensors

– Performance measure P: average distance traveled before an error (as judged by human overseer)

– Training experience E: a sequence of images and steering commands recorded while observing a human driver

Page 10: 01 pattern recognition

Machine Learning

• Nicolas learns about trucks and combines

Page 11: 01 pattern recognition

Machine learning

• But will he recognize others?

So learning involves the ability to generalize from labeled examples

Page 12: 01 pattern recognition

Machine Learning

• There is no need to “learn” to calculate payroll
• Learning is used in:
– Data mining programs that learn to detect fraudulent credit card transactions
– Programs that learn to filter spam email
– Programs that learn to play checkers/chess
– Autonomous vehicles that learn to drive on public highways
– And many more…

Page 13: 01 pattern recognition

Machine learning

Page 14: 01 pattern recognition

Learning associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.

Example: P(chips | coke) = 0.7
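A minimal sketch of how such a conditional probability could be estimated by counting over transaction data; the baskets and the `conditional_prob` helper below are illustrative inventions, not part of the slides:

```python
# Estimating P(Y | X) from transaction data (made-up baskets).
transactions = [
    {"coke", "chips"},
    {"coke", "chips", "bread"},
    {"coke", "milk"},
    {"bread", "milk"},
    {"coke", "chips", "milk"},
]

def conditional_prob(transactions, x, y):
    """Estimate P(y | x) = count(x and y) / count(x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

# Of the 4 baskets containing coke, 3 also contain chips.
print(conditional_prob(transactions, "coke", "chips"))  # 0.75
```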

Page 15: 01 pattern recognition

Credit Scoring

• Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
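The discriminant above maps directly to code. A sketch, with purely illustrative values for θ1 and θ2 (in practice these thresholds would be learned from customer data):

```python
# Hypothetical threshold values; a real system would learn these.
THETA1 = 30_000   # income threshold
THETA2 = 10_000   # savings threshold

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk
```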

Page 16: 01 pattern recognition

Applications – Pattern Recognition
• Speech Recognition
• Face Recognition
• Handwriting & Character Recognition
• Autonomous Driving

Page 17: 01 pattern recognition

Autonomous driving

• ALVINN – Drives 70mph on highways

Page 18: 01 pattern recognition

Face recognition


Training examples of a person

Test images

AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html

Page 19: 01 pattern recognition

OCR & Handwriting recognition

Page 20: 01 pattern recognition

Speech recognition

Page 21: 01 pattern recognition

Features

• Features are the individual measurable properties of the signal being observed.

• The set of features used for learning/recognition is called a feature vector.

• The number of features used is the dimensionality of the feature vector.

• n-dimensional feature vectors can be represented as points in n-dimensional feature space
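For instance, a minimal sketch in Python/NumPy of feature vectors as points in feature space (the height/weight numbers are invented):

```python
import numpy as np

# A 2-dimensional feature vector for one object; a dataset is then
# a set of points in 2-D feature space.
x = np.array([170.0, 65.0])          # one feature vector: (height cm, weight kg)
X = np.array([[170.0, 65.0],
              [155.0, 50.0],
              [180.0, 80.0]])        # three points in 2-D feature space
print(X.shape)                       # (3, 2): 3 samples, dimensionality 2
```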

Page 22: 01 pattern recognition

Features

[Scatter plots: feature vectors x = (x1, x2) = (weight, height) plotted as points in 2-D feature space, with samples from Class 1 and Class 2 forming separate groups.]

Page 23: 01 pattern recognition

Feature Extraction
• Feature extraction aims to create discriminative features that are good for learning
• Good features:
– Objects from the same class have similar feature values.
– Objects from different classes have different values.

[Illustration: “good” features separate the classes; “bad” features mix them.]

Page 24: 01 pattern recognition

Contents

• Supervised learning

– Classification

– Regression

• Unsupervised learning

• Reinforcement learning

Page 25: 01 pattern recognition

CLASSIFICATION

Page 26: 01 pattern recognition

Supervised learning - Classification

• Objective
– Make Nicolas recognize what is an apple and what is an orange

Page 27: 01 pattern recognition

Classification

Apples Oranges

Page 28: 01 pattern recognition

Classification

What is this???

It’s an apple!!!

• You had some training examples or ‘training data’

• The examples were ‘labeled’

• You used those examples to make the kid ‘learn’ the difference between an apple and an orange

Page 29: 01 pattern recognition

Classification

Given: training images and their categories What are the categories of these test images?

Apple

Pear

Tomato

Cow

Dog

Horse

Page 30: 01 pattern recognition

Classification

• Cancer Diagnosis – generally more than one variable

[Scatter plot: Age vs Tumor Size, with Malignant and Benign cases marked.]

Why supervised? The algorithm is given a number of patients with the RIGHT ANSWER, and we want the algorithm to learn to predict for new patients

Page 31: 01 pattern recognition

Classification

• Cancer Diagnosis – generally more than one variable

[Scatter plot: Age vs Tumor Size with a separation line between Malignant and Benign; a marked point shows a new patient to predict.]

We want the algorithm to learn the separation line. Once a new patient arrives with a given age and tumor size, predict Malignant or Benign

Page 32: 01 pattern recognition

Supervised Learning - Example

Cancer diagnosis – Many more features

Use this training set to learn how to classify patients where diagnosis is not known:

Training set (input data and classification):

Patient ID   # of Tumors   Avg Area   Avg Density   Diagnosis
1            5             20         118           Malignant
2            3             15         130           Benign
3            7             10         52            Benign
4            2             30         100           Malignant

Patients with unknown diagnosis:

Patient ID   # of Tumors   Avg Area   Avg Density   Diagnosis
101          4             16         95            ?
102          9             22         125           ?
103          1             14         80            ?
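The slides do not prescribe an algorithm for this table; as one illustrative choice, a 1-nearest-neighbour sketch over the three features (features are left unscaled here for brevity; a real system would normalize them):

```python
import numpy as np

# Training data copied from the table above.
X_train = np.array([[5, 20, 118], [3, 15, 130], [7, 10, 52], [2, 30, 100]])
y_train = ["Malignant", "Benign", "Benign", "Malignant"]
X_test = np.array([[4, 16, 95], [9, 22, 125], [1, 14, 80]])

# Assign each unknown patient the diagnosis of the closest known patient.
for x in X_test:
    nearest = np.argmin(np.linalg.norm(X_train - x, axis=1))
    print(x, "->", y_train[nearest])
```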

Page 33: 01 pattern recognition

REGRESSION

Page 34: 01 pattern recognition

Regression

CLASSIFICATION: the variable we are trying to predict is DISCRETE

REGRESSION: the variable we are trying to predict is CONTINUOUS

Page 35: 01 pattern recognition

Regression

• Dataset giving the living areas and prices of 50 houses

Page 36: 01 pattern recognition

Regression

• We can plot this data

Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?

Page 37: 01 pattern recognition

Regression
• The “input” variables: x(i) (living area in this example)
• The “output” or target variable that we are trying to predict: y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, . . . , m} is called a training set
• X denotes the space of input values, and Y the space of output values

Page 38: 01 pattern recognition

Regression

Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
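As a minimal sketch of such a hypothesis, assuming a linear form h(x) = w0 + w1·x fitted by least squares (the area/price points below are invented, not the 50-house dataset from the slide):

```python
import numpy as np

# Made-up training set: living area (m^2) -> price (k$).
area  = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
price = np.array([110.0, 160.0, 200.0, 230.0, 290.0])

w1, w0 = np.polyfit(area, price, deg=1)   # least-squares line fit
h = lambda x: w0 + w1 * x                 # the learned hypothesis h : X -> Y
print(h(90.0))                            # predicted price for a 90 m^2 house
```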

Page 39: 01 pattern recognition

Regression

Page 40: 01 pattern recognition

Regression

• Example: Price of a used car

• x: car attributes
• y: price

Page 41: 01 pattern recognition

Contents

• Supervised learning

– Classification

– Regression

• Unsupervised learning

• Reinforcement learning

Page 42: 01 pattern recognition

CLUSTERING

Page 43: 01 pattern recognition

UNSUPERVISED LEARNING

• CLUSTERING

There are two types of fruit in the basket, separate them into two ‘groups’

Page 44: 01 pattern recognition

UNSUPERVISED LEARNING

• CLUSTERING – the data was not ‘labeled’; you did not tell Nicolas which are apples and which are oranges

Maybe the kid used the idea that things in the same group should be similar to one another compared to things in the other group

[Illustration: the fruit separated into groups – clusters.]

Page 45: 01 pattern recognition

Clustering

[Scatter plot: Age vs Tumor Size, unlabeled patient data forming two clusters.]

We have the data for patients but NOT the RIGHT ANSWERS. The objective is to find interesting structures in data (in this case two clusters)
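A sketch of this idea, assuming k-means as the clustering algorithm (the slide names none) and made-up (age, tumor size) values:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled (age, tumor size) data; no RIGHT ANSWERS given.
X = np.array([[35, 1.2], [40, 1.5], [38, 1.1],
              [62, 4.8], [70, 5.5], [65, 5.0]])

# Ask for two clusters and see which group each patient falls into.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. [0 0 0 1 1 1]: structure found without any labels
```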

Page 46: 01 pattern recognition

Unsupervised Learning – Cocktail Party Effect

• Speakers recorded speaking simultaneously

Page 47: 01 pattern recognition

Unsupervised Learning – Cocktail Party Effect

• Source separation
• The data can be explained by two different speakers speaking – ICA algorithm

Source: http://cnl.salk.edu/~tewon/Blind/blind_audio.html
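A minimal sketch of the same idea on synthetic signals, using scikit-learn's FastICA in place of real cocktail-party recordings:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent synthetic "speakers".
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

# Mix them, as two microphones in a room would hear them.
A = np.array([[1.0, 0.5], [0.5, 1.0]])            # mixing matrix
X = S @ A.T                                       # the recordings
X += 0.02 * np.random.default_rng(0).standard_normal(X.shape)

# Recover the sources from the mixtures alone (no labels).
S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
print(S_est.shape)  # (2000, 2): sources recovered up to order and scale
```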

Page 48: 01 pattern recognition

Classification vs Clustering

• Challenges
– Intra-class variability
– Inter-class similarity

Page 49: 01 pattern recognition

Intra class variability

The letter “T” in different typefaces

Same face under different expression, pose, illumination

Page 50: 01 pattern recognition

Inter class similarity

Characters that look similar

Identical twins

Page 51: 01 pattern recognition

Contents

• Supervised learning

– Classification

– Regression

• Unsupervised learning

• Reinforcement learning

Page 52: 01 pattern recognition

Reinforcement Learning

• In RL, the computer is simply given a goal to achieve.

• The computer then learns how to achieve that goal by trial-and-error interactions with its environment

System learns from success and failure, reward and punishment

Page 53: 01 pattern recognition

Reinforcement Learning

• Objective: fly the helicopter
• Need to make a sequence of good decisions to make it fly
• Similar to training a pet dog:

Every time the dog does something good, you pat him and say ‘good dog’. Every time the dog does something bad, you scold him, saying ‘bad dog’. Over time the dog will learn to do the good things.

Page 54: 01 pattern recognition

Task …
• Separation of different coins using a robotic arm


Page 55: 01 pattern recognition

A Fancy problem

Sorting incoming fish on a conveyor according to species (salmon or sea bass) using optical sensing

Salmon or sea bass? (2 categories or classes)

It is a classification problem. How to solve it?

Page 56: 01 pattern recognition

Approach

Data Collection: take some images using an optical sensor

Page 57: 01 pattern recognition

Approach

• Data collection – How to use it?

• Preprocessing: use a segmentation operation to isolate fish from one another and from the background – Image processing?

• Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features – But which features to extract?

• The features are passed to a classifier that evaluates the evidence and then takes a final decision – How to design and realize a classifier?

Page 58: 01 pattern recognition

Approach
• Set up a camera and take some sample images to extract features:
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth, etc…

• This is the set of all suggested features to explore for use in our classifier!

• Challenges:
– Variations in images – lighting, occlusion, camera view angle
– Position of the fish on the conveyor belt, etc…

Page 59: 01 pattern recognition

How data is collected & used
• Data can be raw signals (e.g. images) or features extracted from images – data is usually expensive

• The data is divided into three parts (the exact percentage of each portion depends, partially, on the sample size)

• Train data: used to build a prediction model or learner (classifier)

• Validation data: used to estimate the prediction error (classification error) and adjust the learner's parameters

• Test data: used to estimate the classification error of the chosen learner on unseen data, called the generalization error. The test data must be kept inside a ‘vault’ and brought out only at the end of the data analysis

[Diagram: data divided into Train | Validation | Test portions.]
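A minimal sketch of such a three-way split, assuming a 60/20/20 division (one common choice; as noted above, the exact proportions depend on sample size):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
idx = rng.permutation(n)    # shuffle sample indices before splitting

train_idx = idx[:60]        # fit the learner
val_idx   = idx[60:80]      # tune its parameters
test_idx  = idx[80:]        # kept "in a vault" until the very end
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```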

Page 60: 01 pattern recognition

Preprocessing
• If the data is an image then apply image processing
• What is an image?
– A grayscale image z = f(x, y) is composed of pixels, where x & y give the location of a pixel and z is its intensity
– An image can be considered just a matrix of certain dimensions

$$A_{m \times n} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

[Example image divided into 8x8 blocks]

Page 61: 01 pattern recognition

Preprocessing

• Examples of image processing operations:
– Filtering: used for enhancing the image or removing noise from it
– Thresholding: segmenting the object from the background
– and many more…

[Example images: a filtered image (noise removed) and a thresholded image (object segmented from background).]
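A minimal sketch of both operations on a tiny synthetic image: a 3x3 mean filter as the "filtering" step and a global threshold as the "thresholding" step (real systems would use richer filters and adaptive thresholds):

```python
import numpy as np

# Synthetic 8x8 image: a bright square on a dark background, plus noise.
rng = np.random.default_rng(0)
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
img += 0.1 * rng.standard_normal(img.shape)

# Filtering: 3x3 mean filter (borders skipped for brevity).
filtered = img.copy()
for i in range(1, 7):
    for j in range(1, 7):
        filtered[i, j] = img[i-1:i+2, j-1:j+2].mean()

# Thresholding: separate the object from the background.
binary = (filtered > 0.5).astype(np.uint8)
print(binary)
```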

Page 62: 01 pattern recognition

Feature extraction
• Feature extraction: use domain knowledge
– The sea bass is generally longer than a salmon
– The average lightness of sea bass scales is greater than that of salmon

• We will use training data in order to learn a classification rule based on these features (length of the fish and average lightness)

• Length and average lightness may not be sufficient features, i.e. they may not guarantee 100% classification results

Page 63: 01 pattern recognition

Classification – Option 1
• Select the length of the fish as a possible feature for discrimination between the two classes

[Histograms of the length feature for the two categories, with the decision boundary marked.]
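A sketch of option 1 as code: choose the length threshold that minimizes error on (invented) training samples. Because the two length histograms overlap, some error always remains:

```python
import numpy as np

# Made-up fish lengths; salmon tend to be shorter than sea bass.
salmon   = np.array([28.0, 30.0, 33.0, 35.0, 38.0])
sea_bass = np.array([34.0, 37.0, 40.0, 43.0, 46.0])

def train_error(t):
    # Rule: classify "sea bass" if length > t, "salmon" otherwise.
    return np.sum(salmon > t) + np.sum(sea_bass <= t)

candidates = np.linspace(25, 50, 251)
best = min(candidates, key=train_error)
print(best, train_error(best))   # threshold and number of misclassified fish
```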

Page 64: 01 pattern recognition

Cost of Taking a Decision
• A fish-packaging industry uses the system to pack fish in cans.
• Two facts:
– People do not want to find sea bass in cans labeled salmon
– People occasionally accept finding salmon in cans labeled sea bass

• So the cost of deciding in favor of sea bass when the true class is salmon is not the same as the cost of the converse

Page 65: 01 pattern recognition

Evaluation of a classifier
• How to evaluate a certain classifier?

• Classification error: the percentage of patterns (e.g. fish) that are assigned to the wrong category
– Choose a classifier that gives the minimum classification error

• Risk: the total expected cost of decisions
– Choose a classifier that minimizes the risk
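A sketch contrasting risk with plain error, assuming the asymmetric costs described on the previous slide (the cost values themselves are illustrative):

```python
import numpy as np

# Cost of predicting p when the truth is t; correct decisions cost 0.
COST = {("salmon", "sea bass"): 1.0,   # salmon in a sea-bass can: tolerated
        ("sea bass", "salmon"): 5.0}   # sea bass in a salmon can: costly

def risk(true_labels, predicted_labels):
    """Average cost of the classifier's decisions."""
    costs = [COST.get((t, p), 0.0) for t, p in zip(true_labels, predicted_labels)]
    return np.mean(costs)

truth = ["salmon", "sea bass", "salmon", "sea bass"]
pred  = ["salmon", "salmon",   "salmon", "sea bass"]
print(risk(truth, pred))   # one sea bass labeled salmon -> 5.0 / 4 = 1.25
```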

Page 66: 01 pattern recognition

Classification – Option 2
• Select the average lightness of the fish as a possible feature for discrimination between the two classes

[Histograms of the average lightness feature for the two categories.]

Page 67: 01 pattern recognition

Classification – Option 3

• Use both length and average lightness features for classification. Use a simple line to discriminate

[Scatter plot of the two features for sea bass and salmon in the (x1, x2) feature space; the dark line might serve as a decision boundary of our classifier.]
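A sketch of such a linear decision rule over the two features; the weights and offset below are hand-picked for illustration, not learned:

```python
import numpy as np

# A line w . x + b = 0 in the (length, lightness) feature space.
# Sea bass are assumed longer and lighter, per the feature-extraction slide.
w = np.array([0.8, 1.0])    # weights for (length, lightness)
b = -30.0                   # offset of the line

def classify(length, lightness):
    # Classify by which side of the line the fish falls on.
    score = w @ np.array([length, lightness]) + b
    return "sea bass" if score > 0 else "salmon"

print(classify(40.0, 8.0))   # long, light fish -> sea bass
print(classify(28.0, 3.0))   # short, dark fish -> salmon
```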

Page 68: 01 pattern recognition

Classification – Option 3
• Use both length and average lightness features for classification. Use a complex model to discriminate

Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision boundary may lead to perfect classification of our training samples (classification error is zero), it would lead to poor performance on future patterns (generalization is poor): overfitting

Page 69: 01 pattern recognition

Comments

• Model selection
– A complex model seems not to be the correct one. It is learning the training data by heart.
– So how to choose the correct model? (a difficult question)
– “Simpler models should be preferred over complex ones”

• Generalization error
– Minimizing the classification error on the training database does not guarantee minimizing the classification error on the test database (the generalization error)

Page 70: 01 pattern recognition

Classification – Option 3

• Decision boundary with good generalization

The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier.

Page 71: 01 pattern recognition

Components of a typical pattern recognition system

• Sensing
– Use of a sensor (camera or microphone)

• Segmentation
– Patterns should be well separated and should not overlap

• Feature extraction
– Discriminative features
– Features invariant with respect to translation, rotation and scale
– Challenges:
• Occlusions
• Deformations

Page 72: 01 pattern recognition

Components of a typical pattern recognition system

• Classification
– Use the feature vector provided by the feature extractor to assign the object to a category
– The classifier recommends actions (e.g. put this fish in this bucket, put that fish in that bucket)
– This stage may employ single or multiple classifiers

• Post-processing
– The post-processor uses the output of the classifier to decide on the recommended action
– Exploit context (input-dependent information other than from the target pattern itself) to improve performance