Introduction to Pattern Recognition & Machine Learning
Dr Khurram Khurshid
Pattern Recognition
What is a Pattern?
“A pattern is the opposite of a chaos; it is an entity, vaguely defined, that could be given a name.”
Machine Perception and Pattern Recognition
Machine Perception - Build a machine that can recognize patterns
Pattern Recognition (PR)
– Theory, algorithms, and systems to put patterns into categories
– Classification of noisy or complex data
– Relating a perceived pattern to previously perceived patterns
What is Machine Learning?
Make the machine ‘learn’ something
Evaluate how well the machine has ‘learned’
Machine Learning
• Samuel programmed a checkers game
• The computer played thousands of games against itself
• Over time, it started to learn the patterns/board positions that lead to wins and those that lead to losses
• The program learned to play better than Samuel himself
This was the first convincing example that computers can do things other than what they have been explicitly programmed to do!
Learning Problems – Examples
• Learning = improving with experience over some task
– Improve over task T,
– With respect to performance measure P,
– Based on experience E.
• Example
– T = play checkers
– P = % of games won in a tournament
– E = opportunity to play against itself
Machine Learning
• Learning = improving with experience over some task
A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance at task T, as measured by P, improves with experience E.
Learning Problems – Examples
• Handwriting recognition learning problem
– Task T: recognizing handwritten words within images
– Performance measure P: percent of words correctly recognized
– Training experience E: a database of handwritten words with given classifications
Learning Problems – Examples • A robot driving learning problem
– Task T: driving on public four-lane highways using vision sensors
– Performance measure P: average distance traveled before an error (as judged by human overseer)
– Training experience E: a sequence of images and steering commands recorded while observing a human driver
Machine Learning
• Nicolas learns about trucks and combines
Machine learning
• But will he recognize others?
So learning involves the ability to generalize from labeled examples
Machine Learning
• There is no need to “learn” to calculate payroll
• Learning is used in:
– Data mining programs that learn to detect fraudulent credit card transactions
– Programs that learn to filter spam email
– Programs that learn to play checkers/chess
– Autonomous vehicles that learn to drive on public highways
– And many more…
Machine learning
Learning associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | coke) = 0.7
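A minimal sketch of estimating such a conditional probability from transaction data (the baskets below are invented for illustration):

```python
# Estimating P(Y | X) for basket analysis from raw transactions.
# These five baskets are made-up illustrative data.
transactions = [
    {"coke", "chips"},
    {"coke", "chips", "bread"},
    {"coke", "milk"},
    {"chips", "bread"},
    {"coke", "chips", "milk"},
]

def conditional_prob(y, x, baskets):
    """P(y | x) = (# baskets containing both x and y) / (# baskets containing x)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

print(conditional_prob("chips", "coke", transactions))  # 3 of the 4 coke baskets also have chips -> 0.75
```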
Credit Scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
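The discriminant above can be written directly as code; the threshold values standing in for θ1 and θ2 below are arbitrary placeholders, not values from the slides:

```python
# The slide's rule: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk.
THETA1 = 30_000  # income threshold (hypothetical placeholder)
THETA2 = 10_000  # savings threshold (hypothetical placeholder)

def credit_risk(income, savings):
    # Both conditions must hold for a customer to be classified low-risk.
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(45_000, 15_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk: savings below θ2
```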
Applications – Pattern Recognition
• Speech Recognition
• Face Recognition
• Handwriting & Character Recognition
• Autonomous Driving
Autonomous driving
• ALVINN – drives at 70 mph on highways
Face recognition
Training examples of a person
Test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
OCR & Handwriting recognition
Speech recognition
Features
• Features are the individual measurable properties of the signal being observed.
• The set of features used for learning/recognition is called the feature vector.
• The number of features used is the dimensionality of the feature vector.
• n-dimensional feature vectors can be represented as points in n-dimensional feature space.
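Because samples are points in feature space, their similarity can be measured geometrically. A small sketch using height/weight features like those in the next slide (the values are invented):

```python
# Two 2-D feature vectors: (height in cm, weight in kg). Invented values.
import math

sample_a = (170.0, 65.0)
sample_b = (180.0, 80.0)

# Treating feature vectors as points lets us compare samples with a
# geometric distance, here the Euclidean distance between the two points.
dist = math.dist(sample_a, sample_b)
print(round(dist, 2))  # 18.03
```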
Features
[Figure: two-dimensional feature space with axes x1 (height) and x2 (weight); samples from Class 1 and Class 2 form separate clusters of points.]
Feature Extraction
• Feature extraction aims to create discriminative features good for learning
• Good features:
– Objects from the same class have similar feature values.
– Objects from different classes have different feature values.
“Good” features “Bad” features
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
CLASSIFICATION
Supervised learning - Classification
• Objective
– Make Nicolas recognize what is an apple and what is an orange
Classification
Apples Oranges
Classification
What is this?
It’s an apple!
• You had some training examples or ‘training data’
• The examples were ‘labeled’
• You used those examples to make the kid ‘learn’ the difference between an apple and an orange
Classification
Given: training images and their categories What are the categories of these test images?
Apple
Pear
Tomato
Cow
Dog
Horse
Classification
• Cancer Diagnosis – Generally more than one variable
[Figure: scatter plot of Age vs Tumor Size, with malignant and benign cases marked.]
Why supervised – The algorithm is given a number of patients with the RIGHT ANSWER and we want the algorithm to learn to predict for new patients
Classification
• Cancer Diagnosis – Generally more than one variable
[Figure: the same Age vs Tumor Size scatter plot, now with a learned separating line between malignant and benign cases.]
We want the algorithm to learn the separation line. Once a new patient arrives with a given age and tumor size – Predict as Malignant or Benign
Predict for this patient
Supervised Learning - Example
Cancer diagnosis – Many more features
Use this training set to learn how to classify patients where diagnosis is not known:
Input Data Classification
Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
1          | 5           | 20       | 118         | Malignant
2          | 3           | 15       | 130         | Benign
3          | 7           | 10       | 52          | Benign
4          | 2           | 30       | 100         | Malignant

Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
101        | 4           | 16       | 95          | ?
102        | 9           | 22       | 125         | ?
103        | 1           | 14       | 80          | ?
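The slides do not prescribe a particular classifier for this table; one minimal sketch is a 1-nearest-neighbour rule on the three features (feature scaling is skipped for brevity, though real use would normalize):

```python
# Training set from the slide: (num_tumors, avg_area, avg_density) -> diagnosis.
train = [
    ((5, 20, 118), "Malignant"),
    ((3, 15, 130), "Benign"),
    ((7, 10, 52),  "Benign"),
    ((2, 30, 100), "Malignant"),
]

def predict(x):
    # 1-nearest neighbour: label of the closest training patient
    # under squared Euclidean distance.
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda pair: sqdist(pair[0], x))[1]

# The three undiagnosed patients from the slide.
for patient in [(4, 16, 95), (9, 22, 125), (1, 14, 80)]:
    print(patient, predict(patient))
```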
REGRESSION
Regression
CLASSIFICATION: the variable we are trying to predict is DISCRETE
REGRESSION: the variable we are trying to predict is CONTINUOUS
Regression
• Dataset giving the living areas and prices of 50 houses
Regression
• We can plot this data
Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?
Regression
• The “input” variables – x(i) (living area in this example)
• The “output” or target variable that we are trying to predict – y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, . . . , m} is called a training set
• X denotes the space of input values, and Y the space of output values
Regression
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
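One concrete choice of h is a straight line fit by least squares. The four (living area, price) pairs below are invented stand-ins for the 50-house dataset mentioned earlier:

```python
# Linear hypothesis h(x) = theta0 + theta1 * x, fit by ordinary least squares.
# The (sq ft, price in $1000s) pairs are invented illustrative data.
data = [(1000, 200), (1500, 280), (2000, 370), (2500, 440)]

m = len(data)
mean_x = sum(x for x, _ in data) / m
mean_y = sum(y for _, y in data) / m

# Closed-form least-squares slope and intercept for one input variable.
theta1 = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
         sum((x - mean_x) ** 2 for x, _ in data)
theta0 = mean_y - theta1 * mean_x

def h(x):
    """Hypothesis: predicted price (in $1000s) for living area x (sq ft)."""
    return theta0 + theta1 * x

print(round(h(1800)))  # predicted price for an unseen 1800 sq ft house
```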
Regression
Regression
• Example: price of a used car
• x: car attributes; y: price
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
CLUSTERING
UNSUPERVISED LEARNING
• CLUSTERING
There are two types of fruit in the basket, separate them into two ‘groups’
UNSUPERVISED LEARNING
• CLUSTERING
The data was not ‘labeled’ – you did not tell Nicolas which are apples and which are oranges
Maybe the kid used the idea that things in the same group should be more similar to one another than to things in the other group
Groups - Clusters
Separate groups or clusters
Clustering
[Figure: Age vs Tumor Size scatter plot with unlabeled points; two clusters are visible.]
We have the data for patients but NOT the RIGHT ANSWERS. The objective is to find interesting structures in data (in this case two clusters)
Unsupervised Learning – Cocktail Party Effect
• Speakers recorded speaking simultaneously
Unsupervised Learning – Cocktail Party Effect
• Source separation
• The data can be explained by two different speakers speaking – ICA algorithm
Source: http://cnl.salk.edu/~tewon/Blind/blind_audio.html
Classification vs Clustering
• Challenges– Intra-class variability– Inter-class similarity
Intra class variability
The letter “T” in different typefaces
Same face under different expression, pose, illumination
Inter class similarity
Characters that look similar
Identical twins
Contents
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
Reinforcement Learning
• In RL, the computer is simply given a goal to achieve.
• The computer then learns how to achieve that goal by trial-and-error interactions with its environment
System learns from success and failure, reward and punishment
Reinforcement Learning
• Objective: fly the helicopter
• Need to make a sequence of good decisions to make it fly
• Similar to training a pet dog:
– Every time the dog does something good, you pat him and say ‘good dog’
– Every time the dog does something bad, you scold him, saying ‘bad dog’
– Over time the dog will learn to do good things
Task…
• Separation of different coins using a robotic arm
A Fancy problem
Sorting incoming fish on a conveyor according to species (salmon or sea bass) using optical sensing
Salmon or sea bass? (2 categories or classes)
It is a classification problem. How to solve it?
Approach
Data collection: take some images using an optical sensor
Approach
• Data collection – How to use it?
• Preprocessing: use a segmentation operation to isolate fishes from one another and from the background – Image processing?
• Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features – But which features to extract?
• The features are passed to a classifier that evaluates the evidence and then takes a final decision – How to design and realize a classifier?
Approach
• Set up a camera and take some sample images to extract features
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth, etc.
• This is the set of all suggested features to explore for use in our classifier!
• Challenges:
– Variations in images – lighting, occlusion, camera view angle
– Position of the fish on the conveyor belt, etc.
How data is collected & used
• Data can be raw signals (e.g. images) or features extracted from images – data is usually expensive
• The data is divided into three parts (the exact percentage of each portion depends, in part, on the sample size)
• Training data: used to build a prediction model or learner (classifier)
• Validation data: used to estimate the prediction error (classification error) and adjust the learner’s parameters
• Test data: used to estimate the classification error of the chosen learner on unseen data, called the generalization error. The test set must be kept inside a ‘vault’ and be brought out only at the end of the data analysis
Train | Validation | Test
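A minimal sketch of such a three-way split; the 60/20/20 proportions and the fixed seed are illustrative choices, not prescribed by the slides:

```python
# Shuffle once with a fixed seed, then carve the data into three parts.
import random

def split_data(samples, train_frac=0.6, val_frac=0.2, seed=0):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)    # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # kept "in the vault" until the end
    return train, val, test

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```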
Preprocessing
• If the data is an image then apply image processing
• What is an image?
– A grayscale image z = f(x, y) is composed of pixels, where x & y are the location of a pixel and z is its intensity
– An image can be considered just a matrix of certain dimensions
A = [ a11 … a1n ]
    [  ⋮  ⋱  ⋮  ]
    [ am1 … amn ]
Divided into 8x8 blocks
Preprocessing
• Examples of image processing operations:
• Filtering: used for enhancing the image or removing noise from it
• Thresholding: segments the object from the background
• and many more…
Filtering
Thresholding
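A minimal sketch of these two operations on a toy grayscale matrix (a real system would use an image-processing library such as OpenCV; the pixel values below are invented):

```python
# Toy 3x4 grayscale "image" as a list-of-lists matrix of intensities 0..255.
image = [
    [ 10,  12, 200, 210],
    [ 11,  13, 205, 220],
    [  9, 180, 190, 215],
]

# Filtering: a 1x3 horizontal mean filter (kept 1-D for brevity) that
# smooths noise by averaging each interior pixel with its neighbours.
def mean_filter_row(row):
    out = row[:]
    for j in range(1, len(row) - 1):
        out[j] = (row[j - 1] + row[j] + row[j + 1]) // 3
    return out

filtered = [mean_filter_row(r) for r in image]

# Thresholding: pixels brighter than T become object (1), the rest background (0).
T = 128
binary = [[1 if z > T else 0 for z in row] for row in image]
print(binary)
```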
Feature extraction
• Feature extraction: use domain knowledge
– The sea bass is generally longer than a salmon
– The average lightness of sea bass scales is greater than that of salmon
• We will use the training data to learn a classification rule based on these features (length of a fish and average lightness)
• Length and average lightness may not be sufficient features, i.e. they may not guarantee 100% classification results
Classification – Option 1
• Select the length of the fish as a possible feature for discrimination between the two classes
Decision Boundary
Histograms for the length feature for the two categories
Cost of Taking a Decision
• A fish-packaging industry uses the system to pack fish in cans.
• Two facts:
– People do not want to find sea bass in cans labeled salmon
– People occasionally accept finding salmon in cans labeled sea bass
• So the cost of deciding in favor of sea bass when the truth is salmon is not the same as the cost of the converse
Evaluation of a classifier
• How to evaluate a certain classifier?
• Classification error: the percentage of patterns (e.g. fish) that are assigned to the wrong category
– Choose a classifier that gives the minimum classification error
• Risk: the total expected cost of decisions
– Choose a classifier that minimizes the risk
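The difference between the two criteria can be sketched as follows; the cost values and the five example decisions are invented for illustration, with mislabelling sea bass as salmon assumed costlier than the converse, as the previous slide suggests:

```python
# cost[(true, predicted)]: classification error treats all mistakes as 1;
# risk weights each kind of mistake by its (invented) cost.
cost = {
    ("salmon", "salmon"): 0.0,   ("salmon", "sea bass"): 1.0,
    ("sea bass", "sea bass"): 0.0, ("sea bass", "salmon"): 5.0,
}

truths      = ["salmon", "salmon", "sea bass", "sea bass", "sea bass"]
predictions = ["salmon", "sea bass", "sea bass", "salmon", "sea bass"]

# Classification error: fraction of wrong decisions (both mistakes count equally).
error = sum(t != p for t, p in zip(truths, predictions)) / len(truths)

# Risk: average cost per decision (the sea-bass-in-a-salmon-can mistake dominates).
risk = sum(cost[(t, p)] for t, p in zip(truths, predictions)) / len(truths)

print(error, risk)  # 0.4 1.2
```

Two classifiers with the same classification error can thus have very different risk, which is why the slide recommends minimizing risk when mistakes have unequal costs.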
Classification – Option 2
• Select the average lightness of the fish as a possible feature for discrimination between the two classes
Histograms for the average lightness feature for the two categories
Classification – Option 3
• Use both the length and average lightness features for classification. Use a simple line to discriminate
Decision Boundary
The two features of lightness and width for sea bass and salmon. The dark line might serve as a decision boundary of our classifier.
Classification – Option 3
• Use both the length and average lightness features for classification. Use a complex model to discriminate
Overly complex models for the fish will lead to decision boundaries that are complicated. While such a boundary may lead to perfect classification (zero classification error) on our training samples, it would lead to poor performance on future patterns (poor generalization) – overfitting.
Comments
• Model selection
– A complex model does not seem to be the correct one. It is learning the training data by heart.
– So how to choose the correct model? (a difficult question)
– “Simpler models should be preferred over complex ones”
• Generalization error
– Minimizing the classification error on the training database does not guarantee minimizing the classification error on the test database (the generalization error)
Classification – Option 3
• Decision boundary with good generalization
The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier.
Components of a typical pattern recognition system
• Sensing
– Use of a sensor (camera or microphone)
• Segmentation
– Patterns should be well separated and should not overlap
• Feature extraction
– Discriminative features
– Features invariant with respect to translation, rotation, and scale
– Challenges:
• Occlusions
• Deformations
Components of a typical pattern recognition system
• Classification
– Use the feature vector provided by the feature extractor to assign the object to a category
– The classifier recommends actions (e.g. put this fish in this bucket, put that fish in that bucket)
– This stage may employ single or multiple classifiers
• Post-processing
– The post-processor uses the output of the classifier to decide on the recommended action
– Exploit context: input-dependent information other than the target pattern itself, to improve performance