
Pattern Recognition

...A pattern is essentially an arrangement. It is characterized by the order of the elements of which it is made, rather than by the intrinsic nature of these elements.

Norbert Wiener

1

What is Pattern Recognition?

Pattern recognition (PR) is the scientific discipline that concerns the description and classification (recognition) of patterns (objects).
PR techniques are an important component of intelligent systems and are used for many application domains:
• Decision making
• Object and pattern classification

2

What is Pattern Recognition? Definitions from the literature

• “The assignment of a physical object or event to one of several pre-specified categories” – Duda and Hart
• “A problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes” – Fukunaga
• “Given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples” – Ripley
• “The science that concerns the description or classification (recognition) of measurements” – Schalkoff
• “The process of giving names ω to observations x” – Schürmann
• Pattern Recognition is concerned with answering the question “What is this?” – Morse

3

Pattern Recognition and Related Areas

Image processing
Video processing
Speech/audio processing
Natural language processing
Machine learning
Neural networks
Database engineering
Bioinformatics
Much more…

4

Machine Perception

Build a machine that can recognize patterns:
• Speech recognition
• Fingerprint identification
• OCR (Optical Character Recognition)
• DNA sequence identification

5

A “Toy” Example: Fish Classification

“Sort incoming Fish on a conveyor according to species using optical sensing”

Species: sea bass, salmon

6

Problem Analysis
• Set up a camera and take some sample images to extract features
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.
• This is the set of all suggested features to explore for use in our classifier!

7

Preprocessing
• Use a segmentation operation to isolate the fish from one another and from the background
• Extract one fish for the next step

Feature extraction
• Measure certain features of the fish to be classified
• One of the most critical steps in pattern recognition system design

Classification
• Select the length of the fish as a possible feature for discrimination

8

9

10

Adopt the lightness and add the width of the fish:

Fish $\mathbf{x}^T = [x_1, x_2]$, where $x_1$ = lightness and $x_2$ = width

11

We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such “noisy features”.

Ideally, the best decision boundary is the one that provides optimal performance, as in the following figure:

12

Pattern Recognition Systems

Sensing
• Use of a transducer (camera or microphone)
• The PR system depends on the bandwidth, resolution, sensitivity, distortion, etc. of the transducer

Segmentation and grouping
• Patterns should be well separated and should not overlap

13

14

Feature extraction
• Discriminative features
• Invariant features with respect to translation, rotation and scale

Classification
• Use a feature vector provided by a feature extractor to assign the object to a category

Post-processing
• Exploit context-dependent information other than from the target pattern itself to improve performance

15

The Design Cycle

Data collection
Feature choice
Model choice
Training
Evaluation
Computational complexity

16

17

Data collection
• How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Feature choice
• Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.

18

Model choice
• If we are unsatisfied with the performance of our fish classifier, we may want to jump to another class of model

Training
• Use data to determine the classifier. There are many different procedures for training classifiers and choosing models

19

Evaluation
• Measure the error rate for:
  • Different feature sets
  • Different training methods
  • Different training and test data sets

Computational complexity
• What is the trade-off between computational ease and performance?
• How does an algorithm scale as a function of the number of features, patterns or categories?

20

Supervised & Unsupervised Learning

Supervised learning
• A teacher provides a category label or cost for each pattern in the training set (i.e., ground truth based on experts’ knowledge)

Unsupervised learning
• The system forms clusters or “natural groupings” of the input patterns

Semi-supervised learning
• Use both labeled and unlabeled patterns to reduce the labeling cost

21

Approaches for PR

Statistical (StatPR)
• Patterns are classified based on an underlying statistical model of the features
• The statistical model is defined by a family of class-conditional probability density functions Pr(x|ci) (the probability of feature vector x given class ci)

Neural (NeurPR)
• Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern)
• “Knowledge” is stored in the connectivity and strength of the synaptic weights
• NeurPR is a trainable, non-algorithmic, black-box strategy
• NeurPR is very attractive since
  • it requires minimum a priori knowledge
  • with enough layers and neurons, an ANN can create any complex decision region

Syntactic (SyntPR)
• Patterns are classified based on measures of structural similarity
• “Knowledge” is represented by means of formal grammars or relational descriptions (graphs)
• SyntPR is used not only for classification, but also for description
• Typically, SyntPR approaches formulate hierarchical descriptions of complex patterns built up from simpler subpatterns

From: Ricardo Gutierrez-Osuna, Texas A&M University

22

23

Decision-Theoretic Methods

Decision (discriminant) function

Given a pattern vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ and $W$ pattern classes $\omega_1, \omega_2, \ldots, \omega_W$, find $W$ decision functions $d_1(\mathbf{x}), d_2(\mathbf{x}), \ldots, d_W(\mathbf{x})$ such that, if $\mathbf{x} \in \omega_i$, then

$$d_i(\mathbf{x}) > d_j(\mathbf{x}), \qquad j = 1, 2, \ldots, W;\; j \neq i$$

Decision boundary between classes $\omega_i$ and $\omega_j$:

$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0$$

For two classes $\omega_i, \omega_j$:

$$d_{ij}(\mathbf{x}) \;\begin{cases} > 0 & \text{if } \mathbf{x} \in \omega_i \\ < 0 & \text{if } \mathbf{x} \in \omega_j \end{cases}$$
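As a minimal sketch of the rule above, a pattern is simply assigned to the class whose decision function gives the largest value. The two linear decision functions below are hypothetical examples, not taken from the slides:

```python
import numpy as np

def classify(x, decision_functions):
    """Assign x to the class whose decision function d_i(x) is largest."""
    scores = [d(x) for d in decision_functions]
    return int(np.argmax(scores))  # index i of the winning class omega_i

# Two hypothetical linear decision functions d_1 and d_2
d1 = lambda x: 2.0 * x[0] + 1.0 * x[1] - 3.0
d2 = lambda x: -1.0 * x[0] + 0.5 * x[1] + 1.0
print(classify(np.array([2.0, 1.0]), [d1, d2]))  # -> 0, i.e. class omega_1
```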

25

Decision-Theoretic Methods

Matching: each class is represented by a prototype pattern vector; a predefined metric is needed
• Minimum distance classifier
• Matching by correlation

Optimal Statistical Classifiers
• Bayes classifier

Neural Networks
• Perceptrons
• Layers
• Training

24

Matching

Minimum distance classifier
• The prototype is defined as the mean vector of the class:

$$\mathbf{m}_j = \frac{1}{N_j}\sum_{\mathbf{x} \in \omega_j}\mathbf{x}, \qquad j = 1, 2, \ldots, W$$

• Compare the Euclidean distances:

$$D_j(\mathbf{x}) = \|\mathbf{x} - \mathbf{m}_j\|, \qquad j = 1, 2, \ldots, W$$

• Decision function:

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\mathbf{m}_j^T\mathbf{m}_j, \qquad j = 1, 2, \ldots, W$$

• Decision boundary between two classes:

$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = \mathbf{x}^T(\mathbf{m}_i - \mathbf{m}_j) - \tfrac{1}{2}(\mathbf{m}_i - \mathbf{m}_j)^T(\mathbf{m}_i + \mathbf{m}_j) = 0$$

• The boundary surface is the perpendicular bisector of the line segment joining mi and mj. For n = 2 it is a line, for n = 3 it is a plane, and for n > 3 it is a hyperplane.
• Works well when the separation between the class means is large relative to the spread of the classes
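A minimal sketch of the minimum distance classifier above, assuming the training samples of each class are given as NumPy arrays (function names and toy data are illustrative, not from the slides):

```python
import numpy as np

def fit_prototypes(samples_per_class):
    """m_j = mean vector of each class (the class prototype)."""
    return [np.mean(X, axis=0) for X in samples_per_class]

def min_distance_classify(x, prototypes):
    """Assign x to the class with the largest d_j(x) = x^T m_j - 0.5 m_j^T m_j
    (equivalent to choosing the smallest Euclidean distance)."""
    scores = [x @ m - 0.5 * m @ m for m in prototypes]
    return int(np.argmax(scores))

# Toy data: two classes in 2-D
class1 = np.array([[4.3, 1.3], [4.1, 1.0], [4.5, 1.5]])
class2 = np.array([[1.5, 0.3], [1.3, 0.2], [1.7, 0.4]])
prototypes = fit_prototypes([class1, class2])
print(min_distance_classify(np.array([4.0, 1.2]), prototypes))  # -> 0
```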

26

Matching

Matching by correlation
• Find matches of a subimage w(x,y) of size J×K within the image f(x,y) of size M×N:

$$c(x, y) = \sum_s \sum_t f(s, t)\, w(x + s, y + t), \qquad x = 0, 1, \ldots, M-1;\; y = 0, 1, \ldots, N-1$$

• A relative (normalized) correlation coefficient is preferred:

$$\gamma(x, y) = \frac{\sum_s \sum_t \bigl[f(s, t) - \bar f(s, t)\bigr]\bigl[w(x + s, y + t) - \bar w\bigr]}{\Bigl\{\sum_s \sum_t \bigl[f(s, t) - \bar f(s, t)\bigr]^2\, \sum_s \sum_t \bigl[w(x + s, y + t) - \bar w\bigr]^2\Bigr\}^{1/2}}$$

• Vulnerable to scale changes or rotation changes
• The non-normalized version can be realized in the FFT domain as well
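A sketch of matching by normalized correlation, assuming f and w are grayscale images stored as 2-D NumPy arrays; the double loop follows the formula directly and is not optimized (in practice the non-normalized form would be computed in the FFT domain):

```python
import numpy as np

def normalized_correlation(f, w):
    """Correlation coefficient gamma(x, y) of template w at every valid
    position of image f, following the formula above."""
    M, N = f.shape
    J, K = w.shape
    w0 = w - w.mean()
    gamma = np.zeros((M - J + 1, N - K + 1))
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            patch = f[x:x + J, y:y + K]
            p0 = patch - patch.mean()
            denom = np.sqrt((p0 ** 2).sum() * (w0 ** 2).sum())
            gamma[x, y] = (p0 * w0).sum() / denom if denom > 0 else 0.0
    return gamma

# Best match = location of the maximum coefficient
f = np.random.rand(64, 64)
w = f[20:28, 30:38].copy()
gamma = normalized_correlation(f, w)
print(np.unravel_index(np.argmax(gamma), gamma.shape))  # -> (20, 30)
```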

27

Optimal Statistical Classifiers

Optimality in the sense that the classifier yields the lowest probability of committing a classification error
• Probability that x comes from ωi: p(ωi|x)
• If x is assigned (wrongly) to ωj: loss Lij
• Conditional averaged risk (loss):

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k \mid \mathbf{x}) = \frac{1}{p(\mathbf{x})}\sum_{k=1}^{W} L_{kj}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k)$$

using $p(A \mid B) = p(B \mid A)\,p(A)/p(B)$, where $p(\mathbf{x} \mid \omega_k)$ is the pdf of the patterns from class $\omega_k$ and $P(\omega_k)$ is the probability of occurrence of class $\omega_k$.

• The Bayes classifier minimizes the conditional averaged risk, i.e. it assigns a new pattern x to the class ωi if:

$$r_i(\mathbf{x}) < r_j(\mathbf{x}) \quad\Longleftrightarrow\quad \sum_{k=1}^{W} L_{ki}\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) < \sum_{q=1}^{W} L_{qj}\, p(\mathbf{x} \mid \omega_q)\, P(\omega_q), \qquad j = 1, 2, \ldots, W;\; j \neq i$$

28

Bayes classifier (cont.)

Two probabilities are needed. While P(ωj) is easy to find (estimate), p(x|ωj) requires multivariate probability methods for its estimation, which is too complicated to use in practice. Instead, analytical expressions (models) of the pdf are used, and the necessary parameters are estimated from sample patterns of each class.

The loss for a correct decision is 0 and the loss for any incorrect decision is 1:

$$L_{ij} = 1 - \delta_{ij}, \qquad \delta_{ij} = 1 \text{ if } i = j,\quad \delta_{ij} = 0 \text{ if } i \neq j$$

Then:

$$r_j(\mathbf{x}) = \sum_{k=1}^{W}\bigl(1 - \delta_{kj}\bigr)\, p(\mathbf{x} \mid \omega_k)\, P(\omega_k) = p(\mathbf{x}) - p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

Decision: $\mathbf{x} \in \omega_i$ if

$$p(\mathbf{x}) - p(\mathbf{x} \mid \omega_i)\, P(\omega_i) < p(\mathbf{x}) - p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

Equivalently:

$$p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

Decision function:

$$d_j(\mathbf{x}) = p(\mathbf{x} \mid \omega_j)\, P(\omega_j), \qquad j = 1, 2, \ldots, W$$
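A minimal sketch of the resulting decision rule d_j(x) = p(x|ωj)P(ωj), assuming the class-conditional densities and priors are already known; the two 1-D Gaussian classes below are hypothetical illustration data:

```python
import numpy as np

def gaussian_pdf(x, m, sigma):
    """1-D Gaussian density N(m, sigma) evaluated at x."""
    return np.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def bayes_classify(x, class_pdfs, priors):
    """Assign x to the class maximizing d_j(x) = p(x|omega_j) P(omega_j)."""
    scores = [pdf(x) * P for pdf, P in zip(class_pdfs, priors)]
    return int(np.argmax(scores))

# Two hypothetical 1-D Gaussian classes N(0, 1) and N(3, 1) with equal priors
pdfs = [lambda x: gaussian_pdf(x, 0.0, 1.0), lambda x: gaussian_pdf(x, 3.0, 1.0)]
print(bayes_classify(1.2, pdfs, [0.5, 0.5]))  # -> 0 (x is more likely under the first class)
```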

29

Bayes classifier (cont.)

B.C. for Gaussian pattern classes
• Consider the 1-D case (n = 1) with two classes (W = 2) governed by Gaussian densities N(m1, σ1) and N(m2, σ2)
• For the n-dimensional case, a covariance matrix is involved instead of a simple variance

1-D case:

$$d_j(x) = p(x \mid \omega_j)\, P(\omega_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\, e^{-\frac{(x - m_j)^2}{2\sigma_j^2}}\, P(\omega_j), \qquad j = 1, 2$$

n-D case:

$$p(\mathbf{x} \mid \omega_j) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{C}_j|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1}(\mathbf{x} - \mathbf{m}_j)}$$

where

$$\mathbf{m}_j = E_j\{\mathbf{x}\}, \qquad \mathbf{C}_j = E_j\{(\mathbf{x} - \mathbf{m}_j)(\mathbf{x} - \mathbf{m}_j)^T\}$$

and $E\{\cdot\}$ denotes the expected value. Approximations of $E\{\cdot\}$ from the $N_j$ samples of class $\omega_j$:

$$\mathbf{m}_j = \frac{1}{N_j}\sum_{\mathbf{x} \in \omega_j}\mathbf{x}, \qquad \mathbf{C}_j = \frac{1}{N_j}\sum_{\mathbf{x} \in \omega_j}\mathbf{x}\mathbf{x}^T - \mathbf{m}_j\mathbf{m}_j^T$$

30

Bayes classifier (cont.)

B.C. for Gaussian pattern classes
• The exponential form allows working with natural logarithms, since the logarithm is a monotonically increasing function that preserves the numerical order
• The decision functions in n-D space are hyperquadrics (quadratic functions in n-D space)
• Simplifications:
  • If all covariance matrices are equal => linear decision functions (hyperplanes)
  • If C = I and P(ωj) = 1/W, the Bayes classifier reduces to the minimum distance classifier

$$d_j(\mathbf{x}) = \ln\bigl[p(\mathbf{x} \mid \omega_j)\, P(\omega_j)\bigr] = \ln p(\mathbf{x} \mid \omega_j) + \ln P(\omega_j)$$

$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1}(\mathbf{x} - \mathbf{m}_j)$$

or, dropping the class-independent term,

$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1}(\mathbf{x} - \mathbf{m}_j)$$
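A sketch of the Gaussian Bayes classifier using the logarithmic decision function above, with the means and covariance matrices estimated from sample patterns; the toy data, seeds and function names are assumptions for illustration:

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix of one class."""
    m = X.mean(axis=0)
    C = np.cov(X, rowvar=False)          # sample covariance matrix (n x n)
    return m, C

def gaussian_bayes_classify(x, params, priors):
    """d_j(x) = ln P(w_j) - 0.5 ln|C_j| - 0.5 (x - m_j)^T C_j^{-1} (x - m_j)."""
    scores = []
    for (m, C), P in zip(params, priors):
        diff = x - m
        mahal = diff @ np.linalg.solve(C, diff)
        scores.append(np.log(P) - 0.5 * np.log(np.linalg.det(C)) - 0.5 * mahal)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(100, 2))   # samples of class omega_1
X2 = rng.normal([3, 3], 1.0, size=(100, 2))   # samples of class omega_2
params = [fit_gaussian(X1), fit_gaussian(X2)]
print(gaussian_bayes_classify(np.array([2.5, 2.8]), params, [0.5, 0.5]))  # -> 1
```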

31

Neural Networks

Preliminaries
• Training patterns and training sets
• The process by which a training set is used to obtain decision functions is called learning or training, e.g. the training is used to determine the parameters of the decision function (means, covariance matrices)
• The statistical properties of pattern classes are often unknown (hard to estimate). Such problems are best handled by direct training (no need to make assumptions regarding the unknown pdfs)
• Neurons: non-linear computing elements
  • Organized in networks
  • Learning machines called perceptrons

32

Perceptron

The appropriate weights are applied to the inputs, and the resulting weighted sum is passed to a function which produces the output y.

[Figure: perceptron with inputs x0 = 1, x1, ..., xn, weights w0, w1, ..., wn, and output y]

$$u = \sum_i w_i x_i, \qquad y = f(u) = f(\mathbf{w}^T\mathbf{x})$$
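A minimal sketch of the perceptron computation u = Σ w_i x_i, y = f(u); a hard-limiting threshold is assumed for f (the slides leave f generic), and the bias weight is handled through the constant input x0 = 1 as described above:

```python
import numpy as np

def perceptron_output(x, w):
    """y = f(w^T x) with x_0 = 1 (bias input) and a hard-limiting threshold f."""
    x_aug = np.concatenate(([1.0], x))   # prepend the constant input x_0 = 1
    u = w @ x_aug                        # weighted sum u = sum_i w_i x_i
    return 1 if u > 0 else -1            # hard-limiting activation

print(perceptron_output(np.array([0.5, -1.0]), np.array([0.1, 2.0, 1.0])))  # -> 1
```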

33

Neural Networks

Perceptron for two pattern classes (see figure above)
• The weights modify the inputs before they are summed and fed to an activation function
• The decision boundary is a hyperplane in n-D space
• The first n coefficients establish the orientation of the hyperplane, while wn+1 determines its distance from the origin
• Formally, the free weight wn+1 can be included in the summation by assuming one more (constant) input
• The key problem is to find the weight vector w using a given training set of patterns from each of the two classes

34

Neural Networks

Training algorithms
• Linearly separable classes: an iterative algorithm
  • w(1): initial weight vector (arbitrarily chosen)
  • At the k-th iterative step:
    • if y(k) ∈ ω1 and wT(k)y(k) ≤ 0, replace w(k) with w(k+1) = w(k) + c y(k), where c is a positive correction increment
    • if y(k) ∈ ω2 and wT(k)y(k) ≥ 0, replace w(k) with w(k+1) = w(k) − c y(k)
    • otherwise, leave w(k) unchanged: w(k+1) = w(k)
  • The algorithm converges if the two training sets are linearly separable. This is called the perceptron training theorem.
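A sketch of this iterative training rule, assuming augmented pattern vectors (a constant 1 appended to each pattern) and a fixed correction increment c; the toy patterns below are illustrative:

```python
import numpy as np

def train_perceptron(patterns_w1, patterns_w2, c=1.0, max_epochs=100):
    """Perceptron training: patterns from omega_1 should give w^T y > 0,
    patterns from omega_2 should give w^T y < 0."""
    data = [(y, +1) for y in patterns_w1] + [(y, -1) for y in patterns_w2]
    w = np.zeros(len(data[0][0]))        # w(1): initial weight vector
    for _ in range(max_epochs):
        changed = False
        for y, cls in data:
            s = w @ y
            if cls == +1 and s <= 0:
                w = w + c * y            # w(k+1) = w(k) + c y(k)
                changed = True
            elif cls == -1 and s >= 0:
                w = w - c * y            # w(k+1) = w(k) - c y(k)
                changed = True
        if not changed:                  # no corrections in a full pass: converged
            break
    return w

w1 = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]   # augmented patterns, class 1
w2 = [np.array([1.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0])]   # augmented patterns, class 2
print(train_perceptron(w1, w2))
```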

35

Neural Networks

Training for nonseparable classes: the delta rule

Criterion function:

$$J(\mathbf{w}) = \frac{1}{2}\bigl(r - \mathbf{w}^T\mathbf{y}\bigr)^2, \qquad r = +1 \text{ if } \mathbf{y} \in \omega_1,\quad r = -1 \text{ if } \mathbf{y} \in \omega_2$$

$J(\mathbf{w})$ has its minimum when $\mathbf{w}^T\mathbf{y} = r$; hence adjust $\mathbf{w}$ in the direction of the negative gradient:

$$\mathbf{w}(k+1) = \mathbf{w}(k) - \alpha\left[\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}\right]_{\mathbf{w} = \mathbf{w}(k)} \;\Rightarrow\; \mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\bigl(r(k) - \mathbf{w}^T(k)\mathbf{y}(k)\bigr)\mathbf{y}(k)$$

$$\Delta\mathbf{w}(k) = \mathbf{w}(k+1) - \mathbf{w}(k) = \alpha\, e(k)\,\mathbf{y}(k), \qquad e(k) = r(k) - \mathbf{w}^T(k)\mathbf{y}(k)$$

where $e(k)$ is the error committed with $\mathbf{w}(k)$. If $\mathbf{w}(k)$ is changed to $\mathbf{w}(k+1)$ but the same pattern is kept, the change of the error is

$$\Delta e(k) = \bigl(r(k) - \mathbf{w}^T(k+1)\mathbf{y}(k)\bigr) - \bigl(r(k) - \mathbf{w}^T(k)\mathbf{y}(k)\bigr) = -\bigl(\mathbf{w}(k+1) - \mathbf{w}(k)\bigr)^T\mathbf{y}(k)$$

$$\Delta e(k) = -\alpha\, e(k)\,\mathbf{y}^T(k)\mathbf{y}(k) = -\alpha\, e(k)\,\|\mathbf{y}(k)\|^2$$
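A sketch of the delta rule update w(k+1) = w(k) + α e(k) y(k) derived above, applied repeatedly over a small toy training set; the augmented patterns and desired responses r = ±1 are assumptions for illustration:

```python
import numpy as np

def delta_rule(patterns, targets, alpha=0.3, epochs=50):
    """w(k+1) = w(k) + alpha * (r(k) - w(k)^T y(k)) * y(k)."""
    w = np.zeros(patterns.shape[1])
    for _ in range(epochs):
        for y, r in zip(patterns, targets):
            e = r - w @ y                # error committed with the current w
            w = w + alpha * e * y        # correction along the negative gradient
    return w

Y = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0],    # class omega_1 (r = +1)
              [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])   # class omega_2 (r = -1)
r = np.array([+1.0, +1.0, -1.0, -1.0])
print(delta_rule(Y, r))
```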

36

Neural Networks

Delta rule comments:
• Changing the weights reduces the error by a factor determined by α and the energy of y
• The choice of α controls the stability and speed of convergence
  • For stability, 0 < α < 1
  • Practical range: 0.1 < α < 1.0
• The algorithm converges over the patterns of the training set. For separable classes, the solution may or may not produce a separating hyperplane
• Can be generalized to more than two classes and to non-linear decision functions

37

Neural Networks

Multilayer feedforward NN
• One output layer (Q) and several intermediate layers. Usually the first layer (A) has the dimension of the input vectors
• Each neuron has a structure similar to the perceptron model. The hard-limiting function has been replaced by a soft-limiting ’sigmoid’. The smooth function is preferred because of its differentiability.
• The parameter θj plays the same role as wn+1 in the perceptron model and can be added as a weight to an additional input
• Let K be the layer before layer J. Then (see below)
• A total of Nj×Nk coefficients is necessary to specify the weighting, plus Nj coefficients are needed to complete the nodes at layer J

$$I_j = \sum_{k=1}^{N_K} w_{jk}\, O_k \quad \text{for } j = 1, 2, \ldots, N_J, \qquad O_k = h_k(I_k) \quad \text{for } k = 1, 2, \ldots, N_K$$

$$h_j(I_j) = \frac{1}{1 + e^{-(I_j + \theta_j)/\theta_0}}, \qquad \theta_j \text{ controls the offset},\; \theta_0 \text{ controls the shape}$$

so that

$$O_j = h_j(I_j) = \frac{1}{1 + e^{-\left(\sum_{k=1}^{N_K} w_{jk} O_k + \theta_j\right)/\theta_0}}$$
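A minimal sketch of the forward computation through one layer, I_j = Σ_k w_jk O_k followed by the sigmoid O_j = h_j(I_j); θ0 = 1 is assumed and the array shapes are illustrative:

```python
import numpy as np

def sigmoid(I, theta, theta0=1.0):
    """h_j(I_j) = 1 / (1 + exp(-(I_j + theta_j) / theta_0))."""
    return 1.0 / (1.0 + np.exp(-(I + theta) / theta0))

def layer_forward(O_k, W, theta):
    """O_k: outputs of layer K (N_K,), W: weights (N_J, N_K), theta: offsets (N_J,)."""
    I = W @ O_k                  # I_j = sum_k w_jk O_k
    return sigmoid(I, theta)     # O_j = h_j(I_j)

O_k = np.array([0.2, 0.9])
W = np.array([[0.5, -0.3], [1.0, 0.8]])     # N_J = 2, N_K = 2
print(layer_forward(O_k, W, theta=np.zeros(2)))
```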

38

Multilayer Feedforward NN

[Figure: feedforward network with input layer, hidden layer(s), and output layer]

39

FFNN NEURON MODEL

The classical learning algorithm of the FFNN is based on the gradient descent method. For this reason, the activation functions used in the FFNN are continuous functions of the weights, differentiable everywhere. A typical activation function that can be viewed as a continuous approximation of the step (threshold) function is the sigmoid function. A sigmoid function for node j is:

when a tends to infinity then ϕ tends to the step function

$$\varphi(v_j) = \frac{1}{1 + e^{-a v_j}}, \qquad a > 0$$

[Plot: φ(vj) versus vj over roughly −10 … 10; the curve approaches the step function with increasing a]

$$\text{where } v_j = \sum_i w_{ji}\, y_i,$$

with $w_{ji}$ the weight of the link from node $i$ to node $j$, and $y_i$ the output of node $i$.

40

Multilayer NN Training by backpropagation

Total squared error for the output layer:

$$E_Q = \frac{1}{2}\sum_{q=1}^{N_Q}\bigl(r_q - O_q\bigr)^2$$

Adjusting the weights in proportion to the partial derivatives:

$$\Delta w_{qp} = -\alpha\,\frac{\partial E_Q}{\partial w_{qp}}$$

By applying the chain rule:

$$\frac{\partial E_Q}{\partial w_{qp}} = \frac{\partial E_Q}{\partial I_q}\,\frac{\partial I_q}{\partial w_{qp}}, \qquad \text{where } I_q = \sum_{p=1}^{N_P} w_{qp}\, O_p \;\Rightarrow\; \frac{\partial I_q}{\partial w_{qp}} = O_p$$

$$\Delta w_{qp} = -\alpha\,\frac{\partial E_Q}{\partial I_q}\, O_p = \alpha\,\delta_q\, O_p, \qquad \delta_q = -\frac{\partial E_Q}{\partial I_q}$$

How to compute $\delta_q$?

$$\delta_q = -\frac{\partial E_Q}{\partial I_q} = -\frac{\partial E_Q}{\partial O_q}\,\frac{\partial O_q}{\partial I_q} = \bigl(r_q - O_q\bigr)\, h'_q(I_q), \qquad \text{since } O_q = h_q(I_q) \text{ and } \frac{\partial E_Q}{\partial O_q} = -(r_q - O_q)$$

Finally:

$$\Delta w_{qp} = \alpha\,\bigl(r_q - O_q\bigr)\, h'_q(I_q)\, O_p$$

41

Multilayer NN Training by backpropagation: What happens in layer P ?

$$\Delta w_{pj} = \alpha\,\delta_p\, O_j = \alpha\,\bigl(r_p - O_p\bigr)\, h'_p(I_p)\, O_j$$

where the error term is $\delta_p = (r_p - O_p)\, h'_p(I_p)$. All terms are known or can be observed in the network except $r_p$. How to restate $\delta_p$ in terms of quantities that are known or observable?

$$\delta_p = -\frac{\partial E}{\partial I_p} = -\frac{\partial E}{\partial O_p}\,\frac{\partial O_p}{\partial I_p}, \qquad \frac{\partial O_p}{\partial I_p} = h'_p(I_p) \;\text{ since } O_p = h_p(I_p); \qquad \frac{\partial E}{\partial O_p} = \;?$$

$$-\frac{\partial E}{\partial O_p} = -\sum_{q=1}^{N_Q}\frac{\partial E}{\partial I_q}\,\frac{\partial I_q}{\partial O_p} = -\sum_{q=1}^{N_Q}\frac{\partial E}{\partial I_q}\,\frac{\partial}{\partial O_p}\Bigl(\sum_{p=1}^{N_P} w_{qp}\, O_p\Bigr) = \sum_{q=1}^{N_Q}\delta_q\, w_{qp}$$

so that

$$\delta_p = h'_p(I_p)\sum_{q=1}^{N_Q}\delta_q\, w_{qp}$$

42

Multilayer NN Training by backpropagation: Summary of the procedure
• For any layers K and J (K precedes J), compute the weights wjk, which modify the connections between these two layers, by using

$$\Delta w_{jk} = \alpha\,\delta_j\, O_k$$

• If J is the output layer, δj is

$$\delta_j = \bigl(r_j - O_j\bigr)\, h'_j(I_j)$$

• If J is an internal layer and P is the next layer (to the right), δj is

$$\delta_j = h'_j(I_j)\sum_{p=1}^{N_P}\delta_p\, w_{pj}, \qquad j = 1, 2, \ldots, N_J$$

• Using a sigmoid activation function with θ0 = 1 yields

$$h'_j(I_j) = O_j\bigl(1 - O_j\bigr)$$

so that

$$\delta_j = \bigl(r_j - O_j\bigr)\, O_j\bigl(1 - O_j\bigr) \quad \text{for the output layer}$$

$$\delta_j = O_j\bigl(1 - O_j\bigr)\sum_{p=1}^{N_P}\delta_p\, w_{pj} \quad \text{for the internal layers}$$
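A compact sketch of the whole procedure for one hidden layer, combining the forward pass with the two δ formulas above; the logistic sigmoid with θ0 = 1 is used so that h'(I) = O(1 − O). The XOR toy data, layer sizes, seed and learning rate are assumptions for illustration, not from the slides:

```python
import numpy as np

def sigmoid(I):
    return 1.0 / (1.0 + np.exp(-I))

def train_backprop(X, R, n_hidden=4, alpha=0.5, epochs=10000, seed=0):
    rng = np.random.default_rng(seed)
    W_jk = rng.normal(0, 1.0, (n_hidden, X.shape[1]))   # input  -> hidden (layer J)
    W_pj = rng.normal(0, 1.0, (R.shape[1], n_hidden))   # hidden -> output (layer P)
    for _ in range(epochs):
        for x, r in zip(X, R):
            O_j = sigmoid(W_jk @ x)                          # hidden-layer outputs
            O_p = sigmoid(W_pj @ O_j)                        # network outputs
            delta_p = (r - O_p) * O_p * (1 - O_p)            # output-layer delta
            delta_j = O_j * (1 - O_j) * (W_pj.T @ delta_p)   # internal-layer delta
            W_pj += alpha * np.outer(delta_p, O_j)           # Delta w = alpha * delta * O
            W_jk += alpha * np.outer(delta_j, x)
    return W_jk, W_pj

# Toy example: XOR, with a bias input appended to each pattern
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
R = np.array([[0], [1], [1], [0]], dtype=float)
W_jk, W_pj = train_backprop(X, R)
print(np.round(sigmoid(W_pj @ sigmoid(W_jk @ X.T)), 2))  # typically close to [[0. 1. 1. 0.]]
```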

43

Recent Advances in Pattern Recognition

New applications
Standard databases for performance benchmarking of algorithms and systems
Fusion methods to improve performance
• Sensors, features, classifiers
Robustness of algorithms and systems
Utilizing contexts or expert information
Many successful deployments
• Speech recognition, handwritten character recognition, face detection, fingerprint recognition, automatic vehicle guidance, visual inspection, computer-aided diagnosis for mammograms

44

Summary

Pattern recognition systems aim to recognize patterns based on their features
Classification is an important step in pattern recognition systems
Pattern recognition algorithms and systems have been widely used in many application domains
Challenges remain to achieve human-like performance
More in SGN-2556 Pattern Recognition, 5th term

45

Recommended