Part 1 Markov Models for Pattern Recognition – Introduction, CSE717, SPRING 2008, CUBS, Univ at Buffalo


Page 1

Part 1 Markov Models for Pattern Recognition – Introduction

CSE717, SPRING 2008

CUBS, Univ at Buffalo

Page 2

Textbook

Markov models for pattern recognition: from theory to applications

by Gernot A. Fink, 1st Edition, Springer, Nov 2007

Page 3

Textbook

- Foundation of Math Statistics
- Vector Quantization and Mixture Density Models
- Markov Models
  - Hidden Markov Model (HMM): model formulation, classic algorithms in the HMM, application domain of the HMM
  - n-Gram Models
- Systems
  - Character and handwriting recognition
  - Speech recognition
  - Analysis of biological sequences

Page 4

Preliminary Requirements

Familiarity with probability theory and statistics

Basic concepts of stochastic processes

Page 5

Part 2 Foundations of Probability Theory, Statistics & Stochastic Processes

CSE717, SPRING 2008

CUBS, Univ at Buffalo

Page 6

Coin Toss Problem

Coin toss result:
- $X$: random variable
- head, tail: states
- $S_X$: set of states, $S_X = \{\text{head}, \text{tail}\}$
- Probabilities: $\Pr_X(\text{head}) = \Pr_X(\text{tail}) = 0.5$
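As a quick illustration (not part of the slides), here is a minimal Python sketch of this coin toss model; it assumes a fair coin, so the empirical frequencies should approach Pr_X(head) = Pr_X(tail) = 0.5:

```python
import random

# States of the random variable X
STATES = ["head", "tail"]

def toss(n=10_000):
    """Simulate n fair coin tosses and return empirical state probabilities."""
    counts = {s: 0 for s in STATES}
    for _ in range(n):
        counts[random.choice(STATES)] += 1  # each state has probability 0.5
    return {s: counts[s] / n for s in STATES}

print(toss())  # e.g. {'head': 0.5012, 'tail': 0.4988}
```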

Page 7

Discrete Random Variable

A discrete random variable’s states are discrete: natural numbers, integers, etc

Described by the probabilities of its states: $\Pr_X(s_1), \Pr_X(s_2), \dots$

$s_1, s_2, \dots$: discrete states (possible values of $X$)

Probabilities over all the states add up to 1:
$$\sum_i \Pr_X(s_i) = 1$$

Page 8

Continuous Random Variable

A continuous random variable’s states are continuous: real numbers, etc

Described by its probability density function (p.d.f.): $p_X(s)$

The probability of $a < X < b$ can be obtained by the integral
$$\Pr(a < X < b) = \int_a^b p_X(s)\,ds$$

The p.d.f. integrates to 1 over all states:
$$\int_{-\infty}^{\infty} p_X(s)\,ds = 1$$

Page 9

Joint Probability and Joint p.d.f.

Joint probability of discrete random variables:
$$\Pr_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n), \quad x_i \text{ is any possible state of } X_i$$

Joint p.d.f. of continuous random variables:
$$p_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n), \quad x_i \text{ is any possible state of } X_i$$

Independence condition:
$$\Pr_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = \Pr_{X_1}(x_1)\Pr_{X_2}(x_2)\cdots\Pr_{X_n}(x_n)$$
$$p_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = p_{X_1}(x_1)\,p_{X_2}(x_2)\cdots p_{X_n}(x_n)$$
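A small numerical check (the two-coin setup is illustrative, not from the slides): simulate two independent fair coins and verify that the empirical joint probability approximately factors into the product of the marginals:

```python
import random
from collections import Counter

# Simulate two independent fair coins and check Pr_{X1,X2}(x1, x2) ~ Pr_{X1}(x1) * Pr_{X2}(x2).
N = 100_000
samples = [(random.choice("HT"), random.choice("HT")) for _ in range(N)]

joint = Counter(samples)
m1 = Counter(x1 for x1, _ in samples)   # marginal of X1
m2 = Counter(x2 for _, x2 in samples)   # marginal of X2

for (x1, x2), c in sorted(joint.items()):
    pj = c / N
    pm = (m1[x1] / N) * (m2[x2] / N)
    print(f"Pr({x1},{x2}) = {pj:.3f}   Pr({x1})*Pr({x2}) = {pm:.3f}")
```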

Page 10

Conditional Probability and p.d.f.

Conditional probability of discrete random variables:
$$\Pr_{X_2|X_1}(x_2|x_1) = \Pr_{X_1,X_2}(x_1, x_2) / \Pr_{X_1}(x_1)$$

Conditional p.d.f. of continuous random variables:
$$p_{X_2|X_1}(x_2|x_1) = p_{X_1,X_2}(x_1, x_2) / p_{X_1}(x_1)$$
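A short sketch of the conditional-probability formula in Python; the joint table below is made up for illustration and is not from the slides:

```python
# Compute Pr_{X2|X1}(x2 | x1) = Pr_{X1,X2}(x1, x2) / Pr_{X1}(x1) from an illustrative joint table.
joint = {
    ("rain", "umbrella"): 0.25,
    ("rain", "no umbrella"): 0.05,
    ("dry", "umbrella"): 0.10,
    ("dry", "no umbrella"): 0.60,
}

def conditional(x1):
    p_x1 = sum(p for (a, _), p in joint.items() if a == x1)      # marginal Pr_{X1}(x1)
    return {b: p / p_x1 for (a, b), p in joint.items() if a == x1}

print(conditional("rain"))  # {'umbrella': 0.833..., 'no umbrella': 0.166...}
```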

Page 11

Statistics: Expected Value and Variance

For a discrete random variable:
$$E\{X\} = \sum_i s_i \Pr_X(s_i), \qquad Var\{X\} = \sum_i (s_i - E\{X\})^2 \Pr_X(s_i)$$

For a continuous random variable:
$$E\{X\} = \int x\, p_X(x)\,dx, \qquad Var\{X\} = \int (x - E\{X\})^2\, p_X(x)\,dx$$
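A minimal sketch computing these two statistics for a discrete random variable; the fair six-sided die is an assumed example, not from the slides:

```python
# Expected value and variance of a discrete random variable from its states and probabilities.
states = [1, 2, 3, 4, 5, 6]          # fair six-sided die (illustrative)
probs = [1 / 6] * 6

mean = sum(s * p for s, p in zip(states, probs))                 # E{X} = sum_i s_i Pr_X(s_i)
var = sum((s - mean) ** 2 * p for s, p in zip(states, probs))    # Var{X} = sum_i (s_i - E{X})^2 Pr_X(s_i)

print(mean, var)  # 3.5 2.9166...
```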

Page 12

Normal Distribution of Single Random Variable

Notation: $X \sim N(\mu, \sigma^2)$

p.d.f.:
$$p_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Expected value: $E\{X\} = \mu$

Variance: $Var\{X\} = \sigma^2$
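A small numerical sketch (mu = 1.0 and sigma = 2.0 are arbitrary choices, and NumPy is assumed to be available): evaluate the N(mu, sigma^2) p.d.f., check that it integrates to roughly 1, and compare sample mean and variance with mu and sigma^2:

```python
import numpy as np

mu, sigma = 1.0, 2.0

def pdf(x):
    # Gaussian p.d.f. with mean mu and standard deviation sigma
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 100_001)
dx = x[1] - x[0]
print((pdf(x) * dx).sum())                  # ~1.0: the p.d.f. integrates to 1

samples = np.random.normal(mu, sigma, 1_000_000)
print(samples.mean(), samples.var())        # ~mu and ~sigma^2
```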

Page 13

Stochastic Process

A stochastic process is a time series of random variables $\{X_t\} = \{\dots, X_{t-1}, X_t, X_{t+1}, \dots\}$
- $X_t$: random variable
- $t$: time stamp

Examples: audio signal, stock market

Page 14

Causal Process

A stochastic process is causal if it has a finite history

A causal process can be represented by

$$X_1, X_2, \dots, X_t, \dots$$

Page 15

Stationary Process

A stochastic process $\{X_t\}$ is stationary if the joint probability at any fixed set of times is invariant to a time shift, i.e., for any $n$, any times $t_1, t_2, \dots, t_n$, and any shift $\tau$,
$$\Pr_{X_{t_1}, X_{t_2}, \dots, X_{t_n}}(x_1, x_2, \dots, x_n) = \Pr_{X_{t_1+\tau}, X_{t_2+\tau}, \dots, X_{t_n+\tau}}(x_1, x_2, \dots, x_n)$$

A stationary process in this sense is sometimes referred to as strictly stationary, in contrast with weak or wide-sense stationarity.

Page 16

Gaussian White Noise

White noise: a process $\{X_t\}$ whose values $X_t$ are independent and identically distributed (i.i.d.)

Gaussian white noise: $X_t \sim N(\mu, \sigma^2)$
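A brief sketch (mu = 0, sigma = 1, and the series length are arbitrary; NumPy assumed): generate Gaussian white noise as i.i.d. normal draws and check that its sample statistics and lag-1 correlation behave as expected:

```python
import numpy as np

# Gaussian white noise: a sequence of i.i.d. draws X_t ~ N(mu, sigma^2);
# each sample is independent of every other time step.
mu, sigma, T = 0.0, 1.0, 1000
x = np.random.normal(mu, sigma, T)       # the whole series {X_t}, t = 1..T

print(x.mean(), x.std())                 # ~mu, ~sigma
print(np.corrcoef(x[:-1], x[1:])[0, 1])  # ~0: no correlation between consecutive samples
```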

Page 17

Gaussian White Noise is a Stationary Process

Proof

For any $n$, any times $t_1, t_2, \dots, t_n$, and any shift $\tau$,
$$p_{X_{t_1}, \dots, X_{t_n}}(x_1, \dots, x_n) = \prod_{i=1}^{n} p_{X_{t_i}}(x_i) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n} \exp\left(-\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}\right) = p_{X_{t_1+\tau}, \dots, X_{t_n+\tau}}(x_1, \dots, x_n)$$
The first equality uses independence; the joint density depends only on the values $x_i$, not on the times $t_i$, so it is unchanged by the shift $\tau$.

Page 18

Temperature

Q1: Is the temperature within a day stationary?

Page 19

Markov Chains

A causal process $\{X_t\}$ is a Markov chain if, for any $x_1, \dots, x_t$,
$$\Pr_{X_t|X_1, \dots, X_{t-1}}(x_t|x_1, \dots, x_{t-1}) = \Pr_{X_t|X_{t-k}, \dots, X_{t-1}}(x_t|x_{t-k}, \dots, x_{t-1})$$

$k$ is the order of the Markov chain.

First-order Markov chain:
$$\Pr_{X_t|X_1, \dots, X_{t-1}}(x_t|x_1, \dots, x_{t-1}) = \Pr_{X_t|X_{t-1}}(x_t|x_{t-1})$$

Second-order Markov chain:
$$\Pr_{X_t|X_1, \dots, X_{t-1}}(x_t|x_1, \dots, x_{t-1}) = \Pr_{X_t|X_{t-2}, X_{t-1}}(x_t|x_{t-2}, x_{t-1})$$
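A toy simulation (the states A, B and their probabilities are invented for illustration) of a first-order Markov chain, where the next state is drawn using only the current state:

```python
import random

# First-order Markov chain: the distribution of the next state depends
# only on the current state, not on the earlier history.
transition = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

def simulate(start, steps):
    state, path = start, [start]
    for _ in range(steps):
        nxt = random.choices(list(transition[state]),
                             weights=list(transition[state].values()))[0]
        path.append(nxt)
        state = nxt
    return path

print("".join(simulate("A", 30)))  # e.g. AAAAABAAAABBABAAA...
```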

Page 20

Homogeneous Markov Chains

A $k$-th order Markov chain $\{X_t\}$ is homogeneous if the state transition probability is the same over time, i.e., for any $t$, $\tau$, and states $x_0, x_1, \dots, x_k$,
$$\Pr_{X_t|X_{t-1}, \dots, X_{t-k}}(x_0|x_1, \dots, x_k) = \Pr_{X_{t+\tau}|X_{t+\tau-1}, \dots, X_{t+\tau-k}}(x_0|x_1, \dots, x_k)$$

Q2: Does a homogeneous Markov chain imply a stationary process?

Page 21

State Transition in Homogeneous Markov Chains

Suppose $\{X_t\}$ is a homogeneous $k$-th order Markov chain and $S$ is the set of all possible states (values) of $x_t$. Then for any $k+1$ states $x_0, x_1, \dots, x_k$, the state transition probability
$$\Pr_{X_t|X_{t-1}, \dots, X_{t-k}}(x_0|x_1, \dots, x_k)$$
can be abbreviated to $\Pr(x_0|x_1, \dots, x_k)$.

Page 22

Example of Markov Chain

[State transition diagram for the two states 'Rain' and 'Dry']

Two states: 'Rain' and 'Dry'. Transition probabilities:
Pr('Rain'|'Rain') = 0.4, Pr('Dry'|'Rain') = 0.6, Pr('Rain'|'Dry') = 0.2, Pr('Dry'|'Dry') = 0.8

Page 23

Short Term Forecast

[Rain/Dry transition diagram as on the previous slide]

Initial (say, Wednesday) probabilities: Pr_Wed('Rain') = 0.3, Pr_Wed('Dry') = 0.7

What's the probability of rain on Thursday?

Pr_Thur('Rain') = Pr_Wed('Rain') × Pr('Rain'|'Rain') + Pr_Wed('Dry') × Pr('Rain'|'Dry') = 0.3 × 0.4 + 0.7 × 0.2 = 0.26
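The same one-step forecast can be written as a vector-matrix product; this sketch assumes NumPy and orders the states as (Rain, Dry):

```python
import numpy as np

# P[i, j] = Pr(state j tomorrow | state i today), states ordered (Rain, Dry).
P = np.array([[0.4, 0.6],    # from Rain: Pr(Rain|Rain)=0.4, Pr(Dry|Rain)=0.6
              [0.2, 0.8]])   # from Dry:  Pr(Rain|Dry)=0.2, Pr(Dry|Dry)=0.8

wednesday = np.array([0.3, 0.7])      # Pr_Wed(Rain)=0.3, Pr_Wed(Dry)=0.7
thursday = wednesday @ P              # row vector times transition matrix

print(thursday)  # [0.26 0.74] -> Pr_Thur(Rain) = 0.26, as on the slide
```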

Page 24

Condition of Stationarity

[Rain/Dry transition diagram as on the previous slides]

Pr_t('Rain') = Pr_{t-1}('Rain') × Pr('Rain'|'Rain') + Pr_{t-1}('Dry') × Pr('Rain'|'Dry')
             = Pr_{t-1}('Rain') × 0.4 + (1 − Pr_{t-1}('Rain')) × 0.2
             = 0.2 + 0.2 × Pr_{t-1}('Rain')

The process is stationary when Pr_t('Rain') = Pr_{t-1}('Rain'), which gives Pr_{t-1}('Rain') = 0.25 and Pr_{t-1}('Dry') = 1 − 0.25 = 0.75.

(0.25, 0.75) is the steady-state distribution.
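A sketch of solving the stationarity condition pi = pi P directly (NumPy assumed, states ordered (Rain, Dry)); the least-squares solve is just one convenient way to impose the normalization constraint:

```python
import numpy as np

# Solve pi = pi @ P together with sum(pi) = 1 for the Rain/Dry chain.
P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

A = np.vstack([P.T - np.eye(2), np.ones(2)])   # (P^T - I) pi = 0, plus sum(pi) = 1
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)  # [0.25 0.75], matching the slide
```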

Page 25

Steady-State Analysis

[Rain/Dry transition diagram as on the previous slides]

Pr_t('Rain') = 0.2 + 0.2 × Pr_{t-1}('Rain')

Pr_t('Rain') − 0.25 = 0.2 × (Pr_{t-1}('Rain') − 0.25)

Pr_t('Rain') = 0.2^(t−1) × (Pr_1('Rain') − 0.25) + 0.25

lim_{t→∞} Pr_t('Rain') = 0.25 (the chain converges to the steady-state distribution)
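A few lines iterating the recursion above from an arbitrary starting probability; the value approaches 0.25 geometrically with factor 0.2 per step:

```python
# Iterate Pr_t(Rain) = 0.2 + 0.2 * Pr_{t-1}(Rain) and watch it converge to 0.25.
p_rain = 1.0                      # Pr_1(Rain), deliberately far from 0.25
for t in range(2, 12):
    p_rain = 0.2 + 0.2 * p_rain
    print(t, round(p_rain, 6))    # approaches 0.25 geometrically (factor 0.2 per step)
```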

Page 26

Periodic Markov Chain

[Rain/Dry diagram with Pr('Dry'|'Rain') = 1 and Pr('Rain'|'Dry') = 1: the state alternates deterministically]

A periodic Markov chain never converges to a steady-state distribution.
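A small sketch of the periodic chain (NumPy assumed, states ordered (Rain, Dry)) showing that the state distribution oscillates rather than converging:

```python
import numpy as np

# Periodic Rain/Dry chain: the state distribution oscillates forever
# instead of converging to a steady state.
P = np.array([[0.0, 1.0],    # Rain -> Dry with probability 1
              [1.0, 0.0]])   # Dry  -> Rain with probability 1

p = np.array([0.3, 0.7])
for t in range(6):
    print(t, p)              # alternates between [0.3 0.7] and [0.7 0.3]
    p = p @ P
```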