
Page 1: Sequential Data and Markov Models


Sequential Data and Markov Models

Sargur N. Srihari [email protected]

Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html

Page 2: Sequential Data and Markov Models

Sequential Data Examples

• Often arise through measurement of time series
 – Snowfall measurements on successive days
 – Rainfall measurements on successive days
 – Daily values of a currency exchange rate
 – Acoustic features at successive time frames in speech recognition
 – Nucleotide base pairs in a strand of DNA
 – Sequence of characters in an English sentence
 – Parts of speech of successive words

Page 3: Sequential Data and Markov Models

Markov Model – Weather

• The weather of a day is observed as being one of the following:
 – State 1: Rainy
 – State 2: Cloudy
 – State 3: Sunny
• Weather pattern of a location (transition probabilities from today to tomorrow):

 Today \ Tomorrow   Rain   Cloudy   Sunny
 Rain               0.3    0.3      0.4
 Cloudy             0.2    0.6      0.2
 Sunny              0.1    0.1      0.8
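The transition table maps directly onto code. A minimal sketch in Python (the nested-dict layout and state labels are just one convenient encoding, not part of the original slides):

```python
# Transition probabilities from the table above: P[today][tomorrow].
P = {
    "Rainy":  {"Rainy": 0.3, "Cloudy": 0.3, "Sunny": 0.4},
    "Cloudy": {"Rainy": 0.2, "Cloudy": 0.6, "Sunny": 0.2},
    "Sunny":  {"Rainy": 0.1, "Cloudy": 0.1, "Sunny": 0.8},
}

# Each row must be a valid probability distribution over tomorrow's weather.
for today, row in P.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# Distribution of tomorrow's weather given that today is Sunny:
tomorrow = P["Sunny"]
print(tomorrow)  # {'Rainy': 0.1, 'Cloudy': 0.1, 'Sunny': 0.8}
```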

Page 4: Sequential Data and Markov Models

Markov Model – Weather State Diagram

[Figure: state diagram over the three weather states, annotated with the transition probabilities from the table above, e.g. Sunny→Sunny 0.8, Sunny→Rain 0.1, Sunny→Cloudy 0.1, Cloudy→Cloudy 0.6]

Page 5: Sequential Data and Markov Models

Sound Spectrogram of Speech

• Spoken words: “Bayes Theorem”
• Plot of the intensity of the spectral coefficients versus time index
• Successive observations of the speech spectrum are highly correlated (Markov dependency)

Page 6: Sequential Data and Markov Models

Markov Model for the Production of Spoken Words

• States represent phonemes
• Production of the word “cat” is represented by the states /k/, /a/, /t/
• Transitions:
 – /k/ to /a/
 – /a/ to /t/
 – /t/ to a silent state
• Although only the correct “cat” sound is represented by the model, other transitions can be introduced, e.g., /k/ followed directly by /t/

[Figure: Markov model for the word “cat”, with states /k/, /a/, /t/]

Page 7: Sequential Data and Markov Models

Stationary vs Non-stationary

• Stationary: the data evolve over time but the generative distribution remains the same
 – e.g., the dependence of the current word on the previous word remains constant
• Non-stationary: the generative distribution itself changes over time

Page 8: Sequential Data and Markov Models

Making a Sequence of Decisions

• Processes in time: the state at time t is influenced by the state at time t−1
• We wish to predict the next value from previous values, e.g., in financial forecasting
• It is impractical to model general dependence of the future on all previous observations
 – complexity would grow without limit as the number of observations increases
• Markov models assume dependence on only the most recent observations

Page 9: Sequential Data and Markov Models

Latent Variables

• While Markov models are tractable, they are severely limited
• Introducing latent variables provides a more general framework
• Leads to state-space models
• When the latent variables are:
 – Discrete: the models are called Hidden Markov Models
 – Continuous: they are Linear Dynamical Systems

Page 10: Sequential Data and Markov Models

Hidden Markov Model

[Figure: the weather state diagram above, now hidden, with emission probabilities linking each hidden state to the observed variables]

• Observations are activities of a friend described over the telephone
• Alternatively, observations from data:

[Table: Temp and Barometer readings observed on days M, Tu, W, Th]

Page 11: Sequential Data and Markov Models

Markov Model Assuming Independence

• Simplest model:
 – assume the observations are independent
 – a graph without links
• The prediction of whether it rains tomorrow is based only on the relative frequency of rainy days
• This ignores the influence of whether it rained the previous day
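For concreteness, the relative-frequency prediction can be sketched as follows (the weather record below is invented purely for illustration):

```python
# A hypothetical record of observed days (illustrative data, not from the slides):
days = ["Rainy", "Rainy", "Sunny", "Cloudy", "Rainy", "Sunny", "Sunny", "Rainy"]

# Independence model: the probability of rain tomorrow is just the relative
# frequency of rainy days, regardless of today's weather.
p_rain = days.count("Rainy") / len(days)
print(p_rain)  # 0.5
```

A first-order model would instead condition on today's weather, counting only transitions that start from today's state.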

Page 12: Sequential Data and Markov Models

Markov Model

• Most general Markov model for observations {x_n}
• Use the product rule to express the joint distribution of the sequence of observations:

 p(x_1,\ldots,x_N) = \prod_{n=1}^{N} p(x_n \mid x_1,\ldots,x_{n-1})

Page 13: Sequential Data and Markov Models

First Order Markov Model

• Chain of observations {x_n}
• Joint distribution for a sequence of N variables:

 p(x_1,\ldots,x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})

• It can be verified (using the product rule above) that

 p(x_n \mid x_1,\ldots,x_{n-1}) = p(x_n \mid x_{n-1})

• If the model is used to predict the next observation, the distribution of the prediction depends only on the immediately preceding observation and is independent of all earlier observations
• Stationarity implies that the conditional distributions p(x_n \mid x_{n-1}) are all equal
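The factorization above can be evaluated directly; a minimal sketch, with made-up parameter tables for a two-state chain (the names `p_init` and `P` are our assumptions, not from the slides):

```python
def first_order_joint(seq, p_init, P):
    """Joint probability p(x_1,...,x_N) = p(x_1) * prod_n p(x_n | x_{n-1})
    under a first-order Markov model, with caller-supplied tables
    p_init (initial distribution) and P (transition probabilities)."""
    prob = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[prev][cur]
    return prob

# Toy two-state example (invented numbers) to exercise the factorization:
p_init = {"A": 0.6, "B": 0.4}
P = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}
print(first_order_joint(["A", "A", "B"], p_init, P))  # 0.6 * 0.9 * 0.1 ≈ 0.054
```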

Page 14: Sequential Data and Markov Models

Markov Model – Sequence Probability

• What is the probability that the weather for the next 7 days will be “S-S-R-R-S-C-S”?
 – Given that today (day 1) is sunny, the observation sequence is O = {S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3}; find the probability of O given the model:

 P(O \mid Model) = P(S_3) \cdot P(S_3 \mid S_3) P(S_3 \mid S_3) P(S_1 \mid S_3) P(S_1 \mid S_1) P(S_3 \mid S_1) P(S_2 \mid S_3) P(S_3 \mid S_2)
                 = \pi_3 \cdot a_{33}\, a_{33}\, a_{31}\, a_{11}\, a_{13}\, a_{32}\, a_{23}
                 = 1 \cdot (0.8)(0.8)(0.1)(0.3)(0.4)(0.1)(0.2)
                 = 1.536 \times 10^{-4}
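The same computation in code, using the transition table from the weather slide (state labels abbreviated R/C/S; today's sunny weather is given, so its probability is 1):

```python
# Transition table from the weather slide: P[today][tomorrow].
P = {
    "R": {"R": 0.3, "C": 0.3, "S": 0.4},
    "C": {"R": 0.2, "C": 0.6, "S": 0.2},
    "S": {"R": 0.1, "C": 0.1, "S": 0.8},
}

# Today is Sunny, followed by the 7-day forecast S-S-R-R-S-C-S.
O = ["S", "S", "S", "R", "R", "S", "C", "S"]

prob = 1.0  # pi_S = 1, since today's weather is observed
for prev, cur in zip(O, O[1:]):
    prob *= P[prev][cur]

print(prob)  # ≈ 1.536e-04
```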

Page 15: Sequential Data and Markov Models

Second Order Markov Model

• The conditional distribution of observation x_n depends on the values of the two previous observations x_{n−1} and x_{n−2}:

 p(x_1,\ldots,x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})

• Each observation is influenced by the previous two observations

Page 16: Sequential Data and Markov Models

Mth Order Markov Source

• The conditional distribution for a particular variable depends on the previous M variables
• We pay a price in the number of parameters
• For a discrete variable with K states:
 – First order: p(x_n | x_{n−1}) needs K−1 parameters for each of the K states of x_{n−1}, giving K(K−1) parameters in total
 – An Mth order model needs K^M (K−1) parameters, which grows exponentially with M
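The parameter count can be checked numerically; a small sketch (the helper name `n_params` is ours):

```python
def n_params(K, M):
    """Number of free parameters in an Mth-order Markov model over a
    discrete variable with K states: K**M conditioning configurations,
    each with K-1 free probabilities."""
    return K**M * (K - 1)

# First order with K states recovers K(K-1), e.g. K = 3 gives 6.
# The count grows exponentially in the order M:
print([n_params(3, M) for M in (1, 2, 3, 4)])  # [6, 18, 54, 162]
```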

Page 17: Sequential Data and Markov Models

Introducing Latent Variables

• A model for sequences that is not limited by a Markov assumption of any order, yet has a limited number of parameters
• For each observation x_n, introduce a latent variable z_n
• z_n may be of a different type or dimensionality from the observed variable
• The latent variables form the Markov chain, giving the “state-space model”

[Figure: chain of latent variables z_1 → z_2 → … → z_N, each emitting an observation x_n]

Page 18: Sequential Data and Markov Models

Conditional Independence with Latent Variables

• Satisfies the key conditional independence property

 z_{n+1} \perp z_{n-1} \mid z_n

• From d-separation: when the latent node z_n is observed (filled), the only path between z_{n−1} and z_{n+1} passes through a head-to-tail node at z_n, which is blocked

Page 19: Sequential Data and Markov Models

Joint Distribution with Latent Variables

• Joint distribution for this model:

 p(x_1,\ldots,x_N, z_1,\ldots,z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)

• There is always a path between any x_n and x_m via the latent variables, and this path is never blocked
• Thus the predictive distribution p(x_{n+1} \mid x_1,\ldots,x_n) for observation x_{n+1} does not exhibit any conditional independence properties and hence depends on all previous observations
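The factorization can be sketched as a function; the tiny two-state example below uses entirely made-up numbers (and the names `p_init`, `A`, `B`) just to exercise the formula:

```python
def state_space_joint(z, x, p_init, A, B):
    """Joint p(x_1..x_N, z_1..z_N) = p(z_1) * prod p(z_n | z_{n-1})
    * prod p(x_n | z_n), with caller-supplied tables: p_init (initial
    state distribution), A (state transitions), B (emissions)."""
    prob = p_init[z[0]]
    for prev, cur in zip(z, z[1:]):
        prob *= A[prev][cur]
    for state, obs in zip(z, x):
        prob *= B[state][obs]
    return prob

# Tiny invented model: two hidden states, two observation symbols.
p_init = {"H": 0.5, "L": 0.5}
A = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
B = {"H": {"a": 0.9, "b": 0.1}, "L": {"a": 0.2, "b": 0.8}}
print(state_space_joint(["H", "L"], ["a", "b"], p_init, A, B))  # 0.5*0.3*0.9*0.8 ≈ 0.108
```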

Page 20: Sequential Data and Markov Models

Two Models Described by this Graph

1. Hidden Markov Model: the latent variables are discrete (the observed variables in an HMM may be discrete or continuous)
2. Linear Dynamical Systems: both latent and observed variables are Gaussian

[Figure: the state-space graph above, with latent variables z_n and observations x_n]

Page 21: Sequential Data and Markov Models

Further Topics on Sequential Data

• Hidden Markov Models:
 http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.2-HiddenMarkovModels.pdf
• Extensions of HMMs:
 http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.3-HMMExtensions.pdf
• Linear Dynamical Systems:
 http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.4-LinearDynamicalSystems.pdf
• Conditional Random Fields:
 http://www.cedar.buffalo.edu/~srihari/CSE574/Chap11/Ch11.5-ConditionalRandomFields.pdf