Pattern Recognition, Chapter 3: Hidden Markov Models (HMMs)



Hidden Markov Models (HMMs)

• Sequential patterns:
  – The order of the data points is irrelevant.
  – No explicit sequencing ...

• Temporal patterns:
  – The result of a time process (e.g., a time series).
  – Can be represented by a number of states.
  – States at time t are influenced directly by states in previous time steps (i.e., correlated).


Hidden Markov Models (HMMs)

• HMMs are appropriate for problems that have an inherent temporality.

  – Speech recognition
  – Gesture recognition
  – Human activity recognition


First-Order Markov Models

• They are represented by a graph where every node corresponds to a state ωi.

• The graph can be fully connected with self-loops.

• Links between nodes ωi and ωj are associated with a transition probability:

      P(ω(t+1)=ωj / ω(t)=ωi) = a_ij

  which is the probability of going to state ωj at time t+1 given that the state at time t was ωi (first-order model).


First-Order Markov Models (cont’d)

• The following constraints should be satisfied:

      Σ_j a_ij = 1, for all i

• Markov models are fully described by their transition probabilities a_ij.
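As a small illustration (not from the slides), the row-sum constraint can be checked numerically; the 2-state matrix below is made up for the example:

    import numpy as np

    # Hypothetical transition matrix a_ij for a 2-state Markov model.
    A = np.array([[0.9, 0.1],
                  [0.3, 0.7]])

    # Each row must be a probability distribution over successor states:
    # a_ij >= 0 and sum_j a_ij = 1 for every state i.
    assert np.all(A >= 0)
    assert np.allclose(A.sum(axis=1), 1.0)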


Example: Weather Prediction Model

• Assume three weather states:
  – ω1: Precipitation (rain, snow, hail, etc.)
  – ω2: Cloudy
  – ω3: Sunny

Transition matrix A = [a_ij], with rows and columns indexed by ω1, ω2, ω3 (numerical values are given in the slide's figure).


Computing P(ω^T) of a sequence of states ω^T

• Given a sequence of states ω^T = (ω(1), ω(2), ..., ω(T)), the probability that the model generated ω^T is equal to the product of the corresponding transition probabilities:

      P(ω^T) = Π_{t=1..T} P(ω(t) / ω(t-1))

  where P(ω(1) / ω(0)) = P(ω(1)) is the prior probability of the first state.


Example: Weather Prediction Model (cont’d)

• What is the probability that the weather for eight consecutive days is

      “sun-sun-sun-rain-rain-sun-cloudy-sun” ?

  ω^8 = ω3 ω3 ω3 ω1 ω1 ω3 ω2 ω3

  P(ω^8) = P(ω3) P(ω3/ω3) P(ω3/ω3) P(ω1/ω3) P(ω1/ω1) P(ω3/ω1) P(ω2/ω3) P(ω3/ω2) = 1.536 x 10^-4
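The arithmetic can be reproduced in a few lines of Python. The transition values used below are the ones from the classic version of this example (Rabiner's tutorial) and are an assumption here, since the matrix itself only appears as a figure on the earlier slide:

    import numpy as np

    # Assumed transition matrix a_ij over (omega_1 = rain, omega_2 = cloudy, omega_3 = sunny);
    # each row sums to 1.
    A = np.array([[0.4, 0.3, 0.3],   # rain   -> rain, cloudy, sunny
                  [0.2, 0.6, 0.2],   # cloudy -> ...
                  [0.1, 0.1, 0.8]])  # sunny  -> ...

    rain, cloudy, sunny = 0, 1, 2
    sequence = [sunny, sunny, sunny, rain, rain, sunny, cloudy, sunny]

    # P(omega^8) = P(omega(1)) * prod_t P(omega(t)/omega(t-1)),
    # with P(omega(1) = sunny) = 1 here (day 1 is given to be sunny).
    p = 1.0
    for prev, curr in zip(sequence[:-1], sequence[1:]):
        p *= A[prev, curr]
    print(p)   # ~1.536e-04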


Limitations of Markov models

• In Markov models, each state is uniquely associated with an observable event.
  – Once an observation is made, the state of the system is trivially retrieved.

• Such models are of limited use in most practical applications.


Hidden States and Observations

• Assume that observations are a probabilistic function of each state.
  – Each state can generate a number of outputs (i.e., observations) according to a unique probability distribution.
  – Each observation can potentially be generated at any state.

• The state sequence is not directly observable.
  – It can be approximated by a sequence of observations.


First-order HMMs

• We augment the model such that when it is in state ω(t) it also emits some symbol v(t) (visible state) among a set of possible symbols.

• We have access to the visible states only, while the ω(t) are unobservable.


Example: Weather Prediction Model (cont’d)

Observations:
  v1: temperature
  v2: humidity
  etc.


First-order HMMs

• For every sequence of (hidden) states, there is an associated sequence of visible states:

      ω^T = (ω(1), ω(2), ..., ω(T))      V^T = (v(1), v(2), ..., v(T))

• When the model is in state ωj at time t, the probability of emitting a visible state vk at that time is denoted as:

      P(v(t)=vk / ω(t)=ωj) = b_jk,   where Σ_k b_jk = 1 for all j

  (observation probabilities)


Absorbing State

• Given a state sequence and its corresponding observation sequence:

      ω^T = (ω(1), ω(2), ..., ω(T))      V^T = (v(1), v(2), ..., v(T))

  we assume that ω(T) = ω0 is some absorbing state, which uniquely emits the symbol v(T) = v0.

• Once entering the absorbing state, the system cannot escape from it.


HMM Formalism

• An HMM is defined by {Ω, V, π, A, B}:
  – Ω = {ω1, ..., ωn} are the possible states
  – V = {v1, ..., vm} are the possible observations
  – π = {πi} are the prior state probabilities
  – A = {aij} are the state transition probabilities
  – B = {bjk} are the observation probabilities
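As a concrete, purely illustrative way to hold these five ingredients in code, one might use a small container like the following; the class name and field names are made up:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class HMM:
        """Container for the HMM parameters {Omega, V, pi, A, B}."""
        states: list        # Omega = {omega_1, ..., omega_n}
        symbols: list       # V = {v_1, ..., v_m}
        pi: np.ndarray      # prior state probabilities, shape (n,)
        A: np.ndarray       # transition probabilities a_ij, shape (n, n)
        B: np.ndarray       # observation probabilities b_jk, shape (n, m)

        def check(self):
            # pi and every row of A and B must be probability distributions.
            assert np.allclose(self.pi.sum(), 1.0)
            assert np.allclose(self.A.sum(axis=1), 1.0)
            assert np.allclose(self.B.sum(axis=1), 1.0)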


Some Terminology

• Causal: the probabilities depend only upon previous states.

• Ergodic: every state has a non-zero probability of occurring given some starting state.

(Figure: a “left-right” HMM.)


Coin toss example

• You are in a room with a barrier (e.g., a curtain) through which you cannot see what is happening.

• On the other side of the barrier is another person who is performing a coin (or multiple coin) toss experiment.

• The other person will tell you only the result of the experiment, not how he obtained that result!

  e.g., V^T = HHTHTTHH...T = v(1), v(2), ..., v(T)


Coin toss example (cont’d)

• Problem: derive an HMM model to explain the observed sequence of heads and tails.
  – The coins represent the states; these are hidden because we do not know which coin was tossed each time.
  – The outcome of each toss represents an observation.
  – A “likely” sequence of coins may be inferred from the observations.
  – As we will see, the state sequence will not be unique in general.


Coin toss example: 1-fair coin model

• There are 2 states, each associated with either heads (state 1) or tails (state 2).

• The observation sequence uniquely defines the states (the model is not hidden).


Coin toss example: 2-fair coins model

• There are 2 states, but neither state is uniquely associated with heads or tails (i.e., each state can be associated with a different fair coin).

• A third coin is used to decide which of the fair coins to flip.


Coin toss example: 2-biased coins model

• There are 2 states, each associated with a biased coin.

• A third coin is used to decide which of the biased coins to flip.


Coin toss example: 3-biased coins model

• There are 3 states, each associated with a biased coin.

• We decide which coin to flip in some way (e.g., using other coins).


Which model is best?

• Since the states are not observable, the best we can do is select the model that best explains the data.

• Long observation sequences are preferable for selecting the best model ...


Classification Using HMMs

• Given an observation sequence V^T and a set of possible models, choose the model with the highest probability.

Bayes formula:

      P(θ / V^T) = P(V^T / θ) P(θ) / P(V^T)

where θ denotes a candidate model.
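A minimal sketch of this classification rule, assuming the likelihoods P(V^T / θj) for each candidate model have already been computed (e.g., with the forward algorithm described later); the numbers below are made up:

    import numpy as np

    # Hypothetical per-model likelihoods P(V^T / theta_j) and priors P(theta_j).
    likelihoods = np.array([1.2e-5, 4.7e-6, 3.1e-5])   # one entry per candidate HMM
    priors      = np.array([1/3, 1/3, 1/3])

    # Bayes: P(theta_j / V^T) is proportional to P(V^T / theta_j) * P(theta_j);
    # the denominator P(V^T) is the same for every model, so it can be ignored.
    best_model = int(np.argmax(likelihoods * priors))
    print("choose model", best_model)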


Main Problems in HMMs

• Evaluation
  – Determine the probability P(V^T) that a particular sequence of visible states V^T was generated by a given model (based on dynamic programming).

• Decoding
  – Given a sequence of visible states V^T, determine the most likely sequence of hidden states ω^T that led to those observations (based on dynamic programming).

• Learning
  – Given a set of visible observations, determine a_ij and b_jk (based on the EM algorithm).


Evaluation

• The probability of a particular sequence of hidden states ω_r^T, where r = 1, ..., r_max indexes the possible state sequences (i.e., r_max is the possible # of state sequences), is:

      P(ω_r^T) = Π_{t=1..T} P(ω_r(t) / ω_r(t-1))


Evaluation (cont’d)

      P(V^T / ω_r^T) = Π_{t=1..T} P(v(t) / ω_r(t))

      P(V^T) = Σ_{r=1..r_max} Π_{t=1..T} P(v(t) / ω_r(t)) P(ω_r(t) / ω_r(t-1))

(enumerate all possible transitions to determine how good the model is)


Example: Evaluation

(enumerate all possible transitions to determine how good the model is)


Computational Complexity

• Direct enumeration evaluates on the order of c^T state sequences of length T (c = number of states), which quickly becomes intractable; the forward algorithm below computes P(V^T) in O(c^2 T).


Recursive computation of P(V^T) (HMM Forward)

(Trellis: the hidden states ω(1), ..., ω(t), ω(t+1), ..., ω(T) unfold over time, emitting v(1), ..., v(t), v(t+1), ..., v(T); each state ωi at time t is linked to every state ωj at time t+1.)


Recursive computation of P(V^T) (HMM Forward) (cont’d)

Using marginalization:

      α_j(t+1) = P(v(1), v(2), ..., v(t), v(t+1), ω(t+1)=ωj)
               = Σ_i P(v(1), ..., v(t+1), ω(t)=ωi, ω(t+1)=ωj)
               = [ Σ_i α_i(t) a_ij ] b_j,v(t+1)


Recursive computation of P(V^T) (HMM Forward) (cont’d)

      P(V^T) = α_0(T)


Recursive computation of P(V^T) (HMM Forward) (cont’d)

• At each time step t = 1, ..., T, compute α_j(t) for j = 0 to c.

• Return P(V^T) = α_0(T)   (i.e., α at the absorbing state ω0 = ω(T)).
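A minimal NumPy sketch of this forward recursion. It uses the generic formulation with a prior over the initial state rather than the slides' absorbing-state convention, and the tiny 2-state model is made up, so read it as an illustration rather than the slides' exact algorithm:

    import numpy as np

    def forward(A, B, pi, obs):
        """Forward pass: alpha[t, j] is the joint probability of the first t+1
        observations and being in state j at that time; returns (alpha, P(V^T))."""
        n = A.shape[0]
        T = len(obs)
        alpha = np.zeros((T, n))
        alpha[0] = pi * B[:, obs[0]]                      # initialization
        for t in range(1, T):
            # alpha_j(t) = b_{j, v(t)} * sum_i alpha_i(t-1) * a_ij
            alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
        return alpha, alpha[-1].sum()                     # P(V^T) = sum_j alpha_j(T)

    # Tiny made-up model: 2 hidden states, 2 visible symbols.
    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    alpha, p = forward(A, B, pi, [0, 1, 0])
    print(p)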


Example

(The example's transition probabilities a_ij and observation probabilities b_jk are given as matrices over the states ω0, ω1, ω2, ω3 in the slide's figure.)


Example (cont’d)

• Similarly for t = 2, 3, 4.

• Finally:

      P(V^T) = α_0(T) ≈ 0.0011   (0.00108)


The backward algorithm (HMM backward)

(Trellis, traversed backwards in time: β_i(t) is attached to state ωi at time t, and β_j(t+1) = P(v(t+2), ..., v(T) / ω(t+1)=ωj) to state ωj at time t+1.)


The backward algorithm (HMM backward) (cont’d)

      β_i(t) = P(v(t+1), v(t+2), ..., v(T) / ω(t)=ωi)
             = Σ_{j=1..c} P(v(t+1), v(t+2), ..., v(T), ω(t+1)=ωj / ω(t)=ωi)
             = Σ_{j=1..c} P(v(t+2), ..., v(T) / ω(t+1)=ωj) P(v(t+1) / ω(t+1)=ωj) P(ω(t+1)=ωj / ω(t)=ωi)

or

      β_i(t) = Σ_{j=1..c} β_j(t+1) a_ij b_j,v(t+1)


The backward algorithm (HMM backward) (cont’d)

      β_i(t) = Σ_{j=1..c} β_j(t+1) a_ij b_j,v(t+1)
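A matching sketch of the backward recursion, under the same generic (non-absorbing-state) convention and the same made-up model as the forward sketch:

    import numpy as np

    def backward(A, B, obs):
        """beta[t, i] = probability of the observations after time index t,
        given state i at that time."""
        n = A.shape[0]
        T = len(obs)
        beta = np.zeros((T, n))
        beta[T - 1] = 1.0                                  # nothing left to emit after time T
        for t in range(T - 2, -1, -1):
            # beta_i(t) = sum_j a_ij * b_{j, v(t+1)} * beta_j(t+1)
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    obs = [0, 1, 0]
    beta = backward(A, B, obs)
    # Sanity check: sum_i pi_i * b_{i, v(1)} * beta_i(1) equals P(V^T) from the forward pass.
    print((pi * B[:, obs[0]] * beta[0]).sum())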


Decoding

• We need to use an optimality criterion to solve this problem (there are several possible ways of solving it, since there are various optimality criteria we could use).

• Algorithm 1: choose the states ω(t) that are individually most likely (i.e., maximize the expected number of correct individual states).


Decoding – Algorithm 2

• Algorithm 2: at each time step t, find the state that has the highest probability α_i(t).

• Uses the forward algorithm with minor changes.
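A sketch of this greedy rule (Algorithm 2), repeating the forward recursion inline so the snippet stays self-contained; as the next slides point out, simply taking argmax_i α_i(t) at every step does not guarantee a valid path:

    import numpy as np

    def decode_greedy(A, B, pi, obs):
        """Algorithm 2: pick, at every time step, the state with the largest alpha_i(t)."""
        n = A.shape[0]
        T = len(obs)
        alpha = np.zeros((T, n))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
        return alpha.argmax(axis=1)        # one state index per time step

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    print(decode_greedy(A, B, pi, [0, 1, 0]))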


Decoding – Algorithm 2 (cont’d)

• There is no guarantee that the path is a valid one.

• The path might imply a transition that is not allowed by the model.

  Example (figure): the selected path includes a transition with a_32 = 0, which is not allowed!


Decoding – Algorithm 3
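A standard way to fix the invalid-path problem above is the Viterbi dynamic program, which keeps, for every state and time step, the probability of the best valid path ending there and then backtracks from the end. The sketch below follows that standard formulation and is an illustration, not necessarily the exact variant the slides call Algorithm 3:

    import numpy as np

    def viterbi(A, B, pi, obs):
        """Most likely valid state path for the observation sequence (standard Viterbi)."""
        n = A.shape[0]
        T = len(obs)
        delta = np.zeros((T, n))               # best path probability ending in state j at time t
        psi   = np.zeros((T, n), dtype=int)    # back-pointers (argmax of the previous state)
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A          # scores[i, j] = delta_i(t-1) * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        # Backtrack from the best final state.
        path = [int(delta[T - 1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.9, 0.1], [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    print(viterbi(A, B, pi, [0, 1, 0]))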



Learning

• Use EM:
  – Update the weights iteratively to better explain the observed training sequences:

        max P(V_1^T, V_2^T, ..., V_n^T / θ)

    where the maximization is over the model parameters θ (i.e., the a_ij and b_jk).


Learning (cont’d)

• Idea:

      â_ij = E[# times it goes from ωi to ωj] / E[# times it goes from ωi to any other state]

      b̂_jk = E[# times it emits symbol vk while at state ωj] / E[# times it emits any other symbol while at state ωj]


Learning (cont’d)

• Define the probability of transitioning from ωi to ωj at step t, given V^T:

      γ_ij(t) = P(ω(t+1)=ωj, ω(t)=ωi / V^T) = α_i(t) a_ij b_jk β_j(t+1) / P(V^T)

  where b_jk is the probability of emitting the symbol v(t+1) = vk observed at time t+1.

(expectation step)
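A sketch of this expectation step, reusing the forward and backward sketches from earlier (so the array names and the generic, non-absorbing-state convention are assumptions):

    import numpy as np

    def expected_transitions(A, B, pi, obs, alpha, beta):
        """gamma[t, i, j]: probability of being in state i at step t and state j at
        step t+1, given the whole observation sequence (0-based t)."""
        T, n = alpha.shape
        p_obs = alpha[-1].sum()                                # P(V^T)
        gamma = np.zeros((T - 1, n, n))
        for t in range(T - 1):
            # alpha_i(t) * a_ij * b_{j, v(t+1)} * beta_j(t+1) / P(V^T)
            gamma[t] = (alpha[t][:, None] * A
                        * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]) / p_obs
        # Each gamma[t] sums to 1 over all (i, j) pairs, a useful sanity check.
        return gamma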



Learning (cont’d)

      â_ij = E[# times it goes from ωi to ωj] / E[# times it goes from ωi to any other state]
           = Σ_{t=1..T} γ_ij(t) / Σ_{t=1..T} Σ_k γ_ik(t)

(maximization step)


Learning (cont’d)

      b̂_jk = E[# times it emits symbol vk while at state ωj] / E[# times it emits any other symbol while at state ωj]
            = Σ_{t=1..T, v(t)=vk} Σ_l γ_jl(t) / Σ_{t=1..T} Σ_l γ_jl(t)

(maximization step)
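Putting the expectation step and the two re-estimation formulas together, here is a compact illustrative EM update, again reusing the earlier forward, backward, and expected_transitions sketches; a practical implementation would add log-space arithmetic, convergence checks, and support for multiple training sequences:

    import numpy as np

    def baum_welch_step(A, B, pi, obs):
        """One EM iteration: E-step (gamma) followed by re-estimation of A and B."""
        alpha, _ = forward(A, B, pi, obs)                 # forward sketch from earlier
        beta = backward(A, B, obs)                        # backward sketch from earlier
        gamma = expected_transitions(A, B, pi, obs, alpha, beta)

        # a_ij_hat = sum_t gamma_ij(t) / sum_t sum_k gamma_ik(t)
        A_new = gamma.sum(axis=0)
        A_new /= A_new.sum(axis=1, keepdims=True)

        # sum_l gamma_jl(t) is the probability of being in state j at time t,
        # so b_jk_hat accumulates that occupancy only at times where v(t) = v_k.
        occupancy = gamma.sum(axis=2)                     # shape (T-1, n)
        B_new = np.zeros_like(B)
        for t in range(len(obs) - 1):                     # final time step omitted in this sketch
            B_new[:, obs[t]] += occupancy[t]
        B_new /= B_new.sum(axis=1, keepdims=True)
        return A_new, B_new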


Difficulties

• How do we decide on the number of states and the structure of the model?
  – Use domain knowledge; otherwise it is a very hard problem!

• What about the size of the observation sequence?
  – It should be sufficiently long to guarantee that all state transitions will appear a sufficient number of times.
  – A large amount of training data is necessary to learn the HMM parameters.