
Page 1

Hidden Markov Model

Presented by

Qinmin Hu

Page 2

Outline

• Introduction

• Generating patterns

• Markov process

• Hidden Markov model

• Forward algorithm

• Viterbi algorithm

• Forward-backward algorithm

• Summary

Page 3

Introduction

Motivation:

We are interested in finding patterns which appear over a period of time. These patterns occur in many areas: the pattern of commands someone uses when instructing a computer, sequences of words in sentences, the sequence of phonemes in spoken words - any area where a sequence of events occurs could produce useful patterns.

Example: the dampness of seaweed is directly observable, while the weather that drives it is hidden.

Seaweed (observable states): soggy, dry
Weather (hidden states): wet, sun

Page 4

Generating Patterns (1)

Deterministic patterns

Example: the sequence of traffic lights is red - red/amber - green - amber - red. The sequence can be pictured as a state machine, where the different states of the lights follow each other.

Notice that each state is dependent solely on the previous state, so if the light is green, an amber light will always follow - that is, the system is deterministic.

Page 5

Generating Patterns (2)

Non-deterministic patterns

Weather example: unlike the traffic-light example, we cannot expect the three weather states to follow each other deterministically, so the weather is non-deterministic.

Markov assumption (simplifies the problem greatly): the state of the model depends only upon a fixed number of previous states - for a first-order model, only on the immediately preceding state.

(Figure: the three weather states - sunny, cloudy, rainy - with previous states used to predict the next state.)

Example days (seaweed / weather):

D1: Dry / sunny
D2: Dry / rainy
D3: Soggy / cloudy
D4: Soggy / rainy
D5: Dry / sunny

Page 6

Markov Process

A Markov process consists of:

– States : Three weather states - sunny, cloudy, rainy.

– π vector : the probability of the system being in each of the states at time 1.

– State transition matrix : The probability of the weather given the previous day's weather.
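As an illustration of these three components, a first-order Markov weather process can be simulated by sampling the first day from the π vector and every later day from the transition-matrix row of the previous day. The following is a minimal Python/NumPy sketch, not part of the original slides; it borrows the π and A values listed later on Page 12, and the variable names and renormalisation are illustrative assumptions.

    import numpy as np

    states = ["sunny", "cloudy", "rainy"]
    pi = np.array([0.63, 0.17, 0.20])             # Pr(state at time 1)
    A = np.array([[0.500, 0.250, 0.250],          # row = yesterday's weather, column = today's weather
                  [0.375, 0.125, 0.375],
                  [0.125, 0.675, 0.375]])

    rng = np.random.default_rng(0)
    norm = lambda p: p / p.sum()                  # the slides' rows of A do not all sum exactly to 1

    seq = [rng.choice(3, p=norm(pi))]             # day 1 is drawn from the pi vector
    for _ in range(6):                            # each later day depends only on the previous day
        seq.append(rng.choice(3, p=norm(A[seq[-1]])))
    print([states[s] for s in seq])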

Page 7

Hidden Markov Model (1)

Definitions:

A hidden Markov model (HMM) is a triple (π, A, B):

π = ( π_i ) - the vector of the initial state probabilities;

A = ( a_ij ) - the state transition matrix, where a_ij = Pr( next hidden state x_i | previous hidden state x_j );

B = ( b_ij ) - the confusion matrix, where b_ij = Pr( observation y_i | hidden state x_j ).

NOTE: Each probability in the state transition matrix and in the confusion matrix is time independent - that is, the matrices do not change in time as the system evolves. In practice, this is one of the most unrealistic assumptions of Markov models about real processes.

Page 8

Hidden Markov Model (2)

An HMM contains two sets of states and three sets of probabilities:

– Hidden states: e.g., the weather states.

– Observable states: e.g., the seaweed states.

– π vector: the probability of each hidden state at time t = 1.

– State transition matrix: the probability of a hidden state given the previous hidden state.

– Confusion matrix: the probability of each observable state given a hidden state.
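Putting the pieces together, the weather/seaweed HMM could be written down directly as the triple (π, A, B) and used generatively: hidden weather states evolve via the transition matrix, and each hidden state emits a seaweed observation through the confusion matrix. This is a minimal sketch, not part of the original slides; it uses the values given later on Page 12, and the names, the generate() helper, and the row renormalisation are illustrative assumptions.

    import numpy as np

    hidden_states = ["sunny", "cloudy", "rainy"]
    observable_states = ["dry", "dryish", "damp", "soggy"]

    pi = np.array([0.63, 0.17, 0.20])             # Pr(hidden state at t = 1)
    A = np.array([[0.500, 0.250, 0.250],          # A[i, j] = Pr(hidden state j today | hidden state i yesterday)
                  [0.375, 0.125, 0.375],
                  [0.125, 0.675, 0.375]])
    B = np.array([[0.60, 0.20, 0.15, 0.05],       # B[i, k] = Pr(observable state k | hidden state i)
                  [0.25, 0.25, 0.25, 0.25],
                  [0.05, 0.10, 0.35, 0.50]])

    def generate(T, rng=None):
        """Sample a hidden weather sequence and the seaweed observations it emits."""
        rng = rng or np.random.default_rng(0)
        norm = lambda p: p / p.sum()              # some rows of A in the slides do not sum exactly to 1
        hidden = [rng.choice(3, p=norm(pi))]
        for _ in range(T - 1):
            hidden.append(rng.choice(3, p=norm(A[hidden[-1]])))
        observed = [rng.choice(4, p=norm(B[h])) for h in hidden]
        return ([hidden_states[h] for h in hidden],
                [observable_states[o] for o in observed])

    print(generate(5))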

Page 9

Hidden Markov Model (3)

(Figure: a simple first-order Markov process.)

Page 10

Hidden Markov Model (4)

(Figure: the three probability sets side by side - the π vector at t = 1 over the hidden states, the state transition matrix from previous hidden states to hidden states, and the confusion matrix from hidden states to observable states; each row of probabilities sums to 1.)

Page 11

Hidden Markov Model (5)

Once a system can be described as an HMM, three problems can be solved.

– Evaluation: finding the probability of an observed sequence given an HMM.
• For example, we may have a 'Summer' model and a 'Winter' model for the seaweed; we may then hope to determine the season on the basis of a sequence of dampness observations.
• Algorithm: Forward algorithm.

– Decoding: finding the sequence of hidden states that most probably generated an observed sequence.
• For example, we may know the states of the seaweed as the observed sequence, and then find the hidden weather states.
• Algorithm: Viterbi algorithm.

– Learning (hardest): generating an HMM given a sequence of observations.
• For example, we may determine the triple (π, A, B) of the weather HMM.
• Algorithm: Forward-backward algorithm.

Page 12

Forward Algorithm (1) (Evaluation)

Input: π vector, A (state transition matrix), B (confusion matrix).

Output: the probability of an observed sequence.

Initial state probabilities (π vector):

  sunny   0.63
  cloudy  0.17
  rainy   0.20

Confusion matrix (B):

           dry     dryish   damp    soggy
  sunny    0.60    0.20     0.15    0.05
  cloudy   0.25    0.25     0.25    0.25
  rainy    0.05    0.10     0.35    0.50

State transition matrix (A), rows = weather yesterday, columns = weather today:

           sunny   cloudy   rainy
  sunny    0.500   0.250    0.250
  cloudy   0.375   0.125    0.375
  rainy    0.125   0.675    0.375

Probability of the observed sequence (to be computed):

           dry     dryish   damp    soggy
  sunny    ?       ?        ?       ?
  cloudy   ?       ?        ?       ?
  rainy    ?       ?        ?       ?

Page 13

Forward Algorithm (2) (Evaluation)

Partial probability

From the example, for a three-day observation sequence (dry, damp, soggy) there are 3^3 = 27 possible weather sequences, so the exhaustive calculation is: Pr(dry, damp, soggy | HMM) = Pr(dry, damp, soggy, sunny, sunny, sunny) + Pr(dry, damp, soggy, sunny, sunny, cloudy) + Pr(dry, damp, soggy, sunny, sunny, rainy) + . . . + Pr(dry, damp, soggy, rainy, rainy, rainy), where each term is the joint probability of the observations and one hidden weather sequence (spelled out in the sketch below).

Expensive!

The complexity is reduced by computing the partial probabilities recursively.
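For reference, the exhaustive sum above can be spelled out directly. This is a minimal Python sketch, not part of the original slides; the state/observation index encoding and the brute_force_prob name are illustrative assumptions. The forward algorithm described on the next pages computes the same quantity with far less work.

    import itertools
    import numpy as np

    # pi, A, B: the weather HMM parameters from Page 12 (states sunny, cloudy, rainy;
    # observations dry, dryish, damp, soggy).
    pi = np.array([0.63, 0.17, 0.20])
    A = np.array([[0.500, 0.250, 0.250],
                  [0.375, 0.125, 0.375],
                  [0.125, 0.675, 0.375]])
    B = np.array([[0.60, 0.20, 0.15, 0.05],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.05, 0.10, 0.35, 0.50]])

    def brute_force_prob(obs):
        """Sum Pr(observations, hidden sequence) over every possible hidden sequence (3**T terms)."""
        total = 0.0
        for hidden in itertools.product(range(len(pi)), repeat=len(obs)):
            p = pi[hidden[0]] * B[hidden[0], obs[0]]
            for t in range(1, len(obs)):
                p *= A[hidden[t - 1], hidden[t]] * B[hidden[t], obs[t]]
            total += p
        return total

    # dry = 0, damp = 2, soggy = 3: all 3**3 = 27 hidden weather sequences are enumerated.
    print(brute_force_prob([0, 2, 3]))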

Page 14

Forward Algorithm (3) (Evaluation)

Partial probability:

α_t( j ) = Pr( observation at time t | hidden state is j ) * Pr( all paths to state j at time t )

Steps:

Step 1 (t = 1): α_1(j) = π(j) * b_j(o_1)

Step 2 (t > 1): α_t(j) = b_j(o_t) * Σ_i [ α_{t-1}(i) * a_ij ]

Page 15

(Initial state probabilities π, confusion matrix B and state transition matrix A as given on Page 12.)

α_1(1) = π(1) * b11 = 0.63 * 0.60 = 0.378
α_1(2) = π(2) * b21 = 0.17 * 0.25 = 0.043
α_1(3) = π(3) * b31 = 0.20 * 0.05 = 0.010

α_2(1) = [α_1(1) * a11 + α_1(2) * a21 + α_1(3) * a31] * b12 = 0.041

α_2(2) = [α_1(1) * 0.250 + α_1(2) * 0.125 + α_1(3) * 0.675] * 0.25 = 0.027

α_2(3) = [α_1(1) * 0.250 + α_1(2) * 0.375 + α_1(3) * 0.375] * 0.10 = 0.011

…………………………..

Partial probabilities α for the observed sequence (dry, dryish, damp, soggy):

           dry      dryish   damp     soggy
  sunny    α_1(1)   α_2(1)   α_3(1)   α_4(1)
  cloudy   α_1(2)   α_2(2)   α_3(2)   α_4(2)
  rainy    α_1(3)   α_2(3)   α_3(3)   α_4(3)

Numerically:

           dry      dryish   damp      soggy
  sunny    0.378    0.041    4.81E-3   2.74E-4
  cloudy   0.043    0.027    5.34E-3   1.92E-3
  rainy    0.010    0.011    8.60E-3   3.21E-3

The probability of the observed sequence is the sum of the final column: Pr(dry, dryish, damp, soggy | HMM) = α_4(1) + α_4(2) + α_4(3).
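As a cross-check, the forward pass can be written in a few lines of Python/NumPy. This is a minimal sketch, not from the original slides; the array names and the forward() helper are illustrative assumptions. With the π, A and B values from Page 12 and the observation sequence dry, dryish, damp, soggy it reproduces the α table above (up to rounding).

    import numpy as np

    # pi, A, B: the weather HMM parameters from Page 12.
    pi = np.array([0.63, 0.17, 0.20])
    A = np.array([[0.500, 0.250, 0.250],
                  [0.375, 0.125, 0.375],
                  [0.125, 0.675, 0.375]])
    B = np.array([[0.60, 0.20, 0.15, 0.05],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.05, 0.10, 0.35, 0.50]])

    def forward(obs):
        """Return the partial probabilities alpha[t, j] and Pr(observed sequence | HMM)."""
        alpha = np.zeros((len(obs), len(pi)))
        alpha[0] = pi * B[:, obs[0]]                      # step 1: t = 1
        for t in range(1, len(obs)):                      # step 2: t > 1
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # sum over previous states, then weight by the emission
        return alpha, alpha[-1].sum()

    alpha, prob = forward([0, 1, 2, 3])                   # dry, dryish, damp, soggy
    print(alpha)   # rows: time steps; columns: sunny, cloudy, rainy
    print(prob)    # probability of the observed sequence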

Page 16

Viterbi Algorithm (1) (Decoding)

Input: π vector, A (state transition matrix), B (confusion matrix), and an observed sequence.

Output: the most probable sequence of hidden states.

(Initial state probabilities π, confusion matrix B and state transition matrix A as given on Page 12.)

(Partial probabilities α for the observed sequence dry, dryish, damp, soggy, as computed on Page 15.)

Page 17

Viterbi Algorithm (2) (Decoding)

Description:

• Goal: to recapture the most likely underlying state sequence, i.e. the hidden state combination that maximises Pr( observed sequence, hidden state combination ).

• Algorithm:

– Through an execution trellis, calculate a partial probability for each cell.

– Keep a back pointer for each cell, indicating how that cell is most probably reached.

– On completion, the most likely final state is taken as correct, and the path to it is traced back to t = 1 via the back pointers.

Page 18

Viterbi Algorithm (3) (Decoding)

Partial probability

From the example, the most probable sequence of hidden states is the one that maximises the joint probability of the observations and the hidden sequence: Pr(dry, damp, soggy, sunny, sunny, sunny), Pr(dry, damp, soggy, sunny, sunny, cloudy), Pr(dry, damp, soggy, sunny, sunny, rainy), ..., Pr(dry, damp, soggy, rainy, rainy, rainy).

Expensive!

As with the forward algorithm, we use the time invariance of the probabilities to reduce the complexity of the calculation.

Page 19

Viterbi Algorithm (4) (Decoding)

Each of the three states at t = 3 has a most probable path to it, perhaps like the paths displayed in the second picture.

These paths are called partial best paths. Each of them has an associated probability, the partial probability δ.

For t = 1: δ_1(j) = π(j) * b_j(o_1)

For t > 1: δ_t(j) = max_i [ δ_{t-1}(i) * a_ij ] * b_j(o_t)

Page 20

Viterbi Algorithm (5) (Decoding)

Back pointer φ’s

We now know δ(i,t) at each intermediate and end state.

However the aim is to find the most probable sequence of states through the trellis given an observation sequence.

Therefore we need some way of remembering the partial best paths through the trellis.

We want the back pointer φ to answer the question: if I am here, by what route is it most likely I arrived? That is, φ_t(j) = argmax_i [ δ_{t-1}(i) * a_ij ], the most probable previous state for each state j at time t.

Page 21

(Initial state probabilities π, confusion matrix B and state transition matrix A as given on Page 12.)

δ_1(1) = 0.63 * 0.60 = 0.378

δ_1(2) = 0.17 * 0.25 = 0.043

δ_1(3) = 0.20 * 0.05 = 0.010

Max (δ_1(1), δ_1(2), δ_1(3)) = δ_1(1) = 0.378

δ_2(1) = max (0.378 * 0.50 * 0.20, 0.043 * 0.375 * 0.20, 0.010 * 0.125 * 0.20) = 0.0378

δ_2(2) = max (0.378 * 0.25 * 0.25, 0.043 * 0.125 * 0.25, 0.010 * 0.675 * 0.25) = 0.0236

δ_2(3) = max (0.378 * 0.25 * 0.10, 0.043 * 0.375 * 0.10, 0.010 * 0.375 * 0.10) = 0.00945

…………………………..

(Partial probabilities α for the observed sequence dry, dryish, damp, soggy, as on Page 15.)

(Trellis: hidden states sunny, cloudy, rainy against the observations dry, dryish, damp, soggy.)
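The recursion with back pointers can also be sketched in a few lines of Python/NumPy. This is a minimal sketch, not part of the original slides; the names viterbi, delta, phi and the index encoding are illustrative assumptions. With the π, A and B values from Page 12 and the observations dry, dryish, damp, soggy, its δ values at t = 1 and t = 2 agree with the numbers above (0.378, 0.043, 0.010; 0.0378, 0.0236, 0.00945).

    import numpy as np

    # pi, A, B: the weather HMM parameters from Page 12.
    hidden_states = ["sunny", "cloudy", "rainy"]
    pi = np.array([0.63, 0.17, 0.20])
    A = np.array([[0.500, 0.250, 0.250],
                  [0.375, 0.125, 0.375],
                  [0.125, 0.675, 0.375]])
    B = np.array([[0.60, 0.20, 0.15, 0.05],
                  [0.25, 0.25, 0.25, 0.25],
                  [0.05, 0.10, 0.35, 0.50]])

    def viterbi(obs):
        """Return the most probable sequence of hidden states for the given observation indices."""
        T, N = len(obs), len(pi)
        delta = np.zeros((T, N))                 # partial best-path probabilities
        phi = np.zeros((T, N), dtype=int)        # back pointers
        delta[0] = pi * B[:, obs[0]]             # t = 1
        for t in range(1, T):                    # t > 1
            trans = delta[t - 1][:, None] * A    # trans[i, j] = delta_{t-1}(i) * a_ij
            phi[t] = trans.argmax(axis=0)        # most probable previous state for each state j
            delta[t] = trans.max(axis=0) * B[:, obs[t]]
        path = [int(delta[-1].argmax())]         # most likely final state ...
        for t in range(T - 1, 0, -1):            # ... traced back to t = 1 via the back pointers
            path.append(int(phi[t, path[-1]]))
        return [hidden_states[s] for s in reversed(path)]

    print(viterbi([0, 1, 2, 3]))                 # dry, dryish, damp, soggy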

Page 22

Forward-Backward Algorithm (Learning)

Evaluation (forward algorithm) and decoding (Viterbi algorithm) are useful, but they both depend upon foreknowledge of the HMM parameters - the state transition matrix, the confusion (observation) matrix, and the π vector.

Learning problem: forward-backward algorithm

– In many practical problems these parameters are not directly measurable and have to be estimated.

– The algorithm permits the estimate to be made on the basis of a sequence of observations known to come from a given set, which represents a known hidden set following a Markov model.

Though the forward-backward algorithm is not unduly hard to comprehend, it is more complex in nature, so it is not detailed in this presentation.

Page 23

Summary

• Generating patterns

– Patterns do not appear in isolation but as part of a series in time.

– The Markov assumption is that the process's state depends only on the preceding N states.

• Hidden Markov model

– The process states (patterns) are not directly observable, but are indirectly and probabilistically observable as another set of patterns.

– Three problems are solved:
• Evaluation: forward algorithm
• Decoding: Viterbi algorithm
• Learning: forward-backward algorithm

Page 24

Thanks!

Questions?