A basic implementation of a discrete hidden Markov model for sequence classification in C++ using Eigen.
Discrete Hidden Markov Models for
Sequence Classification

Pi19404

February 24, 2014
Contents

Discrete Hidden Markov Models for Sequence Classification
0.1 Introduction
0.2 Hidden Latent Variables
0.2.1 Forward algorithm
0.2.2 Observation Sequence Likelihood
0.2.3 Sequence Classification
0.2.4 Code
References
Discrete Hidden Markov Models
for Sequence Classification
0.1 Introduction
In this article we will look at hidden Markov models and their application
to the classification of discrete sequential data.

- Markov processes are examples of stochastic processes that generate
  random sequences of outcomes or states according to certain
  probabilities.
- Markov chains can be considered mathematical descriptions of Markov
  models with a discrete set of states.
- In a Markov chain the observed variables are the states themselves. We
  will now consider the case where the observed symbol is not directly
  the state but some random variable related to the underlying state
  sequence.
- The true state of the system is latent and unobserved.
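A Markov chain of the kind described above can be sketched in a few lines of C++. The 2-state transition matrix and the function name below are illustrative assumptions, not taken from the article's code:

```cpp
#include <random>
#include <vector>

// Sample a state sequence from a 2-state Markov chain.
// trans[i][j] = P(next state = j | current state = i);
// the probabilities are made up for illustration.
std::vector<int> sampleChain(int len, unsigned seed)
{
    const double trans[2][2] = {{0.7, 0.3},
                                {0.4, 0.6}};
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<int> seq;
    int state = 0; // start in state 0
    for (int t = 0; t < len; t++)
    {
        seq.push_back(state);
        // draw the next state according to the current row of trans
        state = (u(gen) < trans[state][0]) ? 0 : 1;
    }
    return seq;
}
```

Here the states themselves form the observed sequence; the hidden Markov model below generalizes this by emitting symbols that are only probabilistically related to the states.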
0.2 Hidden Latent Variables
- Let Y denote a sequence of random variables taking values in a
  Euclidean space.
- A simplistic model is one where each observation state is associated
  with a single hidden state: each hidden state emits a unique
  observation symbol. In this case, if we know exactly which hidden
  state an observed variable corresponds to, the problem reduces to a
  Markov chain.
- However, it may happen that an observed state corresponds to more
  than one hidden state with certain probabilities. For example

  P(Y = y1 | X = x0) = 0.9
  P(Y = y2 | X = x0) = 0.1

- Thus if the hidden state is x0, it is likely to emit the observed
  state y1 with probability 0.9 and the observed state y2 with
  probability 0.1.
- In other words, 90% of the time we will observe the state y1 and 10%
  of the time we will observe y2.
- The random variables Y are not independent, nor do they represent
  samples from a Markov process.
- However, given a realization Xi = xi, the random variable Yi is
  independent of the other random variables X and therefore of the
  other Y.
- Given that we know the hidden/latent state, the observation
  probabilities are independent:

  P(Y1, Y2 | X = xi) = P(Y1 | X = xi) P(Y2 | X = xi)
- The sequence of observations/emissions and the probabilities that
  each hidden state emits each observation symbol are specified by an
  emission matrix.
- If N is the number of states and M is the number of observation
  symbols, the emission matrix is an N x M matrix.
- The model is called a discrete hidden Markov model since the emission
  probabilities are discrete in nature. Another class of hidden Markov
  models, continuous hidden Markov models, has continuous emission
  probabilities modeled using parametric distributions.
0.2.1 Forward algorithm
- The probability of an observed sequence given that we are in a hidden
  state zn can be computed using the forward algorithm.
- In the forward algorithm we compute the probability of the observed
  sequence from the first element to the last, as opposed to the
  backward algorithm, where we compute the same quantity starting from
  the last element and moving backward to the first.
- The forward and backward algorithms can both be used to compute the
  probability of an observed sequence.
- The idea is to estimate the underlying hidden states, which form a
  Markov chain.
- Since we have N hidden states, we can compute an N x L matrix
  containing the probabilities of observing each hidden state, where L
  is the length of the sequence.
- The result is stored in the matrix α.
- Each column of the matrix α, i.e. α(j, i), provides the probabilities
  of observing the sequence at each increment of the sequence.
- The forward algorithm computes a matrix of state probabilities which
  can be used to assess the probability of being in each of the states
  at any given time in the sequence. The total probability of the
  observed sequence is Σ_j α(j, len − 1).
Let

α(zn) = p(x1, ..., xn, zn)

α(z1) = p(x1, z1) = p(x1 | z1) p(z1)

α(zn−1) = p(x1, ..., xn−1, zn−1)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, xn, zn−1, zn)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn, zn | x1, ..., xn−1, zn−1)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn | zn, x1, ..., xn−1, zn−1) P(zn | x1, ..., xn−1, zn−1)

Since xn is independent of all other states given zn, and zn is
independent of all other zi given zn−1:

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn | zn) P(zn | zn−1)

α(zn) = Σ_{zn−1} α(zn−1) P(xn | zn) P(zn | zn−1)
/**
 * @brief forwardMatrix : computes p(x1 ... xn, zn)
 * using the forward algorithm
 * @param sequence : input observation sequence
 */
void forwardMatrix(vector<int> &sequence)
{
    int len = sequence.size();
    for (int i = 0; i < len; i++)
    {
        for (int j = 0; j < _nstates; j++)
        {
            if (i == 0)
                _alpha(j, i) = _emission(j, sequence[i]) * _initial(0, j);
            else
            {
                float s = 0;
                // sum over all possible previous states
                for (int k = 0; k < _nstates; k++)
                    s = s + _transition(k, j) * _alpha(k, i - 1);
                // the emission term uses the current symbol sequence[i],
                // matching the recursion alpha(zn) = sum alpha(zn-1) P(xn|zn) P(zn|zn-1)
                _alpha(j, i) = _emission(j, sequence[i]) * s;
            }
        }

        // normalizing factor: the unnormalized column sum is the
        // incremental probability of the sequence up to step i
        float scale = 0;
        for (int j = 0; j < _nstates; j++)
        {
            scale = scale + _alpha(j, i);
        }
        scale = 1.f / scale;
        _scale(0, i) = scale;

        // normalize the probability values of the hidden states
        for (int j = 0; j < _nstates; j++)
        {
            _alpha(j, i) = scale * _alpha(j, i);
        }
    }
}
0.2.2 Observation Sequence Likelihood
- Thus, suppose we are given a random sequence of observations
  x1, ..., xn.
- Starting from α(z1), compute α(zn).
- After which P(X = x1, ..., xn) = Σ_{zn} α(zn) can be computed.
- This gives us an estimate of the probability of observing the
  sequence x1, ..., xn using the forward algorithm.
- The probability estimate is often computed as a log probability. With
  the scaling used in the code, the unnormalized column sum at step i,
  ci = Σ_j α(j, i), acts as an incremental factor:

  P(x1, ..., xn) = Σ_{zn} α(zn)

  prob = Π_{i=1}^{n} ci

  logprob = Σ_{i=1}^{n} log(ci)
/**
 * @brief likelihood : method to compute the log
 * likelihood of an observed sequence
 * @param sequence : input observation sequence
 * @return log likelihood of the observed sequence
 */
float likelihood(vector<int> sequence)
{
    float prob = 0;
    // compute the probability of the observed sequence
    // using the forward algorithm
    forwardMatrix(sequence);

    // accumulate the log of the normalizing factors;
    // since _scale stores 1/(column sum), the log
    // likelihood of the sequence is -prob
    for (int i = 0; i < (int)sequence.size(); i++)
    {
        prob = prob + std::log(_scale(0, i));
    }

    return -prob;
}
0.2.3 Sequence Classification
- For sequence classification, assume we have two hidden Markov models
  λ1 = (π1, trans1, emission1) and λ2 = (π2, trans2, emission2).
- Given an observation sequence X = x1, ..., xn, we compute for each
  model the probability that it generated the observation sequence.
- The observation sequence is assigned to the model which exhibits the
  highest probability of having generated the sequence:

  y = argmax_{n=1,...,N} prob(X | λn)
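Given per-model log likelihoods (as returned by a likelihood() method like the one above), the classification rule is a simple argmax. The function below is a hypothetical sketch; its inputs are placeholder values:

```cpp
#include <vector>
#include <cstddef>

// Pick the index of the model whose log likelihood for the
// observed sequence is highest.
int classifySequence(const std::vector<float> &loglik)
{
    int best = 0;
    for (std::size_t n = 1; n < loglik.size(); n++)
        if (loglik[n] > loglik[best])
            best = (int)n;
    return best;
}
```

Since log is monotonic, comparing log likelihoods gives the same decision as comparing the probabilities themselves, while avoiding underflow.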
0.2.4 Code
- Code for discrete hidden Markov models can be found in the git
  repository https://github.com/pi19404/OpenVision in the files
  ImgML/hmm.hpp and ImgML/hmm.cpp.