A basic implementation of a discrete hidden Markov model for sequence classification in C++ using Eigen.
Discrete Hidden Markov Models for
Sequence Classification

Pi19404

February 24, 2014
Contents

Discrete Hidden Markov Models for Sequence Classification
0.1 Introduction
0.2 Hidden Latent Variables
0.2.1 Forward algorithm
0.2.2 Observation Sequence Likelihood
0.2.3 Sequence Classification
0.2.4 Code
References
Discrete Hidden Markov Models
for Sequence Classification
0.1 Introduction
In this article we will look at hidden Markov models and their application
to the classification of discrete sequential data.

- Markov processes are examples of stochastic processes that generate
  random sequences of outcomes or states according to certain
  probabilities.
- Markov chains can be considered mathematical descriptions of Markov
  models with a discrete set of states.
- In a Markov chain the observed variables are the states themselves. We
  will now consider the case where the observed symbol is not directly
  the state but some random variable related to the underlying state
  sequence.
- The true state of the system is latent and unobserved.
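A Markov chain of the kind described above can be sketched in a few lines of C++. The 2-state transition matrix and the function name below are illustrative assumptions, not taken from the article's code:

```cpp
#include <random>
#include <vector>

// Sample a state sequence from a 2-state Markov chain.
// trans[i][j] = P(next state = j | current state = i);
// the probabilities are made up for illustration.
std::vector<int> sampleChain(int len, unsigned seed)
{
    const double trans[2][2] = {{0.7, 0.3},
                                {0.4, 0.6}};
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<int> seq;
    int state = 0; // start in state 0
    for (int t = 0; t < len; t++)
    {
        seq.push_back(state);
        // draw the next state according to the current row of trans
        state = (u(gen) < trans[state][0]) ? 0 : 1;
    }
    return seq;
}
```

Here the states themselves form the observed sequence; the hidden Markov model below generalizes this by emitting symbols that are only probabilistically related to the states.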
0.2 Hidden Latent Variables
- Let Y denote a sequence of random variables taking values in a
  Euclidean space.
- A simplistic model is one where each observation state is associated
  with a single hidden state: each hidden state emits a unique
  observation symbol. In this case, if we know exactly which hidden
  state an observed variable corresponds to, the problem reduces to a
  Markov chain.
- However, it may happen that an observed state corresponds to more
  than one hidden state with certain probabilities. For example

  P(Y = y1 | X = x0) = 0.9
  P(Y = y2 | X = x0) = 0.1

- Thus if the hidden state is x0, it is likely to emit the observed
  state y1 with probability 0.9 and the observed state y2 with
  probability 0.1.
- In other words, 90% of the time we will observe the state y1 and 10%
  of the time we will observe y2.
- The random variables Y are not independent, nor do they represent
  samples from a Markov process.
- However, given a realization Xi = xi, the random variable Yi is
  independent of the other random variables X and therefore of the
  other Y.
- Given that we know the hidden/latent state, the observation
  probabilities are independent:

  P(Y1, Y2 | X = xi) = P(Y1 | X = xi) P(Y2 | X = xi)
- The sequence of observations/emissions and the probabilities that
  each hidden state emits each observation symbol are specified by an
  emission matrix.
- If N is the number of states and M is the number of observation
  symbols, the emission matrix is an N x M matrix.
- The model is called a discrete hidden Markov model since the emission
  probabilities are discrete in nature. Another class of hidden Markov
  models, continuous hidden Markov models, has continuous emission
  probabilities modeled using parametric distributions.
0.2.1 Forward algorithm
- The probability of an observed sequence given that we are in a hidden
  state zn can be computed using the forward algorithm.
- In the forward algorithm we compute the probability of the observed
  sequence from the first element to the last, as opposed to the
  backward algorithm, where we compute the same quantity starting from
  the last element and moving backward to the first.
- The forward and backward algorithms can both be used to compute the
  probability of an observed sequence.
- The idea is to estimate the underlying hidden states, which form a
  Markov chain.
- Since we have N hidden states, we can compute an N x L matrix
  containing the probabilities of observing each hidden state, where L
  is the length of the sequence.
- The result is stored in the matrix α.
- Each column of the matrix α, i.e. α(j, i), provides the probabilities
  of observing the sequence at each increment of the sequence.
- The forward algorithm computes a matrix of state probabilities which
  can be used to assess the probability of being in each of the states
  at any given time in the sequence. The total probability of the
  observed sequence is Σ_j α(j, len − 1).
Let

α(zn) = p(x1, ..., xn, zn)

α(z1) = p(x1, z1) = p(x1 | z1) p(z1)

α(zn−1) = p(x1, ..., xn−1, zn−1)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, xn, zn−1, zn)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn, zn | x1, ..., xn−1, zn−1)

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn | zn, x1, ..., xn−1, zn−1) P(zn | x1, ..., xn−1, zn−1)

Since xn is independent of all other states given zn, and zn is
independent of all other zi given zn−1:

α(zn) = Σ_{zn−1} p(x1, ..., xn−1, zn−1) P(xn | zn) P(zn | zn−1)

α(zn) = Σ_{zn−1} α(zn−1) P(xn | zn) P(zn | zn−1)
/**
 * @brief forwardMatrix : computes p(x1 ... xn, zn)
 * using the forward algorithm
 * @param sequence : input observation sequence
 */
void forwardMatrix(vector<int> &sequence)
{
    int len = sequence.size();
    for (int i = 0; i < len; i++)
    {
        for (int j = 0; j < _nstates; j++)
        {
            if (i == 0)
                _alpha(j, i) = _emission(j, sequence[i]) * _initial(0, j);
            else
            {
                float s = 0;
                // sum over all possible previous states
                for (int k = 0; k < _nstates; k++)
                    s = s + _transition(k, j) * _alpha(k, i - 1);
                // the emission term uses the current symbol sequence[i],
                // matching the recursion alpha(zn) = sum alpha(zn-1) P(xn|zn) P(zn|zn-1)
                _alpha(j, i) = _emission(j, sequence[i]) * s;
            }
        }

        // normalizing factor: the unnormalized column sum is the
        // incremental probability of the sequence up to step i
        float scale = 0;
        for (int j = 0; j < _nstates; j++)
        {
            scale = scale + _alpha(j, i);
        }
        scale = 1.f / scale;
        _scale(0, i) = scale;

        // normalize the probability values of the hidden states
        for (int j = 0; j < _nstates; j++)
        {
            _alpha(j, i) = scale * _alpha(j, i);
        }
    }
}
0.2.2 Observation Sequence Likelihood
- Thus, suppose we are given a random sequence of observations
  x1, ..., xn.
- Starting from α(z1), compute α(zn).
- After which P(X = x1, ..., xn) = Σ_{zn} α(zn) can be computed.
- This gives us an estimate of the probability of observing the
  sequence x1, ..., xn using the forward algorithm.
- The probability estimate is often computed as a log probability. With
  the scaling used in the code, the unnormalized column sum at step i,
  ci = Σ_j α(j, i), acts as an incremental factor:

  P(x1, ..., xn) = Σ_{zn} α(zn)

  prob = Π_{i=1}^{n} ci

  logprob = Σ_{i=1}^{n} log(ci)
/**
 * @brief likelihood : method to compute the log
 * likelihood of an observed sequence
 * @param sequence : input observation sequence
 * @return log likelihood of the observed sequence
 */
float likelihood(vector<int> sequence)
{
    float prob = 0;
    // compute the probability of the observed sequence
    // using the forward algorithm
    forwardMatrix(sequence);

    // accumulate the log of the normalizing factors;
    // since _scale stores 1/(column sum), the log
    // likelihood of the sequence is -prob
    for (int i = 0; i < (int)sequence.size(); i++)
    {
        prob = prob + std::log(_scale(0, i));
    }

    return -prob;
}
0.2.3 Sequence Classification
- For sequence classification, assume we have two hidden Markov models
  λ1 = (π1, trans1, emission1) and λ2 = (π2, trans2, emission2).
- Given an observation sequence X = x1, ..., xn, we compute for each
  model the probability that it generated the observation sequence.
- The observation sequence is assigned to the model which exhibits the
  highest probability of having generated the sequence:

  y = argmax_{n=1,...,N} prob(X | λn)
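Given per-model log likelihoods (as returned by a likelihood() method like the one above), the classification rule is a simple argmax. The function below is a hypothetical sketch; its inputs are placeholder values:

```cpp
#include <vector>
#include <cstddef>

// Pick the index of the model whose log likelihood for the
// observed sequence is highest.
int classifySequence(const std::vector<float> &loglik)
{
    int best = 0;
    for (std::size_t n = 1; n < loglik.size(); n++)
        if (loglik[n] > loglik[best])
            best = (int)n;
    return best;
}
```

Since log is monotonic, comparing log likelihoods gives the same decision as comparing the probabilities themselves, while avoiding underflow.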
0.2.4 Code
- Code for discrete hidden Markov models can be found in the git
  repository https://github.com/pi19404/OpenVision in the files
  ImgML/hmm.hpp and ImgML/hmm.cpp.