Hidden Markov Models
Comp Sci 369 Dr Patricia Riddle
9/24/14 1 760 bayes & hmm
Automata Theory
An automaton is a mathematical model for a finite state machine (FSM). An FSM is a machine that, given an input of symbols, 'jumps' through a series of states according to a transition function (which can be expressed as a table).
Finite State Machine
A model of computation consisting of a set of states, a start state, an input alphabet, and a transition function that maps input symbols and current states to a next state. Computation begins in the start state with an input string. It changes to new states depending on the transition function. Also known as a finite state automaton.
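The definition above can be sketched directly in code. A minimal sketch, assuming an invented two-state machine that tracks the parity of '1' symbols in its input (the states, alphabet, and transition table are all illustrative, not from the slides):

```python
# Minimal FSM sketch; states and table are invented for illustration.
# The machine tracks the parity of '1' symbols seen so far.
transitions = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

def run_fsm(start, symbols):
    """Jump through states according to the transition function."""
    state = start
    for s in symbols:
        state = transitions[(state, s)]
    return state

print(run_fsm("even", "1011"))  # three '1' symbols seen -> "odd"
```

The transition function is just a table lookup, exactly as described: the machine 'jumps' through states as it consumes symbols.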
Transition Table

  Current State / Condition | State A | State B | State C
  Condition X               |    …    |    …    |    …
  Condition Y               |    …    | State C |    …
  Condition Z               |    …    |    …    |    …
Deterministic Finite Automata
Nondeterministic Finite Automata
Variations
There are many variants: for instance, machines having actions (outputs) associated with transitions (Mealy machines) or states (Moore machines), multiple start states, transitions conditioned on no input symbol (a null), more than one transition for a given symbol and state (nondeterministic finite state machines), one or more states designated as accepting states (recognizers), etc.
Finite State Machines
An automaton is represented by the 5-tuple ⟨Q, Σ, δ, q0, F⟩, where:
Q is a set of states.
Σ is a finite set of symbols, which we will call the alphabet of the language the automaton accepts.
δ is the transition function, that is, δ: Q × Σ → Q. (For non-deterministic automata, the empty string is an allowed input.)
q0 is the start state, that is, the state in which the automaton is when no input has been processed yet (obviously, q0 ∈ Q).
F is a set of states of Q (i.e. F ⊆ Q), called accept states.
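The 5-tuple maps directly onto a small program used as a recognizer. A minimal sketch, assuming an invented language (strings over {a, b} that end in 'b'); none of these particular states or symbols come from the slides:

```python
# DFA sketch for an invented language: strings over {a, b} ending in 'b'.
Q = {"q0", "q1"}                      # states
Sigma = {"a", "b"}                    # alphabet
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
q0 = "q0"                             # start state
F = {"q1"}                            # accept states (F is a subset of Q)

def accepts(word):
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in F

print(accepts("abab"))  # True: ends in 'b'
print(accepts("aba"))   # False
```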
Markov Chains - CS Definition
Markov chain: a finite state machine with probabilities for each transition, that is, a probability that the next state is sj given that the current state is si.
Note: equivalently, a weighted, directed graph in which the weights correspond to the probability of that transition. In other words, the weights are nonnegative and the total weight of outgoing edges is positive. If the weights are normalized, the total weight, including self-loops, is 1.
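The normalization condition is easy to check in code. A sketch with an invented two-state chain, storing the weighted digraph as a dict of dicts:

```python
# Invented two-state chain as a weighted, directed graph:
# P[si][sj] = probability that the next state is sj given current state si.
P = {
    "s1": {"s1": 0.9, "s2": 0.1},
    "s2": {"s1": 0.5, "s2": 0.5},
}

# Weights are nonnegative and each row (including self-loops) sums to 1.
for state, edges in P.items():
    assert all(w >= 0 for w in edges.values())
    assert abs(sum(edges.values()) - 1.0) < 1e-12
print("transition probabilities are normalized")
```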
Markov Chain Graph
Markov Chain Example
Markov Chains - Statistics Definition
In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property.
That is, a Markov chain is a series of states of a system that has the Markov property. At each time the system may have changed from the state it was in the moment before, or it may have stayed in the same state. The changes of state are called transitions.
Markov Property
If a sequence of states has the Markov property, it means that every future state is conditionally independent of every prior state given the current state.

Chain rule:

P(w1, …, wn) = P(w1) ∏_{i=2}^{n} P(wi | w1, …, wi−1)

Markov assumption:

P(w1, …, wn) ≈ P(w1) ∏_{i=2}^{n} P(wi | wi−1)
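The Markov assumption reduces the sequence probability to an initial probability times pairwise transition probabilities, which is trivial to compute. A sketch with invented numbers:

```python
# Sequence probability under the first-order Markov assumption:
# P(w1..wn) ~ P(w1) * product over i of P(wi | wi-1). Numbers invented.
initial = {"A": 0.6, "B": 0.4}
trans = {("A", "A"): 0.7, ("A", "B"): 0.3,
         ("B", "A"): 0.4, ("B", "B"): 0.6}

def markov_prob(seq):
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[(prev, cur)]
    return p

print(markov_prob(["A", "A", "B"]))  # 0.6 * 0.7 * 0.3 ≈ 0.126
```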
Conditionally Independent
In probability theory, two states X and Y are conditionally independent given a third event Z precisely if the occurrence or non-occurrence of X and Y are independent events in their conditional probability distribution given Z.
In other words,
P(X∩Y|Z) = P(X|Z) P(Y|Z)
or, equivalently, P(X|Y∩Z) = P(X|Z).
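The identity can be checked numerically. The sketch below builds a toy joint distribution that is conditionally independent by construction (all numbers invented) and verifies P(X∩Y|Z) = P(X|Z) P(Y|Z):

```python
# Toy joint distribution built as P(x,y,z) = P(z) P(x|z) P(y|z), so X and Y
# are conditionally independent given Z by construction. Numbers invented.
pz = {0: 0.5, 1: 0.5}      # P(Z = z)
px_z = {0: 0.2, 1: 0.7}    # P(X = 1 | Z = z)
py_z = {0: 0.4, 1: 0.9}    # P(Y = 1 | Z = z)

def joint(x, y, z):
    px = px_z[z] if x else 1 - px_z[z]
    py = py_z[z] if y else 1 - py_z[z]
    return pz[z] * px * py

# Verify P(X and Y | Z) = P(X | Z) * P(Y | Z) for each value of Z.
for z in (0, 1):
    p_z = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1))
    assert abs(joint(1, 1, z) / p_z - px_z[z] * py_z[z]) < 1e-12
print("X and Y are conditionally independent given Z")
```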
Formal Statistics Definition
A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent.
Formally, P(Xn+1 = x | Xn = xn, …, X1 = x1) = P(Xn+1 = x | Xn = xn).
The possible values of Xi form a countable set S called the state space of the chain. (We will be restricting ourselves to finite sets.)
Markov chains are often described by a directed graph, where the edges are labeled by the probabilities of going from one state to the other states.
Introduction to Probability
We describe a Markov chain as follows: we have a set of states, S = {s1, s2, …, sr}. The process starts in one of these states and moves successively from one state to another. Each move is called a step.
If the chain is currently in state si, then it moves to state sj at the next step with a probability denoted by pij, and this probability does not depend upon which states the chain was in before the current state. The probabilities pij are called transition probabilities.
The process can remain in the state it is in, and this occurs with probability pii. An initial probability distribution, defined on S, specifies the starting state. Usually this is done by specifying a particular state as the starting state.
Land of Oz
According to Kemeny, Snell, and Thompson [2], the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this a change to a nice day.
Example Continued
With this information we form a Markov chain as follows. We take as states the kinds of weather R, N, and S. From the above information we determine the transition probabilities. These are most conveniently represented in a square array as

          R    N    S
      R ( 1/2  1/4  1/4 )
  P = N ( 1/2   0   1/2 )
      S ( 1/4  1/4  1/2 )
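The matrix above can be iterated directly; squaring P gives the two-day transition probabilities (a standard property of Markov chains, used here as a worked example):

```python
# Land of Oz transition matrix; rows and columns are ordered R, N, S.
states = ["R", "N", "S"]
P = [[0.5, 0.25, 0.25],
     [0.5, 0.0, 0.5],
     [0.25, 0.25, 0.5]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mul(P, P)  # two-step (two-day) transition probabilities
# P2[0][1]: probability of a nice day two days after rain.
print(P2[0][1])  # 1/2*1/4 + 1/4*0 + 1/4*1/4 = 0.1875
```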
Hidden Markov Models
A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters.
An HMM can be considered the simplest dynamic Bayesian network. (DBN means the arcs are directed; if the arcs aren't directed you have Markov random fields (MRFs), or Markov networks.)
In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters.
In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible.
HMMs Continued
Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, and gesture recognition, musical score following, and bioinformatics.
Hidden Markov Model Graph
Trellis Layout
Probability of HMM
The probability of observing a sequence Y = y(0), y(1), …, y(L−1) of length L is given by:

P(Y) = ΣX P(Y|X) P(X)

where the sum runs over all possible hidden node sequences X = x(0), x(1), …, x(L−1).
Brute-force calculation of P(Y) is intractable for most real-life problems, as the number of possible hidden node sequences is going to be extremely high. The calculation can, however, be sped up enormously using an algorithm called the forward procedure.
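The forward procedure avoids the exponential sum by propagating, at each time step, the probability of the observations so far jointly with each possible current state. A sketch with invented parameters (two hidden states, three output symbols):

```python
# Forward procedure sketch: computes P(Y) = sum over X of P(Y|X) P(X)
# without enumerating hidden sequences. All parameters are invented.
pi = {"H": 0.6, "C": 0.4}                  # initial state probabilities
A = {"H": {"H": 0.7, "C": 0.3},            # transition probabilities
     "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"1": 0.1, "2": 0.4, "3": 0.5},  # emission probabilities
     "C": {"1": 0.5, "2": 0.4, "3": 0.1}}

def forward(obs):
    # alpha[s] = P(y(0..t), x(t) = s); initialize, then recurse over time.
    alpha = {s: pi[s] * B[s][obs[0]] for s in pi}
    for y in obs[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in alpha) * B[s][y]
                 for s in pi}
    return sum(alpha.values())

print(forward(["3", "1"]))  # ≈ 0.0796
```

With n hidden states and L observations this runs in O(L·n²) time, versus the nᴸ hidden sequences of the brute-force sum.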
HMM CS Definition
A variant of a finite state machine having a set of states, Q, an output alphabet, O, transition probabilities, A, output probabilities, B, and initial state probabilities, Π.
The current state is not observable. Instead, each state produces an output with a certain probability (B). Usually the states, Q, and outputs, O, are understood, so an HMM is said to be a triple, (A, B, Π).
HMM Formal CS Definition
A = {aij = P(qj at t+1 | qi at t)}, where P(x | y) is the conditional probability of x given y, t ≥ 1 is time, and qi ∈ Q. Informally, A is the probability that the next state is qj given that the current state is qi.
B = {bik = P(ok | qi)}, where ok ∈ O. Informally, B is the probability that the output is ok given that the current state is qi.
Π = {pi = P(qi at t=1)}.
HMM Formal Statistics Definition
States: a set of states S = s1, …, sn.
Transition probabilities: A = a1,1, a1,2, …, an,n. Each ai,j represents the probability of transitioning from state si to sj.
Emission probabilities: a set B of functions of the form bi(ot), which is the probability of observation ot being emitted by si.
Initial state distribution: πi is the probability that si is a start state.
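The triple (A, B, Π) maps onto a small data structure, and sampling from it shows how hidden states generate visible outputs. All parameters below are invented:

```python
import random

# The HMM triple (A, B, Pi) as plain dictionaries; parameters are invented.
Pi = {"q1": 0.8, "q2": 0.2}                  # initial state distribution
A = {"q1": {"q1": 0.6, "q2": 0.4},           # transition probabilities
     "q2": {"q1": 0.3, "q2": 0.7}}
B = {"q1": {"o1": 0.9, "o2": 0.1},           # emission probabilities
     "q2": {"o1": 0.2, "o2": 0.8}}

def sample(n, rng=random.Random(0)):
    """Emit n observations; the state sequence itself stays hidden."""
    pick = lambda d: rng.choices(list(d), weights=list(d.values()))[0]
    state, outputs = pick(Pi), []
    for _ in range(n):
        outputs.append(pick(B[state]))  # emit from the current state
        state = pick(A[state])          # then transition
    return outputs

print(sample(5))  # only the outputs are visible to the observer
```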
Hidden Markov Models Summary
Frequently, patterns do not appear in isolation but as part of a series in time; this progression can sometimes be used to assist in their recognition.
Assumptions are usually made about the time-based process. A common assumption is that the process's state is dependent only on the preceding N states; then we have an order-N Markov model. The simplest case is N = 1, a first-order Markov model.
Uses for Hidden Markov Models
Various examples exist where the process states (patterns) are not directly observable, but are indirectly, and probabilistically, observable as another set of patterns. We can then define a hidden Markov model. These models have proved to be of great value in many current areas of research, notably speech recognition.
How to use HMMs
Such models of real processes pose three problems that are amenable to immediate attack; these are:
Evaluation: with what probability does a given model generate a given sequence of observations? The forward algorithm solves this problem efficiently.
Decoding: what sequence of hidden (underlying) states most probably generated a given sequence of observations? The Viterbi algorithm solves this problem efficiently.
Learning: what model most probably underlies a given sample of observation sequences; that is, what are the parameters of such a model? This problem may be solved by using the forward-backward (Baum-Welch) algorithm.
Success of HMMs
HMMs have proved to be of great value in analysing real systems. Their usual drawback is the over-simplification associated with the Markov assumption: that a state is dependent only on its predecessors, and that this dependence is time independent.
3 Main Problems - CS style
There are three canonical problems associated with HMMs:
– Evaluation: given the parameters of the model, compute the probability of a particular output sequence. This problem is solved by the forward algorithm.
– Decoding: given the parameters of the model, find the most likely sequence of hidden states that could have generated a given output sequence. This problem is solved by the Viterbi algorithm.
– Learning: given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities; in other words, train the parameters of the HMM given a dataset of sequences. This problem is solved by the Baum-Welch (forward-backward) algorithm, an instance of the EM algorithm.
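For the decoding problem, the Viterbi algorithm has the same shape as the forward procedure but replaces the sum over previous states with a max, keeping the best path. A sketch with invented parameters:

```python
# Viterbi sketch for the decoding problem; all parameters are invented.
pi = {"H": 0.6, "C": 0.4}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"1": 0.1, "2": 0.4, "3": 0.5},
     "C": {"1": 0.5, "2": 0.4, "3": 0.1}}

def viterbi(obs):
    # delta[s]: probability of the best state path ending in s at time t;
    # paths[s]: that best path itself.
    delta = {s: pi[s] * B[s][obs[0]] for s in pi}
    paths = {s: [s] for s in pi}
    for y in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in pi:
            prev = max(delta, key=lambda r: delta[r] * A[r][s])
            new_delta[s] = delta[prev] * A[prev][s] * B[s][y]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(delta, key=delta.get)
    return paths[best], delta[best]

print(viterbi(["3", "3", "1"]))  # best hidden path and its probability
```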
References
• CMSC 723: Introduction to Computational Linguistics, University of Maryland Institute for Advanced Computer Studies, www.umiacs.umd.edu/~christof/courses/cmsc723-fall04/lecture-notes/Lecture5-hmm.ppt
• Leeds: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels
• Wikipedia: http://en.wikipedia.org/wiki/Markov_chain
• NIST: http://www.nist.gov/dads/HTML/markovchain.html
• Charles M. Grinstead and J. Laurie Snell, Introduction to Probability: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf
• [2] J. G. Kemeny, J. L. Snell, G. L. Thompson, Introduction to Finite Mathematics, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1974).
• www.ncbioportal.org/training/mod3/hiddenmark.html
• http://www.ncbioportal.org/training/mod3/hmmer.html