Hidden Markov Models
Comp Sci 369 Dr Patricia Riddle
9/24/14 1 760 bayes & hmm
Automata Theory
An automaton is a mathematical model for a finite state machine (FSM). An FSM is a machine that, given an input of symbols, 'jumps' through a series of states according to a transition function (which can be expressed as a table).
Finite State Machine
A model of computation consisting of a set of states, a start state, an input alphabet, and a transition function that maps input symbols and current states to a next state. Computation begins in the start state with an input string. It changes to new states depending on the transition function. Also known as a finite state automaton.
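The definition above can be sketched directly in code. A minimal sketch, assuming an invented two-state machine that tracks the parity of '1' symbols in its input (the states, alphabet, and transition table are all illustrative, not from the slides):

```python
# Minimal FSM sketch; states and table are invented for illustration.
# The machine tracks the parity of '1' symbols seen so far.
transitions = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

def run_fsm(start, symbols):
    """Jump through states according to the transition function."""
    state = start
    for s in symbols:
        state = transitions[(state, s)]
    return state

print(run_fsm("even", "1011"))  # three '1' symbols seen -> "odd"
```

The transition function is just a table lookup, exactly as described: the machine 'jumps' through states as it consumes symbols.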
Transition Table

  Current State / Condition | State A | State B | State C
  Condition X               |    …    |    …    |    …
  Condition Y               |    …    | State C |    …
  Condition Z               |    …    |    …    |    …
Deterministic Finite Automata
Nondeterministic Finite Automata
Variations
There are many variants: for instance, machines having actions (outputs) associated with transitions (Mealy machines) or states (Moore machines), multiple start states, transitions conditioned on no input symbol (a null), more than one transition for a given symbol and state (nondeterministic finite state machines), one or more states designated as accepting states (recognizers), etc.
Finite State Machines
An automaton is represented by the 5-tuple ⟨Q, Σ, δ, q0, F⟩, where:
Q is a set of states.
Σ is a finite set of symbols, which we will call the alphabet of the language the automaton accepts.
δ is the transition function, that is, δ: Q × Σ → Q. (For non-deterministic automata, the empty string is an allowed input.)
q0 is the start state, that is, the state in which the automaton is when no input has been processed yet (obviously, q0 ∈ Q).
F is a set of states of Q (i.e. F ⊆ Q), called accept states.
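The 5-tuple maps directly onto a small program used as a recognizer. A minimal sketch, assuming an invented language (strings over {a, b} that end in 'b'); none of these particular states or symbols come from the slides:

```python
# DFA sketch for an invented language: strings over {a, b} ending in 'b'.
Q = {"q0", "q1"}                      # states
Sigma = {"a", "b"}                    # alphabet
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
q0 = "q0"                             # start state
F = {"q1"}                            # accept states (F is a subset of Q)

def accepts(word):
    state = q0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in F

print(accepts("abab"))  # True: ends in 'b'
print(accepts("aba"))   # False
```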
Markov Chains - CS Definition
Markov chain: a finite state machine with probabilities for each transition, that is, a probability that the next state is sj given that the current state is si.
Note: equivalently, a weighted, directed graph in which the weights correspond to the probability of that transition. In other words, the weights are nonnegative and the total weight of outgoing edges is positive. If the weights are normalized, the total weight, including self-loops, is 1.
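The normalization condition is easy to check in code. A sketch with an invented two-state chain, storing the weighted digraph as a dict of dicts:

```python
# Invented two-state chain as a weighted, directed graph:
# P[si][sj] = probability that the next state is sj given current state si.
P = {
    "s1": {"s1": 0.9, "s2": 0.1},
    "s2": {"s1": 0.5, "s2": 0.5},
}

# Weights are nonnegative and each row (including self-loops) sums to 1.
for state, edges in P.items():
    assert all(w >= 0 for w in edges.values())
    assert abs(sum(edges.values()) - 1.0) < 1e-12
print("transition probabilities are normalized")
```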
Markov Chain Graph
Markov Chain Example
Markov Chains - Statistics Definition
In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property.
That is, a Markov chain is a series of states of a system that has the Markov property. At each time the system may have changed from the state it was in the moment before, or it may have stayed in the same state. The changes of state are called transitions.
Markov Property
If a sequence of states has the Markov property, it means that every future state is conditionally independent of every prior state given the current state.

Chain rule:

P(w1, …, wn) = P(w1) ∏_{i=2}^{n} P(wi | w1, …, wi−1)

Markov assumption:

P(w1, …, wn) ≈ P(w1) ∏_{i=2}^{n} P(wi | wi−1)
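The Markov assumption reduces the sequence probability to an initial probability times pairwise transition probabilities, which is trivial to compute. A sketch with invented numbers:

```python
# Sequence probability under the first-order Markov assumption:
# P(w1..wn) ~ P(w1) * product over i of P(wi | wi-1). Numbers invented.
initial = {"A": 0.6, "B": 0.4}
trans = {("A", "A"): 0.7, ("A", "B"): 0.3,
         ("B", "A"): 0.4, ("B", "B"): 0.6}

def markov_prob(seq):
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[(prev, cur)]
    return p

print(markov_prob(["A", "A", "B"]))  # 0.6 * 0.7 * 0.3 ≈ 0.126
```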
Conditionally Independent
In probability theory, two states X and Y are conditionally independent given a third event Z precisely if the occurrence or non-occurrence of X and Y are independent events in their conditional probability distribution given Z.
In other words,
P(X∩Y|Z) = P(X|Z) P(Y|Z)
or, equivalently, P(X|Y∩Z) = P(X|Z).
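The identity can be checked numerically. The sketch below builds a toy joint distribution that is conditionally independent by construction (all numbers invented) and verifies P(X∩Y|Z) = P(X|Z) P(Y|Z):

```python
# Toy joint distribution built as P(x,y,z) = P(z) P(x|z) P(y|z), so X and Y
# are conditionally independent given Z by construction. Numbers invented.
pz = {0: 0.5, 1: 0.5}      # P(Z = z)
px_z = {0: 0.2, 1: 0.7}    # P(X = 1 | Z = z)
py_z = {0: 0.4, 1: 0.9}    # P(Y = 1 | Z = z)

def joint(x, y, z):
    px = px_z[z] if x else 1 - px_z[z]
    py = py_z[z] if y else 1 - py_z[z]
    return pz[z] * px * py

# Verify P(X and Y | Z) = P(X | Z) * P(Y | Z) for each value of Z.
for z in (0, 1):
    p_z = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1))
    assert abs(joint(1, 1, z) / p_z - px_z[z] * py_z[z]) < 1e-12
print("X and Y are conditionally independent given Z")
```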
Formal Statistics Definition
A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent.
Formally, P(Xn+1 = x | Xn = xn, …, X1 = x1) = P(Xn+1 = x | Xn = xn).
The possible values of Xi form a countable set S called the state space of the chain. (We will be restricting ourselves to finite sets.)
Markov chains are often described by a directed graph, where the edges are labeled by the probabilities of going from one state to the other states.
Introduction to Probability
We describe a Markov chain as follows: we have a set of states, S = {s1, s2, …, sr}. The process starts in one of these states and moves successively from one state to another. Each move is called a step.
If the chain is currently in state si, then it moves to state sj at the next step with a probability denoted by pij, and this probability does not depend upon which states the chain was in before the current state. The probabilities pij are called transition probabilities.
The process can remain in the state it is in, and this occurs with probability pii. An initial probability distribution, defined on S, specifies the starting state. Usually this is done by specifying a particular state as the starting state.
Land of Oz
According to Kemeny, Snell, and Thompson [2], the Land of Oz is blessed by many things, but not by good weather. They never have two nice days in a row. If they have a nice day, they are just as likely to have snow as rain the next day. If they have snow or rain, they have an even chance of having the same the next day. If there is a change from snow or rain, only half of the time is this a change to a nice day.
Example Continued
With this information we form a Markov chain as follows. We take as states the kinds of weather R, N, and S. From the above information we determine the transition probabilities. These are most conveniently represented in a square array as

          R    N    S
      R ( 1/2  1/4  1/4 )
  P = N ( 1/2   0   1/2 )
      S ( 1/4  1/4  1/2 )
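The matrix above can be iterated directly; squaring P gives the two-day transition probabilities (a standard property of Markov chains, used here as a worked example):

```python
# Land of Oz transition matrix; rows and columns are ordered R, N, S.
states = ["R", "N", "S"]
P = [[0.5, 0.25, 0.25],
     [0.5, 0.0, 0.5],
     [0.25, 0.25, 0.5]]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mul(P, P)  # two-step (two-day) transition probabilities
# P2[0][1]: probability of a nice day two days after rain.
print(P2[0][1])  # 1/2*1/4 + 1/4*0 + 1/4*1/4 = 0.1875
```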
Hidden Markov Models
A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters.
An HMM can be considered the simplest dynamic Bayesian network. (DBN means the arcs are directed; if the arcs aren't directed you have Markov random fields (MRFs), or Markov networks.)
In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters.
In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible.
HMMs Continued
Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, and gesture recognition, musical score following, and bioinformatics.
Hidden Markov Model Graph
Trellis Layout
Probability of HMM
The probability of observing a sequence Y = y(0), y(1), …, y(L−1) of length L is given by:

P(Y) = ΣX P(Y|X) P(X)

where the sum runs over all possible hidden node sequences X = x(0), x(1), …, x(L−1).
Brute-force calculation of P(Y) is intractable for most real-life problems, as the number of possible hidden node sequences is going to be extremely high. The calculation can, however, be sped up enormously using an algorithm called the forward procedure.
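The forward procedure avoids the exponential sum by propagating, at each time step, the probability of the observations so far jointly with each possible current state. A sketch with invented parameters (two hidden states, three output symbols):

```python
# Forward procedure sketch: computes P(Y) = sum over X of P(Y|X) P(X)
# without enumerating hidden sequences. All parameters are invented.
pi = {"H": 0.6, "C": 0.4}                  # initial state probabilities
A = {"H": {"H": 0.7, "C": 0.3},            # transition probabilities
     "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"1": 0.1, "2": 0.4, "3": 0.5},  # emission probabilities
     "C": {"1": 0.5, "2": 0.4, "3": 0.1}}

def forward(obs):
    # alpha[s] = P(y(0..t), x(t) = s); initialize, then recurse over time.
    alpha = {s: pi[s] * B[s][obs[0]] for s in pi}
    for y in obs[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in alpha) * B[s][y]
                 for s in pi}
    return sum(alpha.values())

print(forward(["3", "1"]))  # ≈ 0.0796
```

With n hidden states and L observations this runs in O(L·n²) time, versus the nᴸ hidden sequences of the brute-force sum.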
HMM CS Definition
A variant of a finite state machine having a set of states, Q, an output alphabet, O, transition probabilities, A, output probabilities, B, and initial state probabilities, Π.
The current state is not observable. Instead, each state produces an output with a certain probability (B). Usually the states, Q, and outputs, O, are understood, so an HMM is said to be a triple, (A, B, Π).
HMM Formal CS Definition
A = {aij = P(qj at t+1 | qi at t)}, where P(x | y) is the conditional probability of x given y, t ≥ 1 is time, and qi ∈ Q. Informally, A is the probability that the next state is qj given that the current state is qi.
B = {bik = P(ok | qi)}, where ok ∈ O. Informally, B is the probability that the output is ok given that the current state is qi.
Π = {pi = P(qi at t=1)}.
HMM Formal Statistics Definition
States: a set of states S = s1, …, sn.
Transition probabilities: A = a1,1, a1,2, …, an,n. Each ai,j represents the probability of transitioning from state si to sj.
Emission probabilities: a set B of functions of the form bi(ot), which is the probability of observation ot being emitted by si.
Initial state distribution: πi is the probability that si is a start state.
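The triple (A, B, Π) maps onto a small data structure, and sampling from it shows how hidden states generate visible outputs. All parameters below are invented:

```python
import random

# The HMM triple (A, B, Pi) as plain dictionaries; parameters are invented.
Pi = {"q1": 0.8, "q2": 0.2}                  # initial state distribution
A = {"q1": {"q1": 0.6, "q2": 0.4},           # transition probabilities
     "q2": {"q1": 0.3, "q2": 0.7}}
B = {"q1": {"o1": 0.9, "o2": 0.1},           # emission probabilities
     "q2": {"o1": 0.2, "o2": 0.8}}

def sample(n, rng=random.Random(0)):
    """Emit n observations; the state sequence itself stays hidden."""
    pick = lambda d: rng.choices(list(d), weights=list(d.values()))[0]
    state, outputs = pick(Pi), []
    for _ in range(n):
        outputs.append(pick(B[state]))  # emit from the current state
        state = pick(A[state])          # then transition
    return outputs

print(sample(5))  # only the outputs are visible to the observer
```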
Hidden Markov Models Summary
Frequently, patterns do not appear in isolation but as part of a series in time; this progression can sometimes be used to assist in their recognition.
Assumptions are usually made about the time-based process. A common assumption is that the process's state is dependent only on the preceding N states; then we have an order-N Markov model. The simplest case is N = 1, a first-order Markov model.
Uses for Hidden Markov Models
Various examples exist where the process states (patterns) are not directly observable, but are indirectly, and probabilistically, observable as another set of patterns. We can then define a hidden Markov model. These models have proved to be of great value in many current areas of research, notably speech recognition.
How to use HMMs
Such models of real processes pose three problems that are amenable to immediate attack; these are:
Evaluation: with what probability does a given model generate a given sequence of observations? The forward algorithm solves this problem efficiently.
Decoding: what sequence of hidden (underlying) states most probably generated a given sequence of observations? The Viterbi algorithm solves this problem efficiently.
Learning: what model most probably underlies a given sample of observation sequences; that is, what are the parameters of such a model? This problem may be solved by using the forward-backward (Baum-Welch) algorithm.
Success of HMMs
HMMs have proved to be of great value in analysing real systems. Their usual drawback is the over-simplification associated with the Markov assumption: that a state is dependent only on its predecessors, and that this dependence is time independent.
3 Main Problems - CS style
There are three canonical problems associated with HMMs:
– Evaluation: given the parameters of the model, compute the probability of a particular output sequence. This problem is solved by the forward algorithm.
– Decoding: given the parameters of the model, find the most likely sequence of hidden states that could have generated a given output sequence. This problem is solved by the Viterbi algorithm.
– Learning: given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities; in other words, train the parameters of the HMM given a dataset of sequences. This problem is solved by the Baum-Welch (forward-backward) algorithm, an instance of the EM algorithm.
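For the decoding problem, the Viterbi algorithm has the same shape as the forward procedure but replaces the sum over previous states with a max, keeping the best path. A sketch with invented parameters:

```python
# Viterbi sketch for the decoding problem; all parameters are invented.
pi = {"H": 0.6, "C": 0.4}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"1": 0.1, "2": 0.4, "3": 0.5},
     "C": {"1": 0.5, "2": 0.4, "3": 0.1}}

def viterbi(obs):
    # delta[s]: probability of the best state path ending in s at time t;
    # paths[s]: that best path itself.
    delta = {s: pi[s] * B[s][obs[0]] for s in pi}
    paths = {s: [s] for s in pi}
    for y in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in pi:
            prev = max(delta, key=lambda r: delta[r] * A[r][s])
            new_delta[s] = delta[prev] * A[prev][s] * B[s][y]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(delta, key=delta.get)
    return paths[best], delta[best]

print(viterbi(["3", "3", "1"]))  # best hidden path and its probability
```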
References
• CMSC 723: Introduction to Computational Linguistics, University of Maryland Institute for Advanced Computer Studies, www.umiacs.umd.edu/~christof/courses/cmsc723-fall04/lecture-notes/Lecture5-hmm.ppt
• Leeds: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels
• Wikipedia: http://en.wikipedia.org/wiki/Markov_chain
• NIST: http://www.nist.gov/dads/HTML/markovchain.html
• Charles M. Grinstead and J. Laurie Snell, Introduction to Probability: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf
• [2] J. G. Kemeny, J. L. Snell, G. L. Thompson, Introduction to Finite Mathematics, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1974).
• www.ncbioportal.org/training/mod3/hiddenmark.html
• http://www.ncbioportal.org/training/mod3/hmmer.html