Statistical Machine Learning

Christian Walder

Machine Learning Research Group, CSIRO Data61
and
College of Engineering and Computer Science, The Australian National University

Canberra, Semester One, 2020.

(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")
Course outline: Overview; Introduction; Linear Algebra; Probability; Linear Regression 1 and 2; Linear Classification 1 and 2; Kernel Methods; Sparse Kernel Methods; Mixture Models and EM 1 and 2; Neural Networks 1 and 2; Principal Component Analysis; Autoencoders; Graphical Models 1, 2 and 3; Sampling; Sequential Data 1 and 2.
Part XXII
Sequential Data 2
Example of a Hidden Markov Model
Assume Peter and Mary are students in Canberra and Sydney, respectively. Peter is a computer science student and is only interested in riding his bicycle, shopping for new computer gadgets, and studying. (He also does other things, but because those activities do not depend on the weather we neglect them here.)

Mary does not know the current weather in Canberra, but knows the general trends of the weather there. She also knows Peter well enough to know what he does, on average, on rainy days and when the skies are blue.

She believes that the weather (rainy or not rainy) follows a given discrete Markov chain. She tries to guess the sequence of weather patterns for a number of days after Peter tells her on the phone what he did over the last few days.
Example of a Hidden Markov Model
Mary uses the following HMM:

                                      rainy   sunny
  initial probability                  0.2     0.8
  transition probability     rainy     0.3     0.7
  (row: current, col: next)  sunny     0.4     0.6
  emission probability       cycle     0.1     0.6
  (columns: hidden state)    shop      0.4     0.3
                             study     0.5     0.1

Assume Peter tells Mary that the list of his activities over the last days was ['cycle', 'shop', 'study'].

(a) Calculate the probability of this observation sequence.
(b) Calculate the most probable sequence of hidden states for these observations.
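Since the sequence has only three steps, both questions can be answered by brute force: enumerate all 2^3 = 8 hidden state sequences, sum their joint probabilities for (a), and keep the single best sequence for (b). Below is a minimal Python sketch; the encodings (rainy = 0, sunny = 1; cycle = 0, shop = 1, study = 2) and the array names are our own choices, not from the slides.

```python
# Brute-force answers to (a) and (b) for Mary's HMM (encodings are ours).
import itertools
import numpy as np

pi = np.array([0.2, 0.8])             # initial probabilities [rainy, sunny]
A = np.array([[0.3, 0.7],             # transitions: row = current state,
              [0.4, 0.6]])            # column = next state
B = np.array([[0.1, 0.4, 0.5],        # emissions: row = state,
              [0.6, 0.3, 0.1]])       # column = [cycle, shop, study]
obs = [0, 1, 2]                       # 'cycle', 'shop', 'study'

best_prob, best_path, total = 0.0, None, 0.0
for path in itertools.product([0, 1], repeat=len(obs)):
    p = pi[path[0]] * B[path[0], obs[0]]
    for n in range(1, len(obs)):
        p *= A[path[n - 1], path[n]] * B[path[n], obs[n]]
    total += p                        # (a): sum over all hidden paths
    if p > best_prob:                 # (b): best single hidden path
        best_prob, best_path = p, path

print(f"(a) p(observations) = {total:.6f}")
print(f"(b) best hidden path = {best_path}, p = {best_prob:.6f}")
```

The rest of this part develops the machinery (forward-backward, Viterbi) that answers the same two questions without the exponential enumeration.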
Hidden Markov Model
The trellis for the hidden Markov model: find the probability of 'cycle', 'shop', 'study'.

[Trellis diagram: the states Rainy (initial probability 0.2) and Sunny (0.8) unrolled over three time steps, with transition probabilities 0.3/0.7 out of Rainy and 0.4/0.6 out of Sunny between steps, and the emission probabilities for cycle/shop/study attached to each state at each step.]
Hidden Markov Model
Find the most probable hidden states for 'cycle', 'shop', 'study'.

[Trellis diagram as before: Rainy/Sunny unrolled over three steps with the same transition and emission probabilities; the task is now to pick the single best path through the trellis.]
Homogeneous Hidden Markov Model
The joint probability distribution over both latent and observed variables is

p(X, Z | θ) = p(z1 | π) [ ∏_{n=2}^{N} p(zn | zn−1, A) ] ∏_{m=1}^{N} p(xm | zm, φ)

where X = (x1, . . . , xN), Z = (z1, . . . , zN), and θ = {π, A, φ}.

Most of the discussion will be independent of the particular choice of emission probabilities (e.g. discrete tables, Gaussians, mixtures of Gaussians).
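Read literally, the factorisation is one initial-state factor, N − 1 transition factors, and N emission factors. A small sketch evaluating it for the discrete example model above (the helper function and encodings are our own, for illustration):

```python
# Evaluating the factorised joint p(X, Z | theta) for one state sequence.
import numpy as np

pi = np.array([0.2, 0.8])                         # p(z1 | pi)
A = np.array([[0.3, 0.7], [0.4, 0.6]])            # p(zn | zn-1, A)
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # p(xn | zn, phi)

def joint(X, Z):
    """p(X, Z | theta) = p(z1) * prod_n p(zn | zn-1) * prod_n p(xn | zn)."""
    p = pi[Z[0]]
    for n in range(1, len(Z)):
        p *= A[Z[n - 1], Z[n]]                    # transition factors
    for n, x in enumerate(X):
        p *= B[Z[n], x]                           # emission factors
    return p

print(joint(X=[0, 1, 2], Z=[1, 1, 0]))            # e.g. sunny, sunny, rainy
```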
Maximum Likelihood for HMM
We have observed a data set X = (x1, . . . , xN).
Assume it came from an HMM with a given structure (number of nodes, form of emission probabilities).
The likelihood of the data is

p(X | θ) = Σ_Z p(X, Z | θ)

This joint distribution does not factorise over n (as it did for the mixture distribution).
We have N variables, each with K states: K^N terms. The number of terms grows exponentially with the length of the chain.
But we can use the conditional independence of the latent variables to reorder their calculation later.
A further obstacle to finding a closed-form maximum likelihood solution: calculating the emission probabilities for the different states zn.
Maximum Likelihood for HMM - EM
Employ the EM algorithm to find the maximum likelihood solution for the HMM.
Start with some initial parameter settings θ^old.
E-step: find the posterior distribution of the latent variables, p(Z | X, θ^old).
M-step: maximise

Q(θ, θ^old) = Σ_Z p(Z | X, θ^old) ln p(Z, X | θ)

with respect to the parameters θ = {π, A, φ}.
Maximum Likelihood for HMM - EM
Denote the marginal posterior distribution of zn by γ(zn), and the joint posterior distribution of two successive latent variables by ξ(zn−1, zn):

γ(zn) = p(zn | X, θ^old)
ξ(zn−1, zn) = p(zn−1, zn | X, θ^old).

For each step n, γ(zn) has K nonnegative values which sum to 1.
For each step n, ξ(zn−1, zn) has K × K nonnegative values which sum to 1.
Elements of these vectors are denoted by γ(znk) and ξ(zn−1,j, znk) respectively.
Maximum Likelihood for HMM - EM
Because the expectation of a binary random variable is the probability that it equals one, with this notation we get

γ(znk) = E[znk] = Σ_{zn} γ(zn) znk
ξ(zn−1,j, znk) = E[zn−1,j znk] = Σ_{zn−1, zn} ξ(zn−1, zn) zn−1,j znk.

Putting it all together we get

Q(θ, θ^old) = Σ_{k=1}^{K} γ(z1k) ln πk
            + Σ_{n=2}^{N} Σ_{j=1}^{K} Σ_{k=1}^{K} ξ(zn−1,j, znk) ln Ajk
            + Σ_{n=1}^{N} Σ_{k=1}^{K} γ(znk) ln p(xn | φk).
Maximum Likelihood for HMM - EM
M-step: maximising Q(θ, θ^old) above with respect to π and A results in

πk = γ(z1k) / Σ_{j=1}^{K} γ(z1j)

Ajk = Σ_{n=2}^{N} ξ(zn−1,j, znk) / ( Σ_{l=1}^{K} Σ_{n=2}^{N} ξ(zn−1,j, znl) )
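In code these two updates are one normalisation each. A minimal numpy sketch, assuming gamma is an N × K array of the γ(znk) and xi an (N−1) × K × K array of the ξ(zn−1,j, znk) (array names are our own):

```python
import numpy as np

def m_step_pi_A(gamma, xi):
    # pi_k = gamma(z_1k) / sum_j gamma(z_1j)
    pi = gamma[0] / gamma[0].sum()
    # A_jk: sum xi over n = 2..N, then normalise each row j over k
    A = xi.sum(axis=0)
    A /= A.sum(axis=1, keepdims=True)
    return pi, A
```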
Maximum Likelihood for HMM - EM
Still left: maximising with respect to φ. But φ appears only in the last term of Q(θ, θ^old),

Σ_{n=1}^{N} Σ_{k=1}^{K} γ(znk) ln p(xn | φk),

and under the assumption that all the φk are independent of each other, this term decouples into a sum over k. We can then maximise each contribution Σ_{n=1}^{N} γ(znk) ln p(xn | φk) individually.
Maximum Likelihood for HMM - EM
In the case of Gaussian emission densities,

p(x | φk) = N(x | µk, Σk),

the maximising parameters for the emission densities are

µk = Σ_{n=1}^{N} γ(znk) xn / Σ_{n=1}^{N} γ(znk)

Σk = Σ_{n=1}^{N} γ(znk) (xn − µk)(xn − µk)^T / Σ_{n=1}^{N} γ(znk).
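These are weighted versions of the usual Gaussian maximum likelihood estimates, with the responsibilities γ(znk) as weights. A numpy sketch, assuming data X of shape N × D and responsibilities gamma of shape N × K (names our own):

```python
import numpy as np

def m_step_gaussian(X, gamma):
    Nk = gamma.sum(axis=0)                       # effective counts per state
    mu = (gamma.T @ X) / Nk[:, None]             # weighted means, one row per k
    Sigma = []
    for k in range(gamma.shape[1]):
        d = X - mu[k]                            # deviations (x_n - mu_k)
        Sigma.append((gamma[:, k, None] * d).T @ d / Nk[k])  # weighted covariance
    return mu, np.stack(Sigma)
```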
Forward-Backward HMM
We need to efficiently evaluate the γ(znk) and ξ(zn−1,j, znk).
The graphical model for the HMM is a tree, so we know we can use a two-stage message passing algorithm to calculate the posterior distribution of the latent variables.
For the HMM this is called the forward-backward algorithm (Rabiner, 1989), or the Baum-Welch algorithm (Baum, 1972).
Other variants exist, differing only in the form of the messages propagated.
We look at the most widely known, the alpha-beta algorithm.
Conditional Independence for HMM
Given the data X = {x1, . . . , xN}, the following independence relations hold:

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn)
p(x1, . . . , xn−1 | xn, zn) = p(x1, . . . , xn−1 | zn)
p(x1, . . . , xn−1 | zn−1, zn) = p(x1, . . . , xn−1 | zn−1)
p(xn+1, . . . , xN | zn, zn+1) = p(xn+1, . . . , xN | zn+1)
p(xn+2, . . . , xN | xn+1, zn+1) = p(xn+2, . . . , xN | zn+1)
p(X | zn−1, zn) = p(x1, . . . , xn−1 | zn−1) p(xn | zn) p(xn+1, . . . , xN | zn)
p(xN+1 | X, zN+1) = p(xN+1 | zN+1)
p(zN+1 | X, zN) = p(zN+1 | zN)

[Graphical model: the first-order HMM chain z1 → z2 → ⋯, with each latent zn emitting an observation xn.]
Conditional Independence for HMM - Example
Let's look at the following independence relation:

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn)

Any path from the set {x1, . . . , xn} to the set {xn+1, . . . , xN} has to go through zn.
In p(X | zn) the node zn is conditioned on (= observed).
All paths from x1, . . . , xn−1 through zn to xn+1, . . . , xN are head-to-tail at zn, so blocked.
The path from xn through zn to zn+1 is tail-to-tail at zn, so also blocked.
Therefore x1, . . . , xn ⊥⊥ xn+1, . . . , xN | zn.

[Graphical model as before: the HMM chain with its emissions.]
Alpha-Beta HMM
Define the joint probability of observing all data up to step n and having zn as the latent variable at step n to be

α(zn) = p(x1, . . . , xn, zn).

Define the probability of all future data given zn to be

β(zn) = p(xn+1, . . . , xN | zn).

Then it can be shown that the following recursion holds:

α(zn) = p(xn | zn) Σ_{zn−1} α(zn−1) p(zn | zn−1)

with the initial condition

α(z1) = ∏_{k=1}^{K} {πk p(x1 | φk)}^{z1k}.
Alpha-Beta HMM
At step n we can efficiently calculate α(zn) given α(zn−1):

α(zn) = p(xn | zn) Σ_{zn−1} α(zn−1) p(zn | zn−1)

[Trellis fragment: each α(zn,k) gathers the K values α(zn−1,j), weighted by the transition probabilities Ajk, and multiplies the sum by the emission probability p(xn | zn,k).]
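A direct translation of the recursion into numpy, as a sketch: it is unnormalised, so for long chains one would rescale the α values (or work in log space) to avoid numerical underflow; pi, A and B encode the initial, transition and emission tables as in the earlier example.

```python
import numpy as np

def forward(pi, A, B, obs):
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    alpha[0] = pi * B[:, obs[0]]                 # alpha(z1) = pi_k p(x1 | phi_k)
    for n in range(1, N):
        # alpha(zn) = p(xn | zn) * sum_{zn-1} alpha(zn-1) p(zn | zn-1)
        alpha[n] = B[:, obs[n]] * (alpha[n - 1] @ A)
    return alpha
```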
Alpha-Beta HMM
And for β(zn) we get the recursion

β(zn) = Σ_{zn+1} β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)

[Trellis fragment: each β(zn,k) gathers the K values β(zn+1,j), weighted by the transition probabilities Akj and the emission probabilities p(xn+1 | zn+1,j).]
Alpha-Beta HMM
How do we start the β recursion, i.e. what is β(zN)? Formally,

β(zN) = p(xN+1, . . . , xN | zN),

the probability of an empty set of future observations. It can be shown that setting

β(zN) = 1

is consistent with the approach.
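The β recursion then runs the same way in reverse, started from β(zN) = 1. A matching sketch (same conventions and underflow caveats as the forward sketch):

```python
import numpy as np

def backward(A, B, obs):
    N, K = len(obs), A.shape[0]
    beta = np.ones((N, K))                       # beta(zN) = 1
    for n in range(N - 2, -1, -1):
        # beta(zn) = sum_{zn+1} beta(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)
        beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])
    return beta
```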
Alpha-Beta HMM
Now we know how to calculate α(zn) and β(zn) for each step. What is the probability of the data, p(X)?
Use the definition of γ(zn) and Bayes' theorem:

γ(zn) = p(zn | X) = p(X | zn) p(zn) / p(X) = p(X, zn) / p(X)

together with the following conditional independence statement from the graphical model of the HMM:

p(X | zn) = p(x1, . . . , xn | zn) p(xn+1, . . . , xN | zn)

where the first factor equals α(zn)/p(zn) and the second equals β(zn). Therefore

γ(zn) = α(zn) β(zn) / p(X).
Alpha-Beta HMM
Marginalising over zn results in

1 = Σ_{zn} γ(zn) = Σ_{zn} α(zn) β(zn) / p(X)

and therefore, at each step n,

p(X) = Σ_{zn} α(zn) β(zn).

This is most conveniently evaluated at step N, where β(zN) = 1:

p(X) = Σ_{zN} α(zN).
Alpha-Beta HMM
Finally, we need the joint posterior distribution of two successive latent variables, defined by

ξ(zn−1, zn) = p(zn−1, zn | X).

It can be calculated directly from the α and β values in the form

ξ(zn−1, zn) = α(zn−1) p(xn | zn) p(zn | zn−1) β(zn) / p(X).
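Given the α and β arrays from the two sketches above, p(X), γ and ξ are a few array operations (same conventions; names our own):

```python
import numpy as np

def posteriors(alpha, beta, A, B, obs):
    pX = alpha[-1].sum()                  # p(X) = sum_{zN} alpha(zN)
    gamma = alpha * beta / pX             # gamma(zn) = alpha(zn) beta(zn) / p(X)
    N = len(obs)
    xi = np.zeros((N - 1,) + A.shape)
    for n in range(1, N):
        # xi(zn-1, zn) = alpha(zn-1) p(xn | zn) p(zn | zn-1) beta(zn) / p(X)
        xi[n - 1] = (alpha[n - 1][:, None] * A * B[:, obs[n]] * beta[n]) / pX
    return pX, gamma, xi
```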
How to train a HMM using EM?
1. Make an initial selection for the parameters θ^old, where θ = {π, A, φ}. (Often A and π are initialised uniformly or randomly. The initialisation of the φk depends on the emission distribution; for Gaussians, run K-means first and get µk and Σk from there.)
2. (Start of E-step) Run the forward recursion to calculate α(zn).
3. Run the backward recursion to calculate β(zn).
4. Calculate γ(zn) and ξ(zn−1, zn) from α(zn) and β(zn).
5. Evaluate the likelihood p(X).
6. (Start of M-step) Find θ^new maximising Q(θ, θ^old). This results in new settings for the parameters πk, Ajk and φk as described before.
7. Iterate until convergence is detected.

A sketch of one such iteration is given below.
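Here is a compact sketch of one EM iteration for a discrete-emission HMM, reusing the forward, backward, posteriors and m_step_pi_A sketches from earlier; the inline emission-table update is the decoupled maximisation of the last term of Q for this particular emission model.

```python
import numpy as np

def em_step(pi, A, B, obs):
    alpha = forward(pi, A, B, obs)                 # E-step: messages
    beta = backward(A, B, obs)
    pX, gamma, xi = posteriors(alpha, beta, A, B, obs)
    pi_new, A_new = m_step_pi_A(gamma, xi)         # M-step: pi and A
    B_new = np.zeros_like(B)
    for n, x in enumerate(obs):                    # discrete emissions:
        B_new[:, x] += gamma[n]                    # B_ks ∝ sum_n gamma(znk) [xn = s]
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new, np.log(pX)        # also report ln p(X)

# Repeat em_step until the log likelihood stops improving (step 7).
```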
Alpha-Beta HMM - Notes
In order to calculate the likelihood, we need the joint probability p(X, Z) summed over all possible values of Z. Every particular choice of Z corresponds to one path through the lattice diagram, and there are exponentially many of them!
Using the alpha-beta algorithm, the exponential cost is reduced to a cost linear in the length of the chain.
How did we do that? By swapping the order of multiplication and summation.
HMM - Viterbi Algorithm
Motivation: the latent states can have a meaningful interpretation, e.g. phonemes in a speech recognition system where the observed variables are the acoustic signals.
Goal: after the system has been trained, find the most probable states of the latent variables for a given sequence of observations.
Warning: finding the set of states which are each individually the most probable does NOT solve this problem.
HMM - Viterbi Algorithm
Define

ω(zn) = max_{z1, . . . , zn−1} ln p(x1, . . . , xn, z1, . . . , zn).

From the joint distribution of the HMM,

p(x1, . . . , xN, z1, . . . , zN) = p(z1) [ ∏_{n=2}^{N} p(zn | zn−1) ] ∏_{n=1}^{N} p(xn | zn),

the following recursion can be derived:

ω(zn) = ln p(xn | zn) + max_{zn−1} { ln p(zn | zn−1) + ω(zn−1) }

with

ω(z1) = ln p(z1) + ln p(x1 | z1) = ln p(x1, z1).

[Trellis fragment: at each step only the best incoming transition into each state needs to be remembered.]
HMM - Viterbi Algorithm
Calculate

ω(zn) = max_{z1, . . . , zn−1} ln p(x1, . . . , xn, z1, . . . , zn)

for n = 1, . . . , N. At each step n, remember which transition is the best way into each state at the next step.
At step N: find the state with the highest probability.
Then, for n = N − 1, . . . , 1, backtrace which transition led to the most probable state and identify from which state it came. (A sketch follows below.)
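A sketch of the complete algorithm in log space, with the ω recursion run forward and a backtrace table recording the best predecessor of each state (pi, A, B and obs as in the earlier sketches; names our own):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    N, K = len(obs), len(pi)
    omega = np.zeros((N, K))
    back = np.zeros((N, K), dtype=int)
    omega[0] = np.log(pi) + np.log(B[:, obs[0]])   # omega(z1) = ln p(x1, z1)
    for n in range(1, N):
        # scores[j, k] = omega(zn-1 = j) + ln p(zn = k | zn-1 = j)
        scores = omega[n - 1][:, None] + np.log(A)
        back[n] = scores.argmax(axis=0)            # best predecessor per state
        omega[n] = np.log(B[:, obs[n]]) + scores.max(axis=0)
    path = [int(omega[-1].argmax())]               # most probable final state
    for n in range(N - 1, 0, -1):                  # backtrace
        path.append(int(back[n, path[-1]]))
    return path[::-1], float(omega[-1].max())      # states, ln p(X, Z*)

# For the earlier example: viterbi(pi, A, B, [0, 1, 2]) answers question (b).
```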