Stefanos Zafeiriou, Adv. Statistical Machine Learning (course 495)

Linear Dynamical Systems (Kalman filter)
(a) Overview of HMMs
(b) From HMMs to Linear Dynamical Systems (LDS)
Markov Chains with Discrete Random Variables

[Graphical model: a chain $x_1 \to x_2 \to x_3 \to \dots \to x_T$.]

Let's assume we have discrete random variables, e.g. taking 3 discrete values encoded one-hot: $x_t \in \{100, 010, 001\}$.

Markov property: $p(x_t \mid x_1, \dots, x_{t-1}) = p(x_t \mid x_{t-1})$

The chain is stationary (homogeneous, or time-invariant) if the distribution $p(x_t \mid x_{t-1})$ does not depend on $t$, e.g. $p(x_t = 100 \mid x_{t-1} = 010)$.
Markov Chains with Discrete Random Variables (cont.)

The joint distribution factorizes as
$$p(x_1, \dots, x_T) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})$$

What do we need in order to describe the whole procedure?

(1) A probability for the first frame/timestamp, $p(x_1)$. In order to define this probability we need the vector $\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_K)$:
$$p(x_1 \mid \boldsymbol{\pi}) = \prod_{k=1}^{K} \pi_k^{x_{1k}}$$

(2) A transition probability $p(x_t \mid x_{t-1})$. In order to define it we need a $K \times K$ transition matrix $\mathbf{A} = [a_{jk}]$:
$$p(x_t \mid x_{t-1}, \mathbf{A}) = \prod_{k=1}^{K} \prod_{j=1}^{K} a_{jk}^{x_{t-1,j}\, x_{tk}}$$
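As a concrete illustration (not part of the original slides), here is a minimal Python sketch that evaluates the joint probability of a state sequence; the values of $\boldsymbol{\pi}$ and $\mathbf{A}$ below are illustrative assumptions.

```python
# Minimal sketch: joint probability of a discrete Markov chain,
# p(x_1,...,x_T) = p(x_1) * prod_t p(x_t | x_{t-1}).
# States are indices 0..K-1; pi and A are illustrative values.
import numpy as np

pi = np.array([0.5, 0.3, 0.2])           # initial distribution pi, shape (K,)
A = np.array([[0.8, 0.1, 0.1],           # transition matrix A, shape (K, K):
              [0.2, 0.7, 0.1],           # A[j, k] = p(x_t = k | x_{t-1} = j)
              [0.3, 0.3, 0.4]])

def chain_log_prob(states, pi, A):
    """log p(x_1, ..., x_T) for a sequence of state indices."""
    logp = np.log(pi[states[0]])
    for prev, cur in zip(states[:-1], states[1:]):
        logp += np.log(A[prev, cur])
    return logp

states = [0, 0, 1, 2]                    # e.g. the one-hot codes 100, 100, 010, 001
print(chain_log_prob(states, pi, A))
```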
Markov Chains with Discrete Random Variables (cont.)

(1) Using the transition matrix we can compute various probabilities regarding the future:
$$p(x_{t+1} \mid x_{t-1}, \mathbf{A}) \;\to\; \mathbf{A}^2, \qquad p(x_{t+2} \mid x_{t-1}, \mathbf{A}) \;\to\; \mathbf{A}^3, \qquad p(x_{t+n} \mid x_{t-1}, \mathbf{A}) \;\to\; \mathbf{A}^{n+1}$$

(2) The stationary probability of a Markov chain is very important: it indicates how probable it is to end up in each state under a random walk (e.g., Google PageRank). The stationary distribution satisfies
$$\boldsymbol{\pi}^{T} \mathbf{A} = \boldsymbol{\pi}^{T}$$
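Both quantities are easy to compute numerically; a short sketch with the same illustrative $\mathbf{A}$ as before (not from the slides):

```python
# Sketch: k-step transition probabilities via matrix powers, and the
# stationary distribution as the left eigenvector of A with eigenvalue 1,
# i.e. the solution of pi^T A = pi^T.
import numpy as np

A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])

two_step = np.linalg.matrix_power(A, 2)   # corresponds to p(x_{t+1} | x_{t-1})

evals, evecs = np.linalg.eig(A.T)         # left eigenvectors of A
stat = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvalue 1 is the largest
stat /= stat.sum()                        # normalize to a probability vector
print(stat, stat @ A)                     # stat @ A should reproduce stat
```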
Latent Variables in a Markov Chain

[Graphical model: a latent chain $z_1 \to z_2 \to z_3 \to \dots \to z_T$, where each $z_t$ emits an observation $x_t$.]

For discrete observations, the emission probability $p(x_t \mid z_t)$ is described by an $L \times K$ emission probability matrix $\mathbf{B}$.
Latent Variables in a Markov Chain (cont.)

[Same graphical model, now with continuous observations.]

For continuous observations we can instead use $K$ Gaussian distributions:
$$p(x_t \mid z_t) = \prod_{k=1}^{K} \mathcal{N}(x_t \mid \mu_k, \Sigma_k)^{z_{tk}}$$
Factorization of an HMM

The latent chain factorizes as
$$p(z_1, \dots, z_T) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$$

so the complete-data likelihood of an HMM is
$$p(X, Z \mid \theta) = p(x_1, \dots, x_T, z_1, \dots, z_T \mid \theta) = \left[ \prod_{t=1}^{T} p(x_t \mid z_t) \right] p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$$
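A small sketch of this factorization in code, assuming the Gaussian emissions introduced above (the function name and all parameter values are illustrative, not from the slides):

```python
# Sketch: complete-data log-likelihood log p(X, Z | theta) of an HMM with
# Gaussian emissions, following the factorization above.
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_loglik(X, z, pi, A, mus, Sigmas):
    """X: (T, D) observations; z: (T,) state indices."""
    logp = np.log(pi[z[0]])                              # p(z_1)
    for t in range(1, len(z)):
        logp += np.log(A[z[t - 1], z[t]])                # transition terms
    for t, x in enumerate(X):
        logp += multivariate_normal.logpdf(x, mus[z[t]], Sigmas[z[t]])  # emissions
    return logp

# illustrative 1-D example with K = 2 states
pi = np.array([0.6, 0.4]); A = np.array([[0.9, 0.1], [0.2, 0.8]])
mus = [np.zeros(1), np.ones(1)]; Sigmas = [np.eye(1), np.eye(1)]
print(complete_data_loglik(np.zeros((3, 1)), [0, 0, 1], pi, A, mus, Sigmas))
```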
What can we do with an HMM?

Given a string of observations and the parameters:

(1) Find, for a timestamp $t$, the probabilities of $z_t$ given the observations so far. This process is called Filtering: $p(z_t \mid x_1, x_2, \dots, x_t)$ (see the sketch after this list).

(2) Find, for a timestamp $t$, the probabilities of $z_t$ given the whole string. This process is called Smoothing: $p(z_t \mid x_1, x_2, \dots, x_T)$

(3) Given the observation string, find the string of hidden variables that maximizes the posterior:
$$\operatorname{argmax}_{z_1 \dots z_T}\, p(z_1, z_2, \dots, z_T \mid x_1, x_2, \dots, x_T)$$
This process is called Decoding (Viterbi).
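Filtering is implemented by the forward algorithm; a minimal sketch for the discrete-emission case (the function name and the normalization convention are our assumptions for illustration):

```python
# Sketch: HMM filtering via the (normalized) forward algorithm, with a
# discrete L x K emission matrix B where B[l, k] = p(x_t = l | z_t = k).
import numpy as np

def forward_filter(obs, pi, A, B):
    """Return alpha[t, k] = p(z_t = k | x_1..x_t)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[obs[0]]                      # incorporate the first observation
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[obs[t]]  # predict, then update
        alpha[t] /= alpha[t].sum()                 # normalize to a distribution
    return alpha
```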
Hidden Markov Models

[Figure: illustrations of filtering, smoothing, and decoding; taken from Machine Learning: A Probabilistic Perspective by K. Murphy.]
Hidden Markov Models (cont.)

(4) Find the probability of the model, $p(x_1, x_2, \dots, x_T)$. This process is called Evaluation.

(5) Prediction: $p(x_{t+\delta} \mid x_1, x_2, \dots, x_t)$ or $p(z_{t+\delta} \mid x_1, x_2, \dots, x_t)$

(6) EM parameter estimation (Baum-Welch algorithm) of the parameters $\mathbf{A}$, $\boldsymbol{\pi}$, $\mathbf{B}$.
Hidden Markov Models: summary of inference tasks

Filtering: $p(z_t \mid x_1, x_2, \dots, x_t)$
Smoothing: $p(z_t \mid x_1, x_2, \dots, x_T)$
Decoding: $\operatorname{argmax}_{z_1 \dots z_T}\, p(z_1, \dots, z_T \mid x_1, \dots, x_T)$
Fixed-lag smoothing: $p(z_{t-\ell} \mid x_1, x_2, \dots, x_t)$
Prediction: $p(x_{t+\delta} \mid x_1, x_2, \dots, x_t)$, $p(z_{t+\delta} \mid x_1, x_2, \dots, x_t)$
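The decoding task listed above is solved by dynamic programming; a minimal log-space sketch, assuming the same discrete emission matrix $\mathbf{B}$ as in the filtering sketch (names and layout are illustrative):

```python
# Sketch: Viterbi decoding in log space (avoids numerical underflow).
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the most probable state path argmax_z p(z | x)."""
    T, K = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[obs[0]])         # delta_1(k)
    back = np.zeros((T, K), dtype=int)            # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)        # scores[j, k]: from j to k
        back[t] = scores.argmax(axis=0)           # best predecessor of each k
        logd = scores.max(axis=0) + np.log(B[obs[t]])
    path = [int(logd.argmax())]                   # best final state
    for t in range(T - 1, 0, -1):                 # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```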
Linear Dynamical Systems (LDS)

• Up until now we had Markov chains with discrete latent variables:
  $z_1 \to z_2 \to z_3 \to \dots \to z_T$
• How can we define a transition relationship with continuous-valued variables? With a linear-Gaussian model:
$$z_t = \mathbf{A} z_{t-1} + w_t, \qquad z_1 = \mu_0 + u$$
$$w \sim \mathcal{N}(w \mid 0, \Gamma), \qquad u \sim \mathcal{N}(u \mid 0, \mathbf{P}_0)$$
or, equivalently,
$$z_1 \sim \mathcal{N}(z_1 \mid \mu_0, \mathbf{P}_0), \qquad z_t \sim \mathcal{N}(z_t \mid \mathbf{A} z_{t-1}, \Gamma)$$
or
$$p(z_1) = \mathcal{N}(z_1 \mid \mu_0, \mathbf{P}_0), \qquad p(z_t \mid z_{t-1}) = \mathcal{N}(z_t \mid \mathbf{A} z_{t-1}, \Gamma)$$
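To make the continuous transition model concrete, a small sketch that samples a latent trajectory; the constant-velocity dynamics and all parameter values below are assumptions for the example, not from the slides:

```python
# Sketch: sampling z_t = A z_{t-1} + w_t with w ~ N(0, Gamma),
# starting from z_1 = mu0 + u with u ~ N(0, P0).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])     # illustrative constant-velocity dynamics
Gamma = 0.01 * np.eye(2)                   # transition noise covariance
mu0, P0 = np.zeros(2), np.eye(2)           # initial mean and covariance

T = 50
z = np.zeros((T, 2))
z[0] = rng.multivariate_normal(mu0, P0)                  # z_1 = mu0 + u
for t in range(1, T):
    z[t] = rng.multivariate_normal(A @ z[t - 1], Gamma)  # z_t = A z_{t-1} + w_t
```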
Latent Variable Models (Dynamic, Continuous)

[Figure-only slide.]
Linear Dynamical Systems (LDS)

[Graphical model: latent chain $z_1 \to z_2 \to z_3 \to \dots \to z_T$ emitting observations $x_1, x_2, x_3, \dots, x_T$.]

All timestamps share a common linear structure. We want to find the parameters.
Linear Dynamical Systems (LDS) (cont.)

Parameters: $\theta = \{\mathbf{C}, \mathbf{A}, \mu_0, \Sigma, \Gamma, \mathbf{P}_0\}$

Emission model: $x_t = \mathbf{C} z_t + v_t, \quad v \sim \mathcal{N}(v \mid 0, \Sigma)$
Transition model: $z_t = \mathbf{A} z_{t-1} + w_t, \quad w \sim \mathcal{N}(w \mid 0, \Gamma)$
First timestamp: $z_1 = \mu_0 + u, \quad u \sim \mathcal{N}(u \mid 0, \mathbf{P}_0)$
Linear Dynamical Systems (LDS) (cont.)

First timestamp: $p(z_1) = \mathcal{N}(z_1 \mid \mu_0, \mathbf{P}_0)$
Transition probability: $p(z_t \mid z_{t-1}) = \mathcal{N}(z_t \mid \mathbf{A} z_{t-1}, \Gamma)$
Emission: $p(x_t \mid z_t) = \mathcal{N}(x_t \mid \mathbf{C} z_t, \Sigma)$
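Extending the earlier latent-chain sketch with the emission model gives the full generative process; again, all parameter values are illustrative assumptions (here we observe only the first latent coordinate):

```python
# Sketch: full LDS generative process, z_t = A z_{t-1} + w_t and
# x_t = C z_t + v_t with v ~ N(0, Sigma).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma, Sigma = 0.01 * np.eye(2), np.array([[0.5]])
C = np.array([[1.0, 0.0]])                 # emit the first latent coordinate
mu0, P0 = np.zeros(2), np.eye(2)

T = 50
z = np.zeros((T, 2)); x = np.zeros((T, 1))
z[0] = rng.multivariate_normal(mu0, P0)
x[0] = rng.multivariate_normal(C @ z[0], Sigma)
for t in range(1, T):
    z[t] = rng.multivariate_normal(A @ z[t - 1], Gamma)   # p(z_t | z_{t-1})
    x[t] = rng.multivariate_normal(C @ z[t], Sigma)       # p(x_t | z_t)
```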
HMM vs LDS

HMM: a Markov chain with discrete latent variables
  $p(z_1 \mid \boldsymbol{\pi})$: a $K \times 1$ vector $\boldsymbol{\pi}$
  $p(z_t \mid z_{t-1})$: a $K \times K$ matrix $\mathbf{A}$
  $p(x_t \mid z_t)$: an $L \times K$ matrix $\mathbf{B}$, or $K$ distributions $p(x_t \mid z_t)$

LDS: a Markov chain with continuous latent variables
  $p(z_1) = \mathcal{N}(z_1 \mid \mu_0, \mathbf{P}_0)$
  $p(z_t \mid z_{t-1}) = \mathcal{N}(z_t \mid \mathbf{A} z_{t-1}, \Gamma)$
  $p(x_t \mid z_t) = \mathcal{N}(x_t \mid \mathbf{C} z_t, \Sigma)$
What can we do with LDS?

Example application: the Global Positioning System (GPS).
What can we do with LDS? (cont.)

[Figure: GPS positioning, estimating the receiver position $\{x_t, y_t, z_t\}$ and clock.]
What can we do with LDS? (cont.)

We still have noisy measurements.

[Figure: the actual path vs. the noisy measured path vs. the filtered path.]
What can we do with LDS? (cont.)

Given a string of observations and the parameters:

(1) Find, for a timestamp $t$, the probabilities of $z_t$ given the observations so far. This process is called Filtering: $p(z_t \mid x_1, x_2, \dots, x_t)$

(2) Find, for a timestamp $t$, the probabilities of $z_t$ given the whole time series. This process is called Smoothing: $p(z_t \mid x_1, x_2, \dots, x_T)$

All the above probabilities are Gaussian, and their means and covariance matrices can be computed recursively; the filtering recursion is the Kalman filter, sketched below.
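A minimal sketch of that recursion for the LDS defined above: each step updates the predicted Gaussian with the new observation, then propagates it through the dynamics. The predict/update structure is the standard Kalman filter; the variable names are ours, not the slides':

```python
# Sketch: Kalman filter computing p(z_t | x_1..x_t) = N(z_t | mu_t, V_t).
import numpy as np

def kalman_filter(X, A, C, Gamma, Sigma, mu0, P0):
    """X: iterable of observation vectors; returns filtered means and covariances."""
    mus, Vs = [], []
    mu, V = mu0, P0                              # prior for z_1
    for x in X:
        # update: condition the predicted Gaussian on the new observation
        S = C @ V @ C.T + Sigma                  # innovation covariance
        K = V @ C.T @ np.linalg.inv(S)           # Kalman gain
        mu = mu + K @ (x - C @ mu)
        V = V - K @ C @ V
        mus.append(mu); Vs.append(V)
        # predict: propagate through the dynamics for the next step
        mu, V = A @ mu, A @ V @ A.T + Gamma
    return np.array(mus), np.array(Vs)
```

Run on the observations sampled in the generative sketch above, the filtered means recover a denoised estimate of the latent trajectory.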
What can we do with LDS? (cont.)

(3) Find the probability of the model, $p(x_1, x_2, \dots, x_T)$. This process is called Evaluation.

(5) Prediction: $p(x_{t+\delta} \mid x_1, x_2, \dots, x_t)$ or $p(z_{t+\delta} \mid x_1, x_2, \dots, x_t)$

(6) EM parameter estimation of $\theta = \{\mathbf{C}, \mathbf{A}, \mu_0, \Sigma, \Gamma, \mathbf{P}_0\}$ (the LDS analogue of the Baum-Welch algorithm for HMMs).