
Stefanos Zafeiriou Adv. Statistical Machine Learning (course 495)

Linear Dynamical Systems (Kalman filter)

(a) Overview of HMMs

(b) From HMMs to Linear Dynamical Systems (LDS)


[Markov chain graph: $\mathbf{x}_1 \to \mathbf{x}_2 \to \mathbf{x}_3 \to \cdots \to \mathbf{x}_T$]

Let's assume we have discrete random variables, encoded as one-hot vectors (e.g., taking 3 discrete values, $\mathbf{x}_t \in \{(1,0,0), (0,1,0), (0,0,1)\}$).

Markov property: $p(\mathbf{x}_t \mid \mathbf{x}_1, \dots, \mathbf{x}_{t-1}) = p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$

The chain is called stationary, homogeneous, or time-invariant if the distribution $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ does not depend on $t$, e.g. $p(\mathbf{x}_t = (1,0,0) \mid \mathbf{x}_{t-1} = (0,1,0))$.

Markov Chains with Discrete Random Variables


$p(\mathbf{x}_1, \dots, \mathbf{x}_T) = p(\mathbf{x}_1) \prod_{t=2}^{T} p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$

What do we need in order to describe the whole process?

(1) A probability for the first frame/timestamp, $p(\mathbf{x}_1)$. To define it we need the vector $\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_K)$.

(2) A transition probability $p(\mathbf{x}_t \mid \mathbf{x}_{t-1})$. To define it we need a $K \times K$ transition matrix $\mathbf{A} = [a_{jk}]$.

$p(\mathbf{x}_1 \mid \boldsymbol{\pi}) = \prod_{c=1}^{K} \pi_c^{x_{1c}}$

$p(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{A}) = \prod_{k=1}^{K} \prod_{j=1}^{K} a_{jk}^{\,x_{t-1,j}\,x_{tk}}$

Markov Chains with Discrete Random Variables
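As an illustration of how $\boldsymbol{\pi}$ and $\mathbf{A}$ fully specify the chain, here is a minimal NumPy sketch (not from the slides; the function name `sample_chain` and the example numbers are made up) that draws a sequence of one-hot states, assuming row $j$ of `A` holds $p(\mathbf{x}_t \mid \mathbf{x}_{t-1} = j)$:

```python
# Minimal sketch: sampling a discrete Markov chain with one-hot states x_t,
# given pi (K,) and a row-stochastic transition matrix A (K, K).
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(pi, A, T):
    """Return a (T, K) array of one-hot state vectors x_1, ..., x_T."""
    K = len(pi)
    X = np.zeros((T, K))
    state = rng.choice(K, p=pi)               # draw x_1 from p(x_1 | pi)
    X[0, state] = 1.0
    for t in range(1, T):
        state = rng.choice(K, p=A[state])     # draw x_t from p(x_t | x_{t-1}, A)
        X[t, state] = 1.0
    return X

pi = np.array([0.5, 0.3, 0.2])                # example values, not from the slides
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
print(sample_chain(pi, A, T=5))
```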


(1) Using the transition matrix we can compute various probabilities about the future: $p(\mathbf{x}_{t+1} \mid \mathbf{x}_{t-1}, \mathbf{A})$ is described by $\mathbf{A}^2$, $p(\mathbf{x}_{t+2} \mid \mathbf{x}_{t-1}, \mathbf{A})$ by $\mathbf{A}^3$, and in general $p(\mathbf{x}_{t+n} \mid \mathbf{x}_{t-1}, \mathbf{A})$ by $\mathbf{A}^{n+1}$.

(2) The stationary probability of a Markov chain is very important (it indicates how likely the chain is to end up in each state after a long random walk), e.g. Google PageRank:

$\boldsymbol{\pi}^{T}\mathbf{A} = \boldsymbol{\pi}^{T}$

Markov Chains with Discrete Random Variables
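A minimal sketch (not from the slides; the matrix values are made up) of the two points above: multi-step transition probabilities come from powers of $\mathbf{A}$, and the stationary distribution $\boldsymbol{\pi}^{T}\mathbf{A} = \boldsymbol{\pi}^{T}$ can be found by power iteration, the same fixed point that PageRank relies on:

```python
# Minimal sketch: n-step transition matrices and the stationary distribution.
import numpy as np

A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

# Two-step transition probabilities, i.e. the matrix describing p(x_{t+1} | x_{t-1}).
A2 = np.linalg.matrix_power(A, 2)

# Stationary distribution: iterate pi^T <- pi^T A until it stops changing.
pi = np.full(A.shape[0], 1.0 / A.shape[0])
for _ in range(1000):
    pi_next = pi @ A
    if np.allclose(pi_next, pi, atol=1e-12):
        break
    pi = pi_next

print(A2)
print(pi)          # satisfies pi @ A ≈ pi up to numerical tolerance
```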


Latent Variables in a Markov Chain

[HMM graph: latent chain $\mathbf{z}_1 \to \mathbf{z}_2 \to \cdots \to \mathbf{z}_T$, with an observation $\mathbf{x}_t$ emitted from each $\mathbf{z}_t$]

For discrete observations, the emission probability $p(\mathbf{x}_t \mid \mathbf{z}_t)$ is defined by an $L \times K$ emission probability matrix $\mathbf{B}$.


Latent Variables in a Markov Chain


$p(\mathbf{x}_t \mid \mathbf{z}_t) = \prod_{k=1}^{K} N(\mathbf{x}_t \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_{tk}}$

$K$ Gaussian emission distributions, one per latent state.
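A minimal sketch (not from the slides; the function name and shapes are made up) of generating data from such an HMM with $K$ Gaussian emission distributions:

```python
# Minimal sketch: sampling an HMM with one Gaussian emission per latent state.
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, means, covs, T):
    """Return latent states z (T,) and observations X (T, D)."""
    D = means.shape[1]
    z = np.zeros(T, dtype=int)
    X = np.zeros((T, D))
    z[0] = rng.choice(len(pi), p=pi)                              # z_1 ~ pi
    X[0] = rng.multivariate_normal(means[z[0]], covs[z[0]])       # x_1 | z_1
    for t in range(1, T):
        z[t] = rng.choice(len(pi), p=A[z[t - 1]])                 # z_t | z_{t-1}
        X[t] = rng.multivariate_normal(means[z[t]], covs[z[t]])   # x_t | z_t
    return z, X
```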


$p(\mathbf{z}_1, \dots, \mathbf{z}_T) = p(\mathbf{z}_1) \prod_{t=2}^{T} p(\mathbf{z}_t \mid \mathbf{z}_{t-1})$

$p(\mathbf{X}, \mathbf{Z} \mid \theta) = p(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T, \mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_T \mid \theta) = \left[ \prod_{t=1}^{T} p(\mathbf{x}_t \mid \mathbf{z}_t) \right] p(\mathbf{z}_1) \prod_{t=2}^{T} p(\mathbf{z}_t \mid \mathbf{z}_{t-1})$

Factorization of an HMM
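To make the factorization concrete, here is a minimal sketch (not from the slides; it assumes SciPy for the Gaussian log-density and Gaussian emissions as on the previous slide) that evaluates $\log p(\mathbf{X}, \mathbf{Z} \mid \theta)$ for a given state path:

```python
# Minimal sketch: HMM joint log-probability log p(X, Z | theta) using the
# factorization p(z_1) * prod p(z_t | z_{t-1}) * prod p(x_t | z_t).
import numpy as np
from scipy.stats import multivariate_normal

def hmm_log_joint(X, z, pi, A, means, covs):
    ll = np.log(pi[z[0]])                                            # log p(z_1)
    ll += multivariate_normal.logpdf(X[0], means[z[0]], covs[z[0]])  # log p(x_1 | z_1)
    for t in range(1, len(z)):
        ll += np.log(A[z[t - 1], z[t]])                              # log p(z_t | z_{t-1})
        ll += multivariate_normal.logpdf(X[t], means[z[t]], covs[z[t]])  # log p(x_t | z_t)
    return ll
```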


What can we do with an HMM?

Given a sequence of observations and the parameters:

(1) For a timestamp $t$, we want to find the probabilities of $\mathbf{z}_t$ given the observations up to that point. This process is called Filtering: $p(\mathbf{z}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$ (see the filtering sketch after this list).

(2) For a timestamp $t$, we want to find the probabilities of $\mathbf{z}_t$ given the whole sequence. This process is called Smoothing: $p(\mathbf{z}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$

(3) Given the observation sequence, find the sequence of hidden variables that maximizes the posterior:
$\operatorname{argmax}_{\mathbf{z}_1 \dots \mathbf{z}_t} p(\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$
This process is called Decoding (Viterbi).
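A minimal sketch (not from the slides) of filtering for a discrete-state HMM via the normalised forward recursion; `lik` is an assumed precomputed table with `lik[t, k]` $= p(\mathbf{x}_t \mid \mathbf{z}_t = k)$. The accumulated normalisers also give the evaluation quantity $\log p(\mathbf{x}_1, \dots, \mathbf{x}_T)$ discussed below:

```python
# Minimal sketch: HMM filtering p(z_t | x_1..x_t) with the forward recursion.
import numpy as np

def hmm_filter(pi, A, lik):
    T, K = lik.shape
    alpha = np.zeros((T, K))
    loglik = 0.0
    for t in range(T):
        a = pi * lik[0] if t == 0 else lik[t] * (alpha[t - 1] @ A)
        norm = a.sum()                 # p(x_t | x_1..x_{t-1})
        alpha[t] = a / norm            # alpha[t, k] = p(z_t = k | x_1..x_t)
        loglik += np.log(norm)         # accumulates log p(x_1..x_T)
    return alpha, loglik
```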


Hidden Markov Models

[Figure: illustration of filtering, smoothing, and decoding; taken from Machine Learning: A Probabilistic Perspective by K. Murphy]


Hidden Markov Models

(4) Find the probability of the observations under the model, $p(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$. This process is called Evaluation.

(5) Prediction: $p(\mathbf{z}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$ and $p(\mathbf{x}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$

(6) EM parameter estimation (Baum-Welch algorithm) of the parameters $\mathbf{A}, \boldsymbol{\pi}, \theta$


Hidden Markov Models: summary of inference tasks

Filtering: $p(\mathbf{z}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$

Smoothing: $p(\mathbf{z}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$

Decoding: $\operatorname{argmax}_{\mathbf{z}_1 \dots \mathbf{z}_t} p(\mathbf{z}_1, \dots, \mathbf{z}_t \mid \mathbf{x}_1, \dots, \mathbf{x}_t)$

Fixed-lag smoothing: $p(\mathbf{z}_{t-\tau} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$

Prediction: $p(\mathbf{z}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$ and $p(\mathbf{x}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$


Linear Dynamical Systems (LDS)

• Up until now we had Markov chains with discrete variables.

[Markov chain graph: $\mathbf{x}_1 \to \mathbf{x}_2 \to \mathbf{x}_3 \to \cdots \to \mathbf{x}_T$]

• How can we define a transition relationship with continuous-valued variables?

𝒙𝑑 = π‘¨π’™π‘‘βˆ’1 + 𝒗𝑑 𝒙1 = 𝝁0 + 𝒖

𝒗~𝑁(𝒗|𝟎, πšͺ) 𝒖~𝑁 𝒖 𝟎,𝑷0

or

𝒙1~𝑁(𝒙1|𝝁0, 𝑷0) 𝒙𝑑~𝑁(𝒙𝑑|π‘¨π’™π‘‘βˆ’1, πšͺ) or

𝑝(𝒙1) = 𝑁(𝒙1|𝝁0, 𝑷0) 𝒑(𝒙𝑑 π’™π‘‘βˆ’1 = 𝑁(π‘¨π’™π‘‘βˆ’1|𝟎, πšͺ)


Latent Variable Models (Dynamic, Continuous)


[Graphical model: observations $\mathbf{x}_1, \dots, \mathbf{x}_N$ with latent variables $\mathbf{y}_1, \dots, \mathbf{y}_N$; they share a common linear structure, and we want to find the parameters.]

Linear Dynamical Systems (LDS)


πœƒ = {𝑾,𝑨, 𝝁0, 𝚺, πšͺ, 𝑷0}

𝒙1 𝒙2 𝒙3 𝒙𝑇

π’š1 π’š2 π’š3 π’šπ‘‡

𝒙𝑑 = π–π’šπ‘‘ + 𝒆𝑑

π’šπ‘‘ = π‘¨π’šπ‘‘βˆ’1 + 𝒗𝑑

π’š1 = 𝝁0 + 𝒖

𝐞~𝑁(𝒆|𝟎, 𝚺)

𝒗~𝑁(𝒗|𝟎, πšͺ)

𝒖~𝑁 𝒖 𝟎,𝑷0

Parameters:

15

Transition model

Linear Dynamical Systems (LDS)
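A minimal sketch (not from the slides; names and shapes are illustrative) of generating data from this model, drawing the latent chain from the transition model and the observations from the emission model:

```python
# Minimal sketch: simulate y_t = A y_{t-1} + v_t (latent), x_t = W y_t + e_t (observed).
import numpy as np

rng = np.random.default_rng(0)

def simulate_lds(W, A, mu0, Sigma, Gamma, P0, T):
    Dy, Dx = A.shape[0], W.shape[0]
    Y = np.zeros((T, Dy))
    X = np.zeros((T, Dx))
    Y[0] = rng.multivariate_normal(mu0, P0)                  # y_1 ~ N(mu0, P0)
    X[0] = rng.multivariate_normal(W @ Y[0], Sigma)          # x_1 | y_1
    for t in range(1, T):
        Y[t] = rng.multivariate_normal(A @ Y[t - 1], Gamma)  # y_t | y_{t-1}
        X[t] = rng.multivariate_normal(W @ Y[t], Sigma)      # x_t | y_t
    return Y, X
```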


Linear Dynamical Systems (LDS)

First timestamp: $p(\mathbf{y}_1) = N(\mathbf{y}_1 \mid \boldsymbol{\mu}_0, \mathbf{P}_0)$

Transition probability: $p(\mathbf{y}_t \mid \mathbf{y}_{t-1}) = N(\mathbf{y}_t \mid \mathbf{A}\mathbf{y}_{t-1}, \boldsymbol{\Gamma})$

Emission: $p(\mathbf{x}_t \mid \mathbf{y}_t) = N(\mathbf{x}_t \mid \mathbf{W}\mathbf{y}_t, \boldsymbol{\Sigma})$


HMM vs LDS

HMM: a Markov chain with discrete latent variables
First timestamp $p(\mathbf{z}_1)$: the $K \times 1$ vector $\boldsymbol{\pi}$
Transition $p(\mathbf{z}_t \mid \mathbf{z}_{t-1})$: the $K \times K$ matrix $\mathbf{A}$
Emission $p(\mathbf{x}_t \mid \mathbf{z}_t)$: the $L \times K$ matrix $\mathbf{B}$, or $K$ emission distributions

LDS: a Markov chain with continuous latent variables
First timestamp: $p(\mathbf{y}_1) = N(\mathbf{y}_1 \mid \boldsymbol{\mu}_0, \mathbf{P}_0)$
Transition: $p(\mathbf{y}_t \mid \mathbf{y}_{t-1}) = N(\mathbf{y}_t \mid \mathbf{A}\mathbf{y}_{t-1}, \boldsymbol{\Gamma})$
Emission: $p(\mathbf{x}_t \mid \mathbf{y}_t) = N(\mathbf{x}_t \mid \mathbf{W}\mathbf{y}_t, \boldsymbol{\Sigma})$


What can we do with LDS?

Global Positioning System (GPS): the receiver estimates its position $\{x_t, y_t, z_t\}$ (and its clock) from noisy satellite measurements.

[Figure: actual path, noisy measured path, and filtered path of a moving receiver]

We still have noisy measurements.


What can we do with LDS?

Given a sequence of observations and the parameters:

(1) For a timestamp $t$, we want to find the probabilities of $\mathbf{y}_t$ given the observations up to that point. This process is called Filtering: $p(\mathbf{y}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$

(2) For a timestamp $t$, we want to find the probabilities of $\mathbf{y}_t$ given the whole time series. This process is called Smoothing: $p(\mathbf{y}_t \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$

All the above probabilities are Gaussian; their means and covariance matrices are computed recursively (a sketch of the filtering recursion follows below).
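A minimal sketch (not from the slides; variable names are illustrative) of the Kalman filter recursion that produces these filtered Gaussians, alternating a correction step (condition on $\mathbf{x}_t$) with a prediction step (propagate through the transition model):

```python
# Minimal sketch: Kalman filter for the LDS above, returning the means and
# covariances of the filtered posteriors p(y_t | x_1, ..., x_t).
import numpy as np

def kalman_filter(X, W, A, mu0, Sigma, Gamma, P0):
    T = X.shape[0]
    Dy = A.shape[0]
    mus = np.zeros((T, Dy))
    Ps = np.zeros((T, Dy, Dy))
    mu_pred, P_pred = mu0, P0                        # prior on y_1
    for t in range(T):
        # Correction: condition the predicted Gaussian on the observation x_t.
        S = W @ P_pred @ W.T + Sigma                 # innovation covariance
        K = P_pred @ W.T @ np.linalg.inv(S)          # Kalman gain
        mus[t] = mu_pred + K @ (X[t] - W @ mu_pred)
        Ps[t] = (np.eye(Dy) - K @ W) @ P_pred
        # Prediction: propagate the filtered Gaussian to the next timestamp.
        mu_pred = A @ mus[t]
        P_pred = A @ Ps[t] @ A.T + Gamma
    return mus, Ps
```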


What can we do with LDS?

(3) Find the probability of the observations under the model, $p(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$. This process is called Evaluation.

(4) Prediction: $p(\mathbf{y}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$ and $p(\mathbf{x}_{t+\delta} \mid \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_t)$ (a prediction sketch follows below).

(5) EM parameter estimation (the LDS analogue of the Baum-Welch algorithm) of $\theta = \{\mathbf{W}, \mathbf{A}, \boldsymbol{\mu}_0, \boldsymbol{\Sigma}, \boldsymbol{\Gamma}, \mathbf{P}_0\}$
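For prediction, the filtered Gaussian can simply be pushed through the transition model $\delta$ times without any correction, and through the emission model for the predicted observation; a minimal sketch (not from the slides; names are illustrative):

```python
# Minimal sketch: delta-step-ahead prediction for the LDS, starting from a
# filtered Gaussian p(y_t | x_1..x_t) = N(mu, P).
import numpy as np

def lds_predict(mu, P, A, Gamma, W, Sigma, delta):
    """Return the parameters of p(y_{t+delta} | x_1..x_t) and p(x_{t+delta} | x_1..x_t)."""
    for _ in range(delta):
        mu = A @ mu
        P = A @ P @ A.T + Gamma          # latent prediction
    x_mean = W @ mu
    x_cov = W @ P @ W.T + Sigma          # predicted observation distribution
    return mu, P, x_mean, x_cov
```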
