Russell and Norvig, AIMA : Chapter 15 Part B – 15.3, 15.4 Probabilistic Reasoning over Time 1 Presented to: Prof. Dr. S. M. Aqil Burney Presented by: Zain Abbas (MSCS-UBIT)


Page 1

Russell and Norvig, AIMA: Chapter 15, Part B – 15.3, 15.4

Probabilistic Reasoning over Time

Presented to: Prof. Dr. S. M. Aqil Burney

Presented by: Zain Abbas (MSCS-UBIT)

Page 2

Agenda

Temporal probabilistic agents

Inference: Filtering, prediction, smoothing and most likely explanation

Hidden Markov models

Kalman filters

Dynamic Bayesian Networks

Page 3

Stochastic (Random) Process

A process that evolves in time or space in accordance with some probability distribution.

In the simplest case ("discrete time"), a stochastic process is simply a sequence of random variables.

Page 4

Markov Chain

A stochastic process (family of random variables) {X_n, n = 0, 1, 2, …} satisfying:

it takes on a finite or countable number of possible values; if X_n = i, the process is said to be in state i at time n

whenever the process is in state i, there is a fixed probability P_ij that it will next be in state j. Formally:

P{X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, …, X_1 = i_1, X_0 = i_0} = P_{ij}

for all states i_0, i_1, …, i_{n-1}, i, j and all n ≥ 0.
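As a concrete sketch, the fixed-transition-probability definition above can be simulated directly. The two-state chain below is a hypothetical illustration (the state names and probabilities are chosen for the example only, not taken from this slide):

```python
import random

# Hypothetical two-state chain; P[i][j] = P(next state = j | current state = i).
P = {"Low":  {"Low": 0.3, "High": 0.7},
     "High": {"Low": 0.2, "High": 0.8}}

def simulate(start, steps, seed=0):
    """Sample a state sequence X0, X1, ..., X_steps from the chain."""
    rng = random.Random(seed)
    state, seq = start, [start]
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, p in P[state].items():
            acc += p
            if r < acc:        # choose the next state with its transition probability
                state = nxt
                break
        seq.append(state)
    return seq

print(simulate("Low", 5))
```

Note that the next state is drawn using only the current state, which is exactly the Markov property in the formula above.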

Page 5

Hidden Markov Model

Set of states: {s_1, s_2, …, s_N}

The process moves from one state to another, generating a sequence of states s_{i1}, s_{i2}, …, s_{ik}, …

Markov chain property: the probability of each subsequent state depends only on the previous state:

P(s_{ik} | s_{i1}, s_{i2}, …, s_{ik-1}) = P(s_{ik} | s_{ik-1})

States are not visible, but each state randomly generates one of M observations (visible states): {v_1, v_2, …, v_M}

Page 6

Hidden Markov Model

To define a hidden Markov model, the following probabilities have to be specified:

Matrix of transition probabilities A = (a_ij), where a_ij = P(s_i | s_j)

Matrix of observation probabilities B = (b_i(v_m)), where b_i(v_m) = P(v_m | s_i)

Vector of initial probabilities π = (π_i), where π_i = P(s_i)
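In code, the triple (A, B, π) fully specifies the model. The sketch below packs the Low/High pressure example from the later slides into matrices; note that here rows index the current state (a transposed layout relative to the slide's a_ij = P(s_i | s_j) convention), and indices are 0 = Low, 1 = High:

```python
import numpy as np

A  = np.array([[0.3, 0.7],    # from Low:  P(Low|Low),  P(High|Low)
               [0.2, 0.8]])   # from High: P(Low|High), P(High|High)
B  = np.array([[0.6, 0.4],    # in Low:  P(Rain|Low),  P(Dry|Low)
               [0.4, 0.6]])   # in High: P(Rain|High), P(Dry|High)
pi = np.array([0.4, 0.6])     # P(Low), P(High)

# Sanity check: every row of A and B, and pi itself, is a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```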

Page 7

Hidden Markov Model

[Figure: HMM unfolded in time (graphical view)]

Page 8

Summary of the Concept

P(X) = \sum_Q P(X, Q) = \sum_Q P(Q)\, P(X \mid Q)

     = \sum_Q P(q_1 q_2 \dots q_T)\, P(x_1 x_2 \dots x_T \mid q_1 q_2 \dots q_T)

     = \sum_Q \prod_{t=1}^{T} P(q_t \mid q_{t-1}) \prod_{t=1}^{T} p(x_t \mid q_t)

where the first product is the Markov chain process and the second is the output process.

Page 9

Earlier Example

Transition matrix:

T = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}

Sensor matrix with U_1 = true:

O_1 = \begin{pmatrix} 0.9 & 0 \\ 0 & 0.2 \end{pmatrix}

Page 10

Messages as column vectors

Forward and backward messages as column vectors:

f_{1:t+1} = \alpha\, O_{t+1} T^\top f_{1:t}

which is the matrix form of the filtering update

P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})

and, for smoothing,

P(X_k \mid e_{1:t}) = \alpha\, P(X_k \mid e_{1:k})\, P(e_{k+1:t} \mid X_k)

b_{k+1:t} = T\, O_{k+1}\, b_{k+2:t}
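A minimal sketch of the forward recursion in matrix–vector form, run on the umbrella example (T and O_1 as on the "Earlier Example" slide; state order is (Rain, ¬Rain)):

```python
import numpy as np

# AIMA umbrella world.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])        # transition model
O_true = np.diag([0.9, 0.2])      # sensor model when the umbrella is observed

def forward(f, O):
    """One filtering step: f_{1:t+1} = alpha * O_{t+1} @ T.T @ f_{1:t}."""
    f = O @ T.T @ f
    return f / f.sum()            # alpha normalizes the message

f = np.array([0.5, 0.5])          # uniform prior P(R0)
f = forward(f, O_true)            # umbrella seen on day 1
print(f.round(3))                 # [0.818 0.182]
f = forward(f, O_true)            # umbrella seen on day 2
print(f.round(3))                 # [0.883 0.117]
```

The day-one and day-two values (0.818 and 0.883) match the worked filtering example in AIMA Chapter 15.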

Page 11

Messages as column vectors

Can avoid storing all forward messages in smoothing by running forward algorithm backwards:

f_{1:t+1} = \alpha\, O_{t+1} T^\top f_{1:t}

O_{t+1}^{-1} f_{1:t+1} = \alpha\, T^\top f_{1:t}

f_{1:t} = \alpha' (T^\top)^{-1} O_{t+1}^{-1} f_{1:t+1}

Page 12

Example

[Figure: state diagram. Hidden states Low and High with transitions Low→Low 0.3, Low→High 0.7, High→High 0.8, High→Low 0.2; emission probabilities Rain|Low 0.6, Dry|Low 0.4, Rain|High 0.4, Dry|High 0.6]

Page 13

Example

Two states: 'Low' and 'High' atmospheric pressure

Two observations: 'Rain' and 'Dry'

Transition probabilities: P('Low'|'Low')=0.3, P('High'|'Low')=0.7, P('Low'|'High')=0.2, P('High'|'High')=0.8

Observation probabilities: P('Rain'|'Low')=0.6, P('Dry'|'Low')=0.4, P('Rain'|'High')=0.4, P('Dry'|'High')=0.6

Initial probabilities: say P('Low')=0.4, P('High')=0.6

Page 14

Calculation of probabilities

Suppose we want to calculate the probability of a sequence of observations in our example, {'Dry', 'Rain'}.

Consider all possible hidden state sequences:

P({'Dry','Rain'}) = P({'Dry','Rain'}, {'Low','Low'}) + P({'Dry','Rain'}, {'Low','High'}) + P({'Dry','Rain'}, {'High','Low'}) + P({'Dry','Rain'}, {'High','High'})

Page 15

Calculation of probabilities

The first term can be calculated as:

P({'Dry','Rain'}, {'Low','Low'})
= P({'Dry','Rain'} | {'Low','Low'}) * P({'Low','Low'})
= P('Dry'|'Low') * P('Rain'|'Low') * P('Low') * P('Low'|'Low')
= 0.4 * 0.6 * 0.4 * 0.3 = 0.0288
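The full sum over all four hidden sequences can be checked mechanically. The sketch below enumerates every state sequence exactly as the slide does (it uses P('Dry'|'High') = 0.6 so that each observation row sums to one):

```python
from itertools import product

A  = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
B  = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.6}}
pi = {"Low": 0.4, "High": 0.6}

def prob(obs):
    """P(obs) by summing the joint over every possible hidden state sequence."""
    total = 0.0
    for states in product(A, repeat=len(obs)):   # (Low,Low), (Low,High), ...
        p = pi[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total

print(round(prob(("Dry", "Rain")), 4))   # 0.232
```

The {'Low','Low'} term contributes the 0.0288 computed above; the other three sequences contribute 0.0448, 0.0432, and 0.1152.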

Page 16

Agenda

Temporal probabilistic agents

Inference: Filtering, prediction, smoothing and most likely explanation

Hidden Markov models

Kalman filters

Dynamic Bayesian Networks

Page 17

Kalman Filters

System state cannot be measured directly; it must be estimated "optimally" from measurements.

[Figure: block diagram. A system (black box), driven by external controls and subject to system error sources, has a system state that is desired but not known; measuring devices, subject to measurement error sources, produce observed measurements; the Kalman filter combines them into an optimal estimate of the system state.]

Page 18

What is a Kalman Filter?

A set of mathematical equations

An iterative, recursive process

An optimal data processing algorithm under certain criteria

For a linear system with white Gaussian errors, the Kalman filter is the "best" estimate based on all previous measurements

Estimates past, present, and future states

Page 19

White Gaussian Noise

White noise is a random signal (or process) with a flat power spectral density; in other words, the signal contains equal power within any fixed bandwidth at any center frequency. "Gaussian" adds that the noise amplitude at each instant follows a normal distribution.

Page 20

Optimal

Dependent upon the criteria chosen to evaluate performance

Under certain assumptions (a linear system model and Gaussian noise), the KF is optimal with respect to virtually any criterion that makes sense

Page 21

Recursive

A Kalman filter only needs information from the previous state

It is updated at each iteration

Older data can be discarded, which saves computation capacity and storage

Page 22

Variables

In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process in accordance with the framework of the Kalman filter.

This means specifying the matrices F_k, H_k, Q_k, R_k, and sometimes B_k, for each time step k.

Page 23

Variables

x_k = state vector, the process to examine

w_k = process noise: white, Gaussian, zero mean, covariance matrix Q

v_k = measurement noise: white, Gaussian, zero mean, covariance matrix R; uncorrelated with w_k

S_k = covariance of the innovation (residual)

K_k = Kalman gain matrix

P_k = covariance of the prediction error

z_k = measurement of the system state

Page 24

Equations

State and measurement model, illustrated for a position–velocity state:

x_{k+1} = A x_k + w_k

\begin{pmatrix} pos_{k+1} \\ vel_{k+1} \end{pmatrix} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} \begin{pmatrix} pos_k \\ vel_k \end{pmatrix} + \begin{pmatrix} t^2/2 \\ t \end{pmatrix} a_k

z_k = pos_k + v_k

Page 25

More Equations

With the measurement matrix taken as the identity, one filtering cycle is:

S_k = P_k^- + R

K_k = P_k^- S_k^{-1}

\hat{x}_k = A \hat{x}_{k-1} + K_k (z_k - A \hat{x}_{k-1})

P_{k+1}^- = A (I - K_k) P_k^- A^\top + Q

Page 26

Kalman gain

Relates the new estimate to the most certain of the previous estimates

Large measurement noise → small gain

Large system noise → large gain

If system and measurement noise are unchanged, the gain converges to a constant: the steady-state Kalman filter

Page 27

Kalman Filter

The Kalman filter has two distinct phases: predict and update.

The predict phase uses the state estimate from the previous timestep to produce an estimate of the state at the current timestep.

In the update phase, measurement information at the current timestep is used to refine this prediction to arrive at a new, (hopefully) more accurate state estimate, again for the current timestep.

Page 28

Iterative calculations

Prediction: project the state; project the error covariance

Update: compute the Kalman gain; update the estimate with the new measurement; update the error covariance

The filter then loops: Predict → Update → Predict → …

Page 29

Iterative calculations

Prediction:

\hat{x}_k^- = A \hat{x}_{k-1}

P_k^- = A P_{k-1} A^\top + Q

Update:

S_k = P_k^- + R

K_k = P_k^- S_k^{-1}

\hat{x}_k = \hat{x}_k^- + K_k (z_k - \hat{x}_k^-)

P_k = (I - K_k) P_k^-
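A scalar sketch of this predict/update cycle (state transition A, process noise Q, measurement noise R; the measurement matrix is taken as the identity, as in the equations on these slides; the noise values and measurements are illustrative assumptions):

```python
def kalman_step(x, P, z, A=1.0, Q=0.01, R=1.0):
    """One predict/update cycle of a scalar Kalman filter."""
    # Predict
    x_pred = A * x                       # x^-_k = A x_{k-1}
    P_pred = A * P * A + Q               # P^-_k = A P_{k-1} A^T + Q
    # Update
    S = P_pred + R                       # innovation covariance
    K = P_pred / S                       # Kalman gain
    x_new = x_pred + K * (z - x_pred)    # correct with the measurement
    P_new = (1 - K) * P_pred             # shrink the error covariance
    return x_new, P_new

x, P = 0.0, 100.0                        # vague prior
for z in [1.2, 0.9, 1.1, 1.0]:           # noisy measurements of a state near 1.0
    x, P = kalman_step(x, P, z)
print(round(x, 3), round(P, 3))          # estimate near 1, variance much reduced
```

Note how the first measurement dominates (K near 1 because the prior variance is huge), then the gain settles as P shrinks, which matches the gain behavior described three slides back.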

Page 30

Example

Lost on the 1-dimensional line

Position: y(t)

Assume Gaussian distributed measurements

Page 31

Example

[Figure: Gaussian density over position y]

Sextant measurement at t1: mean z1 and variance σ²_{z1}

Optimal estimate of position: ŷ(t1) = z1

Variance of error in estimate: σ²_x(t1) = σ²_{z1}

Boat in same position at time t2, so the predicted position is z1

Page 32

Example

[Figure: densities of the prediction ŷ⁻(t2) and the measurement z(t2) over position y]

So we have the prediction ŷ⁻(t2)

GPS measurement at t2: mean z2 and variance σ²_{z2}

Need to correct the prediction with the measurement to get ŷ(t2)

Closer to the more trusted measurement: a linear interpolation?

Page 33

Example

[Figure: densities of the prediction ŷ⁻(t2), the measurement z(t2), and the corrected optimal estimate ŷ(t2)]

The corrected mean is the new optimal estimate of position

The new variance is smaller than either of the previous two variances
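The correction step here is just precision-weighted averaging of two Gaussian estimates: the fused mean sits closer to the estimate with the smaller variance, and the fused variance is smaller than either. A sketch with made-up numbers (the means and variances are hypothetical):

```python
def fuse(mu1, var1, mu2, var2):
    """Combine two Gaussian estimates of the same quantity."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)   # precisions add
    mu = var * (mu1 / var1 + mu2 / var2)    # precision-weighted mean
    return mu, var

# Hypothetical prediction (mean 40, variance 4) vs. measurement (mean 50, variance 1).
mu, var = fuse(40.0, 4.0, 50.0, 1.0)
print(round(mu, 3), round(var, 3))   # 48.0 0.8 — pulled toward the tighter estimate
```

This is exactly why the corrected variance on this slide is smaller than both the prediction's and the measurement's.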

Page 34

Example – Accelerating Spacecraft

• Assume that the system variables, represented by the vector x, are governed by the equation x_{k+1} = A x_k + w_k, where w_k is random process noise and the subscripts on the vectors represent the time step.

• A spacecraft is accelerating with random bursts of gas from its reaction control system thrusters.

• The vector x might consist of position p and velocity v.

Page 35

Example – Accelerating Spacecraft

The system equation would be given by

\begin{pmatrix} p_{k+1} \\ v_{k+1} \end{pmatrix} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p_k \\ v_k \end{pmatrix} + \begin{pmatrix} T^2/2 \\ T \end{pmatrix} a_k

where a_k is the random time-varying acceleration, and T is the time between step k and step k+1.
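This system model is easy to simulate. The sketch below propagates position and velocity under random acceleration bursts; the time step T = 1 s is an assumption for illustration, and the burst size matches the σ = 0.5 ft/s² quoted on the next slide:

```python
import numpy as np

T = 1.0                                   # time step (assumed)
A = np.array([[1.0, T],
              [0.0, 1.0]])                # constant-velocity dynamics
G = np.array([T**2 / 2.0, T])             # how an acceleration burst enters [p, v]

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])                  # initial position and velocity
for _ in range(60):
    a = rng.normal(0.0, 0.5)              # random burst, standard deviation 0.5 ft/s^2
    x = A @ x + G * a                     # x_{k+1} = A x_k + w_k, with w_k = G a_k
print(x)                                  # final [position, velocity]
```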

Page 36

Example – Accelerating Spacecraft

The system was simulated on a computer with random bursts of acceleration having a standard deviation of 0.5 ft/s².

The position was measured with an error of 10 feet (one standard deviation).

Software used: MATLAB®

Page 37

Example – Accelerating Spacecraft