

Computer vision: models, learning and inference

Chapter 3

Probability distributions

Please send errata to [email protected]

©2011 Simon J.D. Prince

Why model these complicated quantities?

Because we need probability distributions over model parameters as well as over data and world state. Hence, some of the distributions describe the parameters of the others:


Example: in the univariate normal distribution $\text{Norm}_x[\mu, \sigma^2]$, the parameter $\mu$ models the mean and $\sigma^2$ models the variance; these two parameters are themselves modelled by the normal inverse gamma distribution.

Bernoulli Distribution

The Bernoulli distribution describes a situation with only two possible outcomes, $y=0$ (failure) and $y=1$ (success). It takes a single parameter $\lambda \in [0,1]$:

$$\Pr(y=0) = 1-\lambda, \qquad \Pr(y=1) = \lambda,$$

or equivalently

$$\Pr(y) = \lambda^{y}(1-\lambda)^{1-y}.$$

For short we write: $\Pr(y) = \text{Bern}_y[\lambda]$.
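As a quick illustration, here is a minimal sketch (assuming NumPy and SciPy are available; the parameter value and names such as `lam` are ours, not the book's):

```python
# Evaluate and sample a Bernoulli distribution with parameter lambda.
from scipy.stats import bernoulli

lam = 0.7                                  # the single parameter, in [0, 1]
rv = bernoulli(lam)

print(rv.pmf(0), rv.pmf(1))                # Pr(y=0) = 0.3, Pr(y=1) = 0.7
print(rv.rvs(size=10, random_state=0))     # ten draws of 0s and 1s
```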

Beta Distribution

Defined over $\lambda \in [0,1]$ (i.e., the parameter of the Bernoulli):

$$\Pr(\lambda) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \lambda^{\alpha-1}(1-\lambda)^{\beta-1}.$$

• Two parameters $\alpha, \beta$, both $> 0$
• Mean depends on their relative values: $E[\lambda] = \alpha/(\alpha+\beta)$
• Concentration depends on their magnitude $\alpha+\beta$

For short we write: $\Pr(\lambda) = \text{Beta}_\lambda[\alpha, \beta]$.
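A minimal sketch of these two properties (assuming SciPy; the parameter values are made up):

```python
# Mean depends on the ratio of (a, b); concentration on their magnitude.
from scipy.stats import beta

a, b = 2.0, 6.0
rv = beta(a, b)

print(rv.mean())                           # E[lambda] = a/(a+b) = 0.25
print(beta(20.0, 60.0).var() < rv.var())   # same ratio, larger magnitude:
                                           # smaller variance, so True
```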

Categorical Distribution

The categorical distribution describes a situation with $K$ possible outcomes, $y=1, \dots, y=K$. It takes $K$ parameters $\lambda_1, \dots, \lambda_K$, where $\lambda_k \ge 0$ and $\sum_k \lambda_k = 1$:

$$\Pr(y=k) = \lambda_k,$$

or we can think of the data as a vector with all elements zero except the $k$th, e.g., $[0,0,0,1,0]$.

For short we write: $\Pr(y) = \text{Cat}_y[\boldsymbol{\lambda}]$.
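A minimal sketch of the two equivalent views (assuming NumPy; the probabilities are made up):

```python
# A categorical distribution over K = 5 outcomes, indexed and one-hot.
import numpy as np

lam = np.array([0.1, 0.2, 0.4, 0.2, 0.1])     # lambda_k >= 0, sums to 1
assert np.isclose(lam.sum(), 1.0)

k = 4                                         # the outcome y = 4
print(lam[k - 1])                             # Pr(y=4) = 0.2

one_hot = np.eye(len(lam), dtype=int)[k - 1]  # the vector [0, 0, 0, 1, 0]
print(one_hot, lam @ one_hot)                 # same probability, 0.2
```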

Dirichlet Distribution

Defined over $K$ values $\lambda_1 \dots \lambda_K$, where $\lambda_k \ge 0$ and $\sum_k \lambda_k = 1$:

$$\Pr(\lambda_1 \dots \lambda_K) = \frac{\Gamma\left(\sum_{k=1}^{K}\alpha_k\right)}{\prod_{k=1}^{K}\Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}.$$

Has $K$ parameters $\alpha_k > 0$.

Or for short: $\Pr(\lambda_1 \dots \lambda_K) = \text{Dir}_{\boldsymbol{\lambda}}[\alpha_1, \dots, \alpha_K]$.
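A minimal sketch (assuming NumPy and SciPy; the parameter values are made up) showing that each Dirichlet sample is itself a valid set of categorical parameters:

```python
# One draw from a Dirichlet is a vector of K nonnegative values summing to 1.
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([2.0, 2.0, 2.0, 2.0, 2.0])   # K parameters, all > 0
lam = dirichlet(alpha).rvs(random_state=0)[0]

print(lam)                                    # K nonnegative values...
print(np.isclose(lam.sum(), 1.0))             # ...that sum to one: True
```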

Univariate Normal Distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, a mean $\mu$ and a variance $\sigma^2 > 0$:

$$\Pr(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

For short we write: $\Pr(x) = \text{Norm}_x[\mu, \sigma^2]$.
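A minimal check of this density (assuming NumPy and SciPy; note that SciPy parameterizes by the standard deviation, not the variance):

```python
# Verify the normal density formula against scipy's implementation.
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.0, 4.0
x = 2.5

by_hand = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print(np.isclose(by_hand, norm(mu, np.sqrt(sigma2)).pdf(x)))   # True
```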

Normal Inverse Gamma Distribution

Defined on two variables, a mean $\mu$ and a variance $\sigma^2 > 0$:

$$\Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^2}\right)^{\alpha+1} \exp\left(-\frac{2\beta + \gamma(\delta-\mu)^2}{2\sigma^2}\right),$$

or for short: $\Pr(\mu, \sigma^2) = \text{NormInvGam}_{\mu,\sigma^2}[\alpha, \beta, \gamma, \delta]$.

Four parameters: $\alpha, \beta, \gamma > 0$ and $\delta$.
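A minimal sampling sketch (assuming NumPy and SciPy, and assuming the standard factorization of this density: $\sigma^2$ follows an inverse gamma with shape $\alpha$ and scale $\beta$, and $\mu \mid \sigma^2$ is normal with mean $\delta$ and variance $\sigma^2/\gamma$; the parameter values are made up):

```python
# Ancestral sampling from a normal inverse gamma distribution.
import numpy as np
from scipy.stats import invgamma

alpha, beta, gamma, delta = 3.0, 2.0, 1.5, 0.0
rng = np.random.default_rng(0)

sigma2 = invgamma(alpha, scale=beta).rvs(random_state=rng)  # draw variance
mu = rng.normal(delta, np.sqrt(sigma2 / gamma))             # then draw mean
print(mu, sigma2)                        # one (mu, sigma^2) pair
```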

Multivariate Normal Distribution

The multivariate normal distribution describes multiple continuous variables. It takes two parameters:

• a vector $\boldsymbol{\mu}$ containing the mean position,
• a symmetric "positive definite" covariance matrix $\boldsymbol{\Sigma}$.

$$\Pr(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).$$

Positive definite: $\mathbf{z}^{\mathsf{T}}\boldsymbol{\Sigma}\mathbf{z}$ is positive for any real $\mathbf{z} \neq \mathbf{0}$.

For short we write: $\Pr(\mathbf{x}) = \text{Norm}_{\mathbf{x}}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]$.
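A minimal sketch in two dimensions (assuming NumPy and SciPy; the numbers are made up), including a direct check of the positive definite quadratic form:

```python
# A 2-D multivariate normal and the quadratic form z^T Sigma z.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])           # symmetric positive definite

print(multivariate_normal(mu, Sigma).pdf([0.5, 0.5]))   # density at a point

rng = np.random.default_rng(0)
z = rng.standard_normal(2)               # an arbitrary nonzero vector
print(z @ Sigma @ z > 0)                 # True: the quadratic form is positive
```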

Page 12: 03 cv mil_probability_distributions

Types of covarianceCovariance matrix has three forms, termed spherical, diagonal and full

12Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
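The three forms can be written down directly; a minimal sketch in two dimensions (assuming NumPy; the values are made up):

```python
# The three covariance forms for a 2-D multivariate normal.
import numpy as np

spherical = 1.5 * np.eye(2)              # sigma^2 * I
diagonal  = np.diag([1.5, 0.5])          # per-axis variances, no correlation
full      = np.array([[1.5, 0.6],
                      [0.6, 0.5]])       # symmetric positive definite
```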

Normal Inverse Wishart

Defined on two variables: a mean vector $\boldsymbol{\mu}$ and a symmetric positive definite matrix $\boldsymbol{\Sigma}$.

Has four parameters:

• a positive scalar $\alpha$,
• a positive definite matrix $\boldsymbol{\Psi}$,
• a positive scalar $\gamma$,
• a vector $\boldsymbol{\delta}$.

Or for short: $\Pr(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \text{NorIWis}_{\boldsymbol{\mu},\boldsymbol{\Sigma}}[\alpha, \boldsymbol{\Psi}, \gamma, \boldsymbol{\delta}]$.
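A minimal sampling sketch (assuming NumPy and SciPy, and assuming $\alpha$ plays the role of SciPy's degrees-of-freedom parameter for the inverse Wishart; the exact correspondence between the book's parameterization and SciPy's is our assumption, as are the parameter values):

```python
# Ancestral sampling from a normal inverse Wishart: draw a covariance
# matrix, then draw a mean whose covariance is scaled down by gamma.
import numpy as np
from scipy.stats import invwishart

alpha, gamma = 5.0, 2.0                  # positive scalars
Psi = np.eye(2)                          # positive definite matrix
delta = np.zeros(2)                      # mean vector
rng = np.random.default_rng(0)

Sigma = invwishart(df=alpha, scale=Psi).rvs(random_state=rng)
mu = rng.multivariate_normal(delta, Sigma / gamma)
print(mu, Sigma, sep="\n")               # one (mu, Sigma) pair
```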

Samples from Normal Inverse Wishart

[Figure: samples from the normal inverse Wishart; each sample is a pair $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, i.e., a complete multivariate normal distribution.]

Conjugate Distributions

The pairs of distributions discussed have a special relationship: they are conjugate distributions.

• Beta is conjugate to Bernoulli
• Dirichlet is conjugate to categorical
• Normal inverse gamma is conjugate to univariate normal
• Normal inverse Wishart is conjugate to multivariate normal

When we take the product of a distribution and its conjugate, the result has the same form as the conjugate.

For example, consider the case where

$$\Pr(y \mid \lambda) = \text{Bern}_y[\lambda] \quad \text{and} \quad \Pr(\lambda) = \text{Beta}_\lambda[\alpha, \beta];$$

then

$$\text{Bern}_y[\lambda] \cdot \text{Beta}_\lambda[\alpha, \beta] = \kappa(y, \alpha, \beta) \cdot \text{Beta}_\lambda[y+\alpha,\; 1-y+\beta],$$

where $\kappa(y, \alpha, \beta)$ is a constant and the right-hand factor is a new beta distribution.

Example proof:

$$\begin{aligned}
\text{Bern}_y[\lambda] \cdot \text{Beta}_\lambda[\alpha, \beta]
&= \lambda^{y}(1-\lambda)^{1-y} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\lambda^{\alpha-1}(1-\lambda)^{\beta-1} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\lambda^{y+\alpha-1}(1-\lambda)^{1-y+\beta-1} \\
&= \underbrace{\frac{\Gamma(\alpha+\beta)\,\Gamma(y+\alpha)\,\Gamma(1-y+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(\alpha+\beta+1)}}_{\kappa(y,\alpha,\beta)} \cdot\, \text{Beta}_\lambda[y+\alpha,\; 1-y+\beta].
\end{aligned}$$
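This identity can be checked numerically; a minimal sketch (assuming NumPy and SciPy; the parameter values are made up):

```python
# Check: Bern_y[lam] * Beta_lam[a, b] = kappa * Beta_lam[y + a, 1 - y + b].
import numpy as np
from scipy.stats import beta
from scipy.special import gamma as G

a, b, y = 2.0, 3.0, 1
lam = np.linspace(0.01, 0.99, 5)

lhs = lam ** y * (1 - lam) ** (1 - y) * beta(a, b).pdf(lam)
kappa = G(a + b) * G(y + a) * G(1 - y + b) / (G(a) * G(b) * G(a + b + 1))
rhs = kappa * beta(y + a, 1 - y + b).pdf(lam)
print(np.allclose(lhs, rhs))             # True
```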

Bayes' Rule Terminology

$$\Pr(y \mid x) = \frac{\Pr(x \mid y)\,\Pr(y)}{\Pr(x)}$$

Posterior $\Pr(y \mid x)$ – what we know about $y$ after seeing $x$.

Prior $\Pr(y)$ – what we know about $y$ before seeing $x$.

Likelihood $\Pr(x \mid y)$ – the propensity for observing a certain value of $x$ given a certain value of $y$.

Evidence $\Pr(x)$ – a constant that ensures the left-hand side is a valid distribution.
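A minimal numeric sketch of these four terms for a binary world state (assuming NumPy; all probabilities are made up):

```python
# Bayes' rule for a binary y and one observed x.
import numpy as np

prior = np.array([0.8, 0.2])             # Pr(y=0), Pr(y=1)
likelihood = np.array([0.1, 0.7])        # Pr(x | y=0), Pr(x | y=1)

evidence = likelihood @ prior            # Pr(x): the normalizing constant
posterior = likelihood * prior / evidence
print(posterior, posterior.sum())        # [0.364, 0.636], sums to 1
```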

Importance of the Conjugate Relation 1

Learning parameters:

1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the conjugate prior distribution.
3. The posterior must be a valid distribution, which implies that the evidence must equal the constant from the conjugate relation.

A sketch of this recipe appears below.
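Here is a minimal sketch of the recipe for the Bernoulli/beta pair (assuming NumPy and SciPy; the data and prior are made up):

```python
# Conjugate learning: a Beta(a, b) prior and Bernoulli data give a
# Beta(a + heads, b + tails) posterior in closed form.
import numpy as np
from scipy.stats import beta

a, b = 1.0, 1.0                          # uniform prior over lambda
flips = np.array([1, 0, 1, 1, 0, 1])     # observed Bernoulli outcomes

heads, tails = flips.sum(), (1 - flips).sum()
posterior = beta(a + heads, b + tails)   # same form as the conjugate prior
print(posterior.mean())                  # (a + heads)/(a + b + len(flips)) = 0.625
```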

Importance of the Conjugate Relation 2

Marginalizing over parameters:

$$\Pr(x) = \int \Pr(x \mid \theta)\,\Pr(\theta)\, d\theta$$

1. The prior is chosen to be conjugate to the other term.
2. The integral becomes easy: the product becomes a constant times a distribution.

The integral of a constant times a probability distribution = the constant times the integral of the probability distribution = the constant × 1 = the constant.
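A minimal numeric check of this argument for the Bernoulli/beta pair (assuming NumPy and SciPy; the parameter values are made up):

```python
# Marginalizing the Bernoulli likelihood over a Beta(a, b) prior on
# lambda yields a constant: Pr(y=1) = a / (a + b).
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

a, b = 2.0, 3.0
integral, _ = quad(lambda lam: lam * beta(a, b).pdf(lam), 0.0, 1.0)
print(np.isclose(integral, a / (a + b)))  # True: both equal 0.4
```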

Conclusions

• Presented four distributions which model useful quantities.
• Presented four other distributions which model the parameters of the first four.
• They are paired in a special way: the second set is conjugate to the first.
• In the following material we'll see that this relationship is very useful.