50
Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Embed Size (px)

Citation preview

Page 1: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Environmental Data Analysis with MatLab

Lecture 4:Multivariate Distributions

Page 2: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Lecture 01 Using MatLabLecture 02 Looking At DataLecture 03 Probability and Measurement Error Lecture 04 Multivariate DistributionsLecture 05 Linear ModelsLecture 06 The Principle of Least SquaresLecture 07 Prior InformationLecture 08 Solving Generalized Least Squares Problems Lecture 09 Fourier SeriesLecture 10 Complex Fourier SeriesLecture 11 Lessons Learned from the Fourier TransformLecture 12 Power SpectraLecture 13 Filter Theory Lecture 14 Applications of Filters Lecture 15 Factor Analysis Lecture 16 Orthogonal functions Lecture 17 Covariance and AutocorrelationLecture 18 Cross-correlationLecture 19 Smoothing, Correlation and SpectraLecture 20 Coherence; Tapering and Spectral Analysis Lecture 21 InterpolationLecture 22 Hypothesis testing Lecture 23 Hypothesis Testing continued; F-TestsLecture 24 Confidence Limits of Spectra, Bootstraps

SYLLABUS

Page 3: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

purpose of the lecture

understanding propagation of error

from many data

to several inferences

Page 4: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

probability

with several variables

Page 5: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

example

100 birds live on an island

30 tan pigeons

20 white pigeons

10 tan gulls

40 white gulls

treat the species and color of the birds as random variables

Page 6: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

tan, t white, w

pigeon, p 30% 20%

gull, g 10% 40%

color, c

spec

ies,

sJoint Probability, P(s,c)

probability that a bird has a species, s, and a color, c.

Page 7: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

probabilities must add up to 100%

Page 8: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

tan, t white, w

pigeon, p 30% 20%

gull, g 10% 40%

color, c

spec

ies,

s pigeon, p 50%

gull, g 50%

P(s,c) P(s)

sum rows

tan, t white, w

40% 60%

sum columns

P(c)

spec

ies,

s

color, c

Univariate probabilities can be calculated by summing the rows and columns of P(s,c)

probability of species, irrespective of color

probability of color,

irrespective of species

Page 9: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

probability of species, irrespective of color

probability of color, irrespective of species

Page 10: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

tan, t white, w

pigeon, p 30% 20%

gull, g 10% 40%

color, c

spec

ies,

s

P(s,c)divide

by row sums

divide by column sums

tan, t white, w

pigeon, p 60% 40%

gull, g 20% 80%

color, c

spec

ies,

s

P(c|s)

tan, t white, w

pigeon, p 75% 33%

gull, g 25% 67%

color, c

spec

ies,

s

P(s|c)

Conditional probabilities: probability of one thing, given that you know another

probability of color, given

species

probability of species, given color

probability of color and species

Page 11: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

calculation of conditional probabilities

divide each species by

fraction of birds of that color

divide each color by

fraction of birds of that species

Page 12: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Bayes Theorem

same, so solving for P(s,c)

rearrange

Page 13: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

3 ways to write P(c) and P(s)

Page 14: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

so 3 ways to write Bayes Therem

the last way seems the most complicated, but it is also the most useful

Page 15: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Beware!

major cause of error bothamong scientists and the general public

Page 16: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

example

probability that a dead person succumbed to pancreatic cancer(as contrasted to some other cause of death)

P(cancer|death) = 1.4%

probability that a person diagnosed with pancreatic cancerwill die of it in the next five years

P(death|cancer) = 90%

vastly different numbers

Page 17: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Bayesian Inference“updating information”

An observer on the island has sighted a bird.

We want to know whether it’s a pigeon.

when the observer says, “bird sighted”, the probability that it’s a pigeon is:P(s=p) = 50%

since pigeons comprise half of the birds on the island.

Page 18: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

Now the observer says, “the bird is tan”.

The probability that it’s a pigeon changes.

We now want to knowP(s=p|c=t) The conditional probability that it’s a pigeon, given

that we have observed its color to be tan.

Page 19: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

we use the formula

% of tan pigeons

% of tan birds

Page 20: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

observation of the bird’s color changed the probability that is was a pigeon

from 50%to 75%

thus Bayes Theorem offers a way to assess the value of an observation

Page 21: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

continuous variables

joint probability density function, p(d1, d2)

Page 22: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

d1

d2p(d1,d2) d2

L d2L

d1L

d1R

if the probability

density function , p(d1,d2),

is though of as a cloud made of

water vapor

the probability

that (d1,d2) is in the box given by the total mass of water vapor in the box

Page 23: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

normalized to unit total probability

Page 24: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

univariate p.d.f.’s“integrate away” one of the variables

the p.d.f. of d1 irrespective of d2 the p.d.f. of d2 irrespective of d1

Page 25: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

d1

d2

d1

integrate over d2

integrate over d1

d2

p(d1,d2)

p(d2)

p(d1)

Page 26: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

mean and variance calculated in usual way

Page 27: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

correlation

tendency of random variable d1to be large/small

when random variable d2 is large/small

Page 28: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

positive correlation:

tall people tend to weigh more than short people

negative correlation:

long-time smokers tend to die young

Page 29: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

d1

d2

positive correlation

d1

d2

d1

d2

negative correlation

uncorrelated

shape of p.d.f.

Page 30: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

d1

d2p(d1,d2)

d1

d2

d1

d2s(d1,d2) s(d1,d2) p(d1,d2)

+

+

-

-

quantifying correlation

now multiply and

integrate

p.d.f. 4-quadrant function

Page 31: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

covariancequantifies correlation

Page 32: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

combine variance and covariance into a matrix, C

C = σ12σ22σ1,2

σ1,2 note that C is symmetric

Page 33: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

many random variablesd1, d2, d3 … dN

write d’s a a vectord = [d1, d2, d3 … dN ]T

Page 34: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

the mean is then a vector, too

d = [d1, d2, d3 … dN ]T

Page 35: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

and the covariance is an N×N matrix, C

C = σ12 σ22σ1,2σ32 …σ1,3σ2,3σ1,2 σ2,3σ1,3

………… … …variance on the main diagonal

Page 36: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

multivariate Normal p.d.f.

square root of

determinant of covariance

matrix

inverse of covariance

matrix

data minus its mean

Page 37: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

compare with univariate Normal p.d.f.

Page 38: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

corresponding terms

Page 39: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

error propagationp(d) is Normal with mean d and covariance, Cd.

given model parameters mwhere m is a linear function of dm = MdQ1. What is p(m)?

Q2. What is its mean m and covariance Cm?

Page 40: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

AnswerQ1: What is p(m)?

A1: p(m) is Normal

Q2: What is its mean m and covariance Cm?

A2: m = Md and Cm = M Cd MT

Page 41: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

where the answer comes fromtransform p(d) to p(m)starting with a Normal p.d.f. for p(d) :and the multivariate transformation rule:

determinant

Page 42: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

this is not as hard as it looks

because the Jacobian determinant J(m) is constant:

so, starting with p(d), replaces every occurrence of d with M-1m and multiply the result by |M-1|. This yields:

Page 43: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

p(m) where

rule for error propagation

Normal p.d.f. for model parameters

Page 44: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

example

d1measurement 1:

weight of A

A A

B

d2measurement 2:

combined weight of A and B

suppose the measurements are uncorrelated and that both have the same variance, σd

2

Page 45: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

model parameters

m1weight of B

A

B

m1 = d2 – d1

d1

B=

m2weight of B

minusweight of A A

B A

A

A-

-

+

A

B= + - +

m2 = d2 – 2d1

Page 46: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

linear rule relating model parameters to data

m = M dwith

Page 47: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

so the means of the model parameters are

Page 48: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

and the covariance matrix is

Page 49: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

model parameters are correlated, even though data are uncorrelated

bad

variance of model parameters different than variance of data

bad if bigger, good if smaller

Page 50: Environmental Data Analysis with MatLab Lecture 4: Multivariate Distributions

d1

p(d1,d2)

40

0400 d2

m1

p(m1,m2)

20

-2020-20 m2

The model parameters, (m1, m2), have mean (10, -5), variance (10, 25) and covariance,15.

The data, (d1, d2), have mean (15, 25), variance (5, 5) and zero covariance

example with specific values of d and σd2