Linear algebra, multivariate distributions, and all that jazz
Rebecca C. Steorts
Predictive Modeling: STA 521
September 8, 2015



Outline

- Random vectors
- Independence
- Expectations and Covariances
- Quadratic Forms
- Multivariate Normal Distribution
- Using R

Course logistics, before we review matrix algebra:

- No late labs or homework.
- The lowest lab/homework grade will be dropped.
- New homework is coming.
- What if I miss class or lab?
- If there is a grade question, send an email to the TAs and myself, outlining the question and why you believe you deserve the points back.

Random Vectors

Definition. We define X to be a p-variate random vector,

X = (X_1, X_2, ..., X_p)^T,

where its entries X_1, ..., X_p are random variables.

Remark. A random variable can be considered a univariate random vector.

Independence

If X_1, ..., X_m are continuous, independence implies that the joint density factorizes:

f_{X_1,...,X_m}(x_1, ..., x_m) = ∏_{i=1}^m f_i(x_i).

- Non-random vectors are constant or deterministic.
- They can also be considered random vectors, with probability 1 of equaling a constant (there is nothing random going on here).
- They are trivially independent of all other random vectors.

Expected Value

- The expected value of a random vector (or random matrix) is defined to be the vector of expected values of its univariate components.
- Properties of expectation carry over from the univariate case.

Definition. Let X be a p-variate random vector. Then the expected value of X is

µ_X = E(X) = E[(X_1, ..., X_p)^T],

and if X is continuous then E(X) = ∫ x f(x) dx.

MSE

The mean is the best constant predictor of X in terms of the mean squared error (MSE):

E(X) = argmin_{c ∈ R^p} E||X − c||².

Proof: Differentiating with respect to c,

∂/∂c [(X − c)^T (X − c)] = ∂/∂c [X^T X − 2c^T X + c^T c]   (1)
                         = −2(X − c).                      (2)

Setting the expectation of (2) to zero gives E[X − c] = 0, so c = E[X]. Since the second derivative, 2I, is positive definite, the solution is unique.
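As a quick sanity check (a sketch of mine, not from the original slides), we can verify in R that the sample mean minimizes the empirical MSE among constant predictors:

set.seed(1)
# 5,000 draws of a 3-variate random vector with mean (1, 2, 3)
X <- matrix(rnorm(5000 * 3, mean = c(1, 2, 3)), ncol = 3, byrow = TRUE)

# Empirical E||X - c||^2 as a function of the constant c
mse <- function(const) mean(rowSums(sweep(X, 2, const)^2))

mse(colMeans(X))  # MSE at the sample mean
mse(c(0, 0, 0))   # any other constant does worse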

Let A be an m × p matrix and Y an m-variate random vector. Then

E(AX + Y) = A E(X) + E(Y).

Let b be a constant vector. Then

E(b^T X) = b^T E(X).

Covariance Matrix

Let X be a p-variate random vector. The covariance matrix of X is

Σ_XX = Var(X) = E[(X − µ)(X − µ)^T]

      [ Var(X_1)       Cov(X_1, X_2)  ...  Cov(X_1, X_p) ]
    = [ Cov(X_2, X_1)  Var(X_2)       ...  Cov(X_2, X_p) ]
      [    ...             ...        ...      ...       ]
      [ Cov(X_p, X_1)  Cov(X_p, X_2)  ...  Var(X_p)      ]
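As a brief illustration (mine, not the slides'), the sample version of E[(X − µ)(X − µ)^T] is an average of outer products, which is exactly what cov() computes:

library(MASS)  # for mvrnorm
set.seed(1)
Sigma <- matrix(c(3, 2, 2, 4), 2)  # the same Sigma used later in these notes
X <- mvrnorm(100000, mu = c(0, 0), Sigma = Sigma)

centered <- sweep(X, 2, colMeans(X))
crossprod(centered) / (nrow(X) - 1)  # sum of outer products, scaled
cov(X)                               # identical, and close to Sigma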

Let A be an m × p matrix and Y a random vector. Then

- Var(AX) = A Var(X) A^T (checked numerically below).
- Var(X + Y) = Var(X) + Var(Y) + Cov(X, Y) + Cov(Y, X). (For univariate X and Y this is the familiar Var(X) + Var(Y) + 2 Cov(X, Y).)
- If X and Y are independent, then Cov(X, Y) = 0.
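These identities are easy to check by simulation; here is a sketch (the matrix A below is an arbitrary choice of mine):

library(MASS)
set.seed(1)
Sigma <- matrix(c(2, 1, 1, 3), 2)
X <- mvrnorm(100000, mu = c(0, 0), Sigma = Sigma)
A <- matrix(c(1, 0, 2, -1, 3, 1), nrow = 3)  # an arbitrary 3 x 2 matrix

# Rows of X are draws of X^T, so draws of (AX)^T are rows of X A^T
cov(X %*% t(A))        # Var(AX), estimated from the sample
A %*% cov(X) %*% t(A)  # A Var(X) A^T -- agrees up to Monte Carlo error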

Cross-covariance matrix

We define the covariance matrix (cross-covariance) between X and Y to be

Σ_XY = Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)^T]

      [ Cov(X_1, Y_1)  Cov(X_1, Y_2)  ...  Cov(X_1, Y_m) ]
    = [ Cov(X_2, Y_1)  Cov(X_2, Y_2)  ...  Cov(X_2, Y_m) ]
      [    ...             ...        ...      ...       ]
      [ Cov(X_p, Y_1)  Cov(X_p, Y_2)  ...  Cov(X_p, Y_m) ]

Other properties: let A and B be constant matrices, and let a and b be constant vectors.

- Cov(AX, BY) = A Cov(X, Y) B^T.
- Cov(X + a, Y + b) = Cov(X, Y).

Neuroimaging example

- X is the intensity of light at every pixel in an image.
- Y is the magnitude of the fMRI signal at every voxel in the brain.
- a is the average intensity over all images shown in the experiment.
- b is the average fMRI signal over the experiment.

Then

Cov(X − a, Y − b) = Cov(X, Y).

We can center and shift observations, and the covariance is unchanged!

Let X and Y be p- and m-variate random vectors. Then

Var( (X, Y)^T ) = ( Var(X)     Cov(X, Y) )
                  ( Cov(Y, X)  Var(Y)    ).

Remark. The off-diagonal blocks of the covariance matrix are cross-covariances.

Trace

Let A = (a_ij) be a square matrix of dimension d × d. The trace of A is the sum of its diagonal elements:

tr(A) = Σ_i a_ii.

Recall that the mean is the best constant predictor of X in terms of the MSE:

E(X) = argmin_{c ∈ R^p} E||X − c||².

The total variance of X is the MSE of the mean:

E||X − E(X)||² = tr(Var(X)).

The total variance of X measures the overall variability of the components of X around the mean E(X).
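A quick numerical check of this identity (my sketch, not from the slides):

library(MASS)
set.seed(1)
Sigma <- matrix(c(3, 2, 2, 4), 2)
X <- mvrnorm(100000, mu = c(1, -1), Sigma = Sigma)

centered <- sweep(X, 2, colMeans(X))
mean(rowSums(centered^2))  # empirical E||X - E(X)||^2
sum(diag(cov(X)))          # tr(Var(X)) -- the same up to the 1/n vs 1/(n-1) factor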

Let a be a constant p-vector. Then a^T X = Σ_{i=1}^p a_i X_i and Var(a^T X) = a^T Var(X) a.

- This helps us measure the variation along some direction a (see the sketch below).
- You can make a connection with this in other classes and with PCA.
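Here is the sketch: the sample variance of the projected data matches the quadratic form a^T Var(X) a (the direction a is an arbitrary choice of mine):

library(MASS)
set.seed(1)
Sigma <- matrix(c(3, 2, 2, 4), 2)
X <- mvrnorm(100000, mu = c(0, 0), Sigma = Sigma)

a <- c(1, 1) / sqrt(2)   # a unit-length direction
var(X %*% a)             # sample variance of a^T X
t(a) %*% cov(X) %*% a    # a^T Var(X) a -- same value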

Quadratic Forms

Let A be a symmetric matrix and x a vector.

Definition. A quadratic form is written as

x^T A x = Σ_i Σ_j a_ij x_i x_j.

Note: it is a quadratic function of x.

- As a function of a, Var(a^T X) = a^T Var(X) a = a^T Σ_X a, which is a quadratic form in a.
- Quadratic forms are very common in multivariate analysis.
- Example: the chi-squared test statistic is a quadratic form.

Suppose Z_1, ..., Z_p iid∼ N(0, 1). Then

||Z||² = Σ_i Z_i² ∼ χ²_p.

Now suppose Y_i ind∼ N(µ_i, 1) for all i, i.e. Y (p × 1) ∼ N_p(µ, I). Then

Y^T Y ∼ χ²_p(½ µ^T µ).

This is called a non-central chi-squared distribution.
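A simulation sketch (mine, not from the slides). One caveat on conventions: the slides write the noncentrality parameter as ½ µ^T µ, while R's *chisq functions parameterize it as ncp = µ^T µ, so we pass sum(mu^2) below:

set.seed(1)
p <- 3
mu <- c(1, 2, 0.5)

# 100,000 draws of Y^T Y with Y ~ N_p(mu, I)
Y <- matrix(rnorm(100000 * p, mean = mu), ncol = p, byrow = TRUE)
yty <- rowSums(Y^2)

mean(yty)         # close to the theoretical mean p + mu^T mu
p + sum(mu^2)
quantile(yty, c(.25, .5, .75))                    # sample quantiles...
qchisq(c(.25, .5, .75), df = p, ncp = sum(mu^2))  # ...match the non-central chi-squared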


To see why, define Y = µ + Z, where Z ∼ N(0, I). Then

Y^T Y = (Z + µ)^T (Z + µ) = Z^T Z + 2µ^T Z + µ^T µ ∼ χ²_p(½ µ^T µ).

Hence, the non-central χ²_p is a quadratic form.


Positive Semi-Definite and Positive Definite

1. A square matrix A is called positive semi-definite if A is symmetric and x^T A x ≥ 0 for all x ≠ 0.
2. The matrix A is called positive definite if x^T A x > 0 for all x ≠ 0 (a quick numerical check follows).
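In practice, definiteness is usually checked through the eigenvalues (defined on the next slide); a tiny R sketch:

A <- matrix(c(3, 2, 2, 4), 2)  # symmetric
eigen(A)$values                # both positive, so A is positive definite

B <- matrix(c(1, 2, 2, 1), 2)  # symmetric but indefinite
eigen(B)$values                # one eigenvalue is negative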

Eigenvalue and eigenvector

Let v ≠ 0 be a d-vector and let A be d × d. Then v is an eigenvector of A with eigenvalue λ when

A v = λ v.

More about eigenvalues and eigenvectors

Let v ≠ 0 and let A be d × d.

1. It is typical to normalize an eigenvector to have length 1 (or to have its entries sum to 1).
2. A has at most d distinct eigenvalues (think about why).
3. Eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal (we will define orthogonal soon).


If A is positive definite, then:

- All of its eigenvalues are real-valued and positive.
- Its inverse is also positive definite.

Note that covariance matrices have the following properties:

- Every covariance matrix is a positive semi-definite matrix.
- Every positive semi-definite matrix is a covariance matrix.

The following result from linear algebra is also highly useful.

Spectral Decomposition Theorem

Theorem (Spectral Decomposition Theorem). Let A (p × p) be symmetric with orthonormal eigenvectors v_1, ..., v_p and corresponding eigenvalues λ_1, ..., λ_p. Then A = P Λ P^T, where P = (v_1 · · · v_p) and Λ = Diag(λ_1, ..., λ_p).

The spectral decomposition theorem allows some operations with positive definite matrices to be computed more easily (see the sketch below):

- A^{-1} = P Λ^{-1} P^T.
- A^{1/2} = P Λ^{1/2} P^T.
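A short R sketch of the first identity (my illustration), using the 2 × 2 matrix that reappears later in these notes:

A <- matrix(c(3, 2, 2, 4), 2)
e <- eigen(A)
P <- e$vectors

P %*% diag(1 / e$values) %*% t(P)  # A^{-1} = P Lambda^{-1} P^T
solve(A)                           # matches the direct inverse

(The square root A^{1/2} is computed the same way; see the end of these notes.)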

Alternative Spectral Decomposition

P is orthogonal if P^T P = I and P P^T = I.

Theorem (Alternative Spectral Decomposition). Let A be symmetric n × n. Then we can write

A = P D P^T,

where D = diag(λ_1, ..., λ_n) and P is orthogonal. The λ's are the eigenvalues of A, and the ith column of P is an eigenvector corresponding to λ_i.

Theorem. If Y ∼ N_n(µ, I) and P is an orthogonal projection with r(P) = k, then

Y^T P Y ∼ χ²_k(½ µ^T P µ).

Here P is a square matrix. Take as a fact (theorem): P is an orthogonal projection if and only if P is symmetric and idempotent (i.e., P² = P). A simulation of this result is sketched below.
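The theorem is easy to see in simulation. A hedged sketch (all choices below, including the random orthogonal matrix construction, are mine; recall that R parameterizes the noncentrality as ncp = µ^T P µ, twice the slides' ½ µ^T P µ):

set.seed(1)
n <- 5; k <- 2

# Build an orthogonal projection of rank k: P = Gamma D Gamma^T
Gamma <- qr.Q(qr(matrix(rnorm(n * n), n)))  # a random orthogonal matrix
D <- diag(c(rep(1, k), rep(0, n - k)))
P <- Gamma %*% D %*% t(Gamma)

mu <- c(1, 0, 2, 0, 1)
Y <- matrix(rnorm(100000 * n, mean = mu), ncol = n, byrow = TRUE)
q <- rowSums((Y %*% P) * Y)                 # Y^T P Y, one value per draw

ncp <- drop(t(mu) %*% P %*% mu)             # mu^T P mu
mean(q); k + ncp                            # means agree
quantile(q, .9); qchisq(.9, df = k, ncp = ncp)  # so do quantiles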

Proof: Since P has rank k, there exists an orthogonal matrix Γ such that

- P = Γ D Γ^T (by the Spectral Decomposition Theorem), where
- D = diag{1, 1, ..., 1, 0, ..., 0}.

(To show on your own: why do the eigenvalues of an orthogonal projection have to be either 0 or 1?) Also, recall that a rank-k matrix has k nonzero eigenvalues. Now

Y^T P Y = Y^T Γ D Γ^T Y = Z^T D Z,

where Z (n × 1) = Γ^T Y ∼ N(ξ = Γ^T µ, I).

We now partition the vector Z into components Z^(1) and Z^(2) that are uncorrelated, with means ξ_1 and ξ_2:

Z = ( Z^(1) )  ∼  N( ( ξ_1 ),  ( I_k  0       ) ),
    ( Z^(2) )       ( ξ_2 )    ( 0    I_{n−k} )

where Z^(1) is k × 1 and Z^(2) is (n − k) × 1. You should be able to verify the distribution of Z on your own.

Since D Z = ( Z^(1), 0 )^T, it follows that ||D Z||² = ||Z^(1)||².

Since Z^(1) ∼ N_k(ξ_1, I_k), we know

||Z^(1)||² ∼ χ²_k(½ ||ξ_1||²).

Recall that ξ = Γ^T µ and P = Γ D Γ^T. Multiplying by Γ^T and Γ on the left and right, we obtain Γ^T P Γ = D. From the above, we find

||ξ_1||² = ||D ξ||² = ||Γ^T P Γ Γ^T µ||² = ||Γ^T P µ||² = ||P µ||² = µ^T P µ.

(Since Γ is an orthogonal matrix, ||Γ^T P µ||² = ||P µ||²; the final equality uses that P is symmetric and idempotent.) We have shown that ||ξ_1||² = µ^T P µ; thus

||Z^(1)||² ∼ χ²_k(½ µ^T P µ).

The Multivariate Normal Distribution

- The multivariate normal (MVN) distribution.
- How do we standardize MVN distributions?
- Spectral and singular value decompositions.
- Computing eigenvalues and eigenvectors in R.
- Some important properties of the MVN.
- Next time: how to visualize MVN distributions.

We assume that the population mean is µ = E(X) and Σ = Var(X) = E[(X − µ)(X − µ)^T], where

µ = (µ_1, µ_2, ..., µ_p)^T

and

      [ σ_1²  σ_12  ...  σ_1p ]
Σ =   [ σ_21  σ_2²  ...  σ_2p ]
      [  ...   ...  ...   ... ]
      [ σ_p1  σ_p2  ...  σ_p² ]

- The MVN is a generalization of the univariate normal.
- For the MVN, we write X ∼ MVN(µ, Σ).
- The (i, j)th component of Σ is the covariance between X_i and X_j (so the diagonal of Σ gives the component variances).

Just as the probability density of a scalar normal is

p(x) = (2πσ²)^{-1/2} exp{ −(x − µ)² / (2σ²) },   (3)

the probability density of the multivariate normal is

p(x) = (2π)^{-p/2} det(Σ)^{-1/2} exp{ −½ (x − µ)^T Σ^{-1} (x − µ) }.   (4)

The univariate normal is the special case of the multivariate normal with a one-dimensional mean "vector" and a one-by-one variance "matrix."
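As a sanity check (a sketch of mine), coding (4) directly in R and evaluating it for p = 1 reproduces dnorm:

# Density of MVN(mu, Sigma) evaluated at x, straight from equation (4)
dmvn <- function(x, mu, Sigma) {
  p <- length(mu)
  q <- t(x - mu) %*% solve(Sigma) %*% (x - mu)  # the quadratic form
  drop((2 * pi)^(-p / 2) * det(Sigma)^(-1 / 2) * exp(-q / 2))
}

dmvn(1.3, mu = 0.5, Sigma = matrix(2))  # one-dimensional case
dnorm(1.3, mean = 0.5, sd = sqrt(2))    # same value

dmvn(c(0, 0), mu = c(0, 0), Sigma = diag(2))  # standard bivariate normal at the origin: 1/(2*pi)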

Calculations are easy for the "standard" MVN(0, I): every coordinate is an independent N(0, 1).

Multivariate central limit theorem: if x_1, x_2, ..., x_n are iid with mean 0 and covariance I, then n^{-1/2} Σ_{i=1}^n x_i tends to MVN(0, I) as n → ∞.

How do we do calculations for non-standard MVNs?

Recall that the parameters of a normal change along with linear transformations:

X ∼ N(µ, σ²)  ⟺  aX + b ∼ N(aµ + b, a²σ²).   (5)

- Use this to "standardize" any normal to have mean 0 and variance 1 (by looking at (X − µ)/σ).
- MVNs are standardized in a very analogous way.
- We need some general results about matrices first: decomposition theorems.

Recall the Spectral Decomposition: a symmetric n × n matrix A can be written as A = P D P^T, where D = diag(λ_1, ..., λ_n) and P is orthogonal (P^T P = P P^T = I).

Orthogonal matrices represent rotations of the coordinates. Diagonal matrices represent stretchings/shrinkings of coordinates.

Definition. Let A = (a_ij) and B = (b_ij) be square matrices, both of dimension d × d.

1. The trace of A is the sum of its diagonal elements: tr(A) = Σ_i a_ii.
2. The matrices A and B are similar if there exists an invertible matrix E such that A = E B E^{-1}.

Theorem. Suppose A and B are similar matrices of dimension d × d. Assume A is invertible and has d distinct eigenvalues λ_1, ..., λ_d. Let det(A) be the determinant of A. The following hold:

1. tr(A) = Σ_i a_ii = Σ_i λ_i.
2. The matrices A and B have the same eigenvalues and trace.
3. For any d × d matrix C, the trace and determinant of AC satisfy tr(AC) = tr(CA) and det(AC) = det(A) det(C) (checked numerically below).
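These are easy to confirm numerically; a quick sketch with arbitrary matrices of my choosing:

set.seed(1)
A <- matrix(rnorm(9), 3)
C <- matrix(rnorm(9), 3)

sum(diag(A %*% C)); sum(diag(C %*% A))  # tr(AC) = tr(CA)
det(A %*% C); det(A) * det(C)           # det(AC) = det(A) det(C)

S <- A + t(A)                           # symmetric, so its eigenvalues are real
sum(diag(S)); sum(eigen(S)$values)      # tr(S) = sum of eigenvalues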

Definition (Singular Value Decomposition). Suppose that X is data of size d × n. The singular value decomposition (SVD) of X is X = U D V^T, where D is an r × r diagonal matrix. The matrices U and V have sizes d × r and n × r, respectively; their columns are the left and right singular vectors of X. These are unit vectors u_j and v_j such that, for all j,

X^T u_j = (u_j^T X)^T = d_j v_j   and   X v_j = d_j u_j.

The SVD is used in PCA and dimension reduction methods.
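R's svd() returns exactly these pieces. A small sketch with an arbitrary matrix of mine:

X <- matrix(c(3, 2, 2, 4, 1, 0), nrow = 2)  # a 2 x 3 data matrix
s <- svd(X)

s$d                           # the singular values d_j
s$u %*% diag(s$d) %*% t(s$v)  # U D V^T reconstructs X

# The defining relation X v_j = d_j u_j, checked for j = 1:
X %*% s$v[, 1]
s$d[1] * s$u[, 1]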

Let

Σ = ( 3  2 )
    ( 2  4 ).

What are the eigenvalues and eigenvectors?

> eigen(matrix(c(3,2,2,4), nrow = 2))
$values
[1] 5.561553 1.438447

$vectors
          [,1]       [,2]
[1,] 0.6154122 -0.7882054
[2,] 0.7882054  0.6154122

Did this work?

> Sigma <- matrix(c(3,2,2,4), nrow = 2)
> V <- eigen(Sigma)$vectors
> Lambda <- eigen(Sigma)$values
> V %*% diag(Lambda) %*% t(V)
     [,1] [,2]
[1,]    3    2
[2,]    2    4

Yes, we find that Σ = V Λ V^T.

> plot(t(V), xlab = "", ylab = "", xlim = c(-1, 1), ylim = c(-1, 1))
> arrows(0, 0, V[1,1], V[2,1])
> arrows(0, 0, V[1,2], V[2,2])

[Figure: the two eigenvectors of Σ drawn as arrows from the origin.]

Remark: when the covariances are all positive, the first eigenvector has only positive entries. We see that here.

Let χ²_p denote the chi-squared distribution with p degrees of freedom.

Theorem. Let X ∼ MVN(µ, Σ) be a p-variate random vector, and assume that Σ is positive definite.

1. Then Σ^{-1/2}(X − µ) ∼ MVN(0, I_{p×p}).
2. Let X² = (X − µ)^T Σ^{-1} (X − µ). Then X² ∼ χ²_p.

The transformation Σ^{-1/2}(X − µ) is called the Mahalanobis transformation.

[Figure: the empirical cumulative distribution of X² (solid) overlaid with the χ²_2 CDF (dashed); the two curves agree closely. Produced by the code below.]

library(MASS)  # for mvrnorm

Sigma <- matrix(c(1, .5, .5, 1), 2)
mu <- c(0, 0)

# Creating a bivariate normal with 10,000 draws
bivn <- mvrnorm(10000, mu = mu, Sigma = Sigma)

# Calculate the chi-squared random variable, one value per draw:
# (x - mu)^T Sigma^{-1} (x - mu), computed row by row
# (the diagonal of (bivn - mu) Sigma^{-1} (bivn - mu)^T, without
# forming the full 10,000 x 10,000 matrix)
centered <- sweep(bivn, 2, mu)
x.2 <- rowSums((centered %*% solve(Sigma)) * centered)

plot(ecdf(x.2), xlab = expression(X^2),
     ylab = "Cumulative distribution", col = "blue", main = "", lwd = 2)
curve(pchisq(x, df = 2), add = TRUE, col = "red", lty = "dashed", lwd = 2)
legend("bottomright",
       legend = c(expression(X^2), expression(chi[2]^2)),
       col = c("blue", "red"), lty = c("solid", "dashed"))

Let A be a positive definite matrix with spectral decomposition A = P D P^T = Σ_i λ_i v_i v_i^T.

Definition. We define the square root of A as the matrix

A^{1/2} = P D^{1/2} P^T = Σ_i λ_i^{1/2} v_i v_i^T.

Then A = A^{1/2} A^{1/2} = A^{1/2} (A^{1/2})^T.
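A brief R sketch of this definition (mine), reusing the Σ from the eigen example:

A <- matrix(c(3, 2, 2, 4), 2)
e <- eigen(A)
A.half <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

A.half %*% A.half  # recovers A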

The MVN density can be written in terms of the square root of the inverse covariance matrix. Note that

f(x) = (2π)^{-p/2} det(Σ)^{-1/2} exp{ −½ (x − µ)^T Σ^{-1} (x − µ) }
     = (2π)^{-p/2} det(Σ)^{-1/2} exp{ −½ [Σ^{-1/2}(x − µ)]^T [Σ^{-1/2}(x − µ)] }
     = (2π)^{-p/2} det(Σ)^{-1/2} exp{ −½ ||Σ^{-1/2}(x − µ)||² }.

- The covariance matrix Σ is symmetric and positive definite, so we know from the spectral decomposition theorem that it can be written as Σ = P Λ P^T.
- Λ is the diagonal matrix of the eigenvalues of Σ.
- P is the matrix whose columns are the orthonormal eigenvectors of Σ (hence P is an orthogonal matrix).
- Geometrically, orthogonal matrices represent rotations.
- Multiplying by P rotates the coordinate axes so that they are parallel to the eigenvectors of Σ.
- Probabilistically, this tells us that the axes of the probability-contour ellipse are parallel to those eigenvectors.
- The radii of those axes are proportional to the square roots of the eigenvalues.