BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra


Page 1: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

BIOL 582

Lecture Set 19

Matrices, Matrix calculations,

Linear models using linear algebra

Page 2: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Compact method of expressing mathematical operations (including statistics)

• Makes linear models easier to compute

BIOL 582 Matrix operations

Scalar: a number

Vector: an ordered list (array) of scalars (n rows x 1 column)

Matrix: a rectangular array of scalars (n rows x p columns)


Nomenclature: elements (or variables) are italicized, matrices are bold. Lowercase = vector; capital = matrix.

Many variants exist in the scientific literature.

Page 3: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

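• Reverse rows and columns

• Represent by A^t or A′

• Vector transpose works identically

$$\mathbf{A} = \begin{bmatrix} a & d \\ b & e \\ c & f \end{bmatrix}, \qquad \mathbf{A}^t = \mathbf{A}' = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}$$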

BIOL 582 Matrix operations: transpose
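As a small illustration of the transpose in R, the sketch below uses `t()` on a 3 x 2 matrix. The numeric entries are placeholders chosen for this example and are not taken from the slides.

```r
# A 3 x 2 matrix, filled column by column (R's default fill order)
A <- matrix(1:6, nrow = 3, ncol = 2)

# Transpose: rows become columns, so the result is 2 x 3
At <- t(A)

dim(A)    # 3 2
dim(At)   # 2 3

# A vector stored as an n x 1 matrix transposes the same way
v <- matrix(c(1, 2, 3), ncol = 1)
dim(t(v)) # 1 3
```

Transposing twice returns the original matrix, which is a quick sanity check when working by hand.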

Page 4: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

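• Matrices must have same dimensions

• Add/subtract element-wise

• Vector addition/subtraction works identically

Addition

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} + \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 6 & 12 \end{bmatrix}$$

Subtraction

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} - \begin{bmatrix} 6 & 8 \\ 5 & 9 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -6 \end{bmatrix}$$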

BIOL 582 Matrix operations: addition and subtraction
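The addition and subtraction example on this slide can be checked in R. This is a minimal sketch using the slide's 2 x 2 matrices; the 3 x 3 matrix of ones used to trigger the dimension error is an assumption added for illustration.

```r
# The two 2 x 2 matrices from the slide, entered row by row
A <- matrix(c(2, 4,
              1, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(6, 8,
              5, 9), nrow = 2, byrow = TRUE)

# Element-wise addition and subtraction
A_plus_B  <- A + B
A_minus_B <- A - B
A_plus_B
A_minus_B

# Matrices of different dimensions cannot be added; R raises an error
res <- tryCatch(A + matrix(1, 3, 3),
                error = function(e) "dimensions do not match")
res
```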

Page 5: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

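• Scalar multiplication: multiply the scalar by each element in the matrix or vector

• Matrix/vector multiplication is a summed multiplication

• Inner dimensions allow multiplication

• Outer dimensions determine size of result

• Order of matrices makes a difference: AB ≠ BA

$$\underset{n_1 \times p_1}{\mathbf{A}} \; \underset{n_2 \times p_2}{\mathbf{B}} \qquad \text{inner dimensions: } p_1,\ n_2 \qquad \text{outer dimensions: } n_1,\ p_2$$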

BIOL 582 Matrix operations: multiplication

Inner dimension must agree or multiplication cannot take place

Page 6: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

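Scalar multiplication: each element of the matrix is multiplied by the scalar, $c\mathbf{A} = [c\,a_{ij}]$

Matrix multiplication:

$$\mathbf{AB} = \begin{bmatrix} \sum_{i=1}^{n} a_{1i}b_{i1} & \sum_{i=1}^{n} a_{1i}b_{i2} \\ \sum_{i=1}^{n} a_{2i}b_{i1} & \sum_{i=1}^{n} a_{2i}b_{i2} \end{bmatrix}$$

$$\mathbf{AB} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 3 \\ 4 & 2 \end{bmatrix} = \begin{bmatrix} 2 + 6 + 12 & 1 + 6 + 6 \\ 8 + 15 + 24 & 4 + 15 + 12 \end{bmatrix} = \begin{bmatrix} 20 & 13 \\ 47 & 31 \end{bmatrix}$$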

BIOL 582 Matrix operations
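The worked product on this slide can be reproduced with R's `%*%` operator. The sketch below uses the slide's 2 x 3 and 3 x 2 matrices and also shows that reversing the order gives a result of a different size, so AB ≠ BA.

```r
A <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)   # 2 x 3
B <- matrix(c(2, 1,
              3, 3,
              4, 2), nrow = 3, byrow = TRUE)      # 3 x 2

# Scalar multiplication: every element is multiplied by 3
three_A <- 3 * A

# Matrix multiplication: inner dimensions (3 and 3) agree,
# outer dimensions (2 and 2) give the size of the result
AB <- A %*% B
AB

# Reversing the order is allowed here (inner dimensions 2 and 2),
# but the result is 3 x 3, so it cannot equal AB
BA <- B %*% A
dim(BA)
```

Note that `A * B` in R is element-wise multiplication and would fail here because the dimensions differ; `%*%` is the summed (matrix) product.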

Page 7: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Inner (scalar) product: vector multiplication resulting in a scalar (weighted linear combination)

• Outer (matrix) product: vector multiplication resulting in a matrix

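Inner Product

$$\mathbf{a}^t\mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 6 \\ 5 \\ 4 \end{bmatrix} = 28$$

Outer Product

$$\mathbf{a}\mathbf{b}^t = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 6 & 5 & 4 \end{bmatrix} = \begin{bmatrix} 6 & 5 & 4 \\ 12 & 10 & 8 \\ 18 & 15 & 12 \end{bmatrix}$$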

Inner dimensions MUST AGREE!!!

BIOL 582 Matrix operations
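Both vector products can be computed in R from the slide's vectors. In this sketch the vectors are stored explicitly as 3 x 1 matrices so that the dimensions are visible; that storage choice is an assumption made for clarity.

```r
# Column vectors stored as 3 x 1 matrices
a <- matrix(c(1, 2, 3), ncol = 1)
b <- matrix(c(6, 5, 4), ncol = 1)

# Inner product: (1 x 3)(3 x 1) gives a 1 x 1 result
inner <- t(a) %*% b
inner

# Outer product: (3 x 1)(1 x 3) gives a 3 x 3 matrix
outer_ab <- a %*% t(b)
outer_ab
```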

Page 8: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

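$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \mathbf{1} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \qquad \mathbf{T} = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 5 & 6 \\ 4 & 6 & 3 \end{bmatrix}$$

$$\mathbf{D} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad \mathbf{0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

$$\mathbf{AI} = \mathbf{A}: \quad \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}$$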

BIOL 582 Special matrices

• I: Identity matrix (equivalent to ‘1’ for matrices)

• 1: A matrix of all ones

• 0: A matrix of all zeros

• Diagonal: non-zero elements appear only on the diagonal

• Square: n = p

• Symmetric: off-diagonal elements same: $X_{np} = X_{pn}$
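The special matrices listed on this slide can be built with base R functions. This is a short sketch; the object names (`I3`, `ones`, `zeros`, `D`, `Tm`) are chosen for this example only.

```r
I3    <- diag(3)             # 3 x 3 identity matrix
ones  <- matrix(1, 2, 2)     # 2 x 2 matrix of all ones
zeros <- matrix(0, 2, 2)     # 2 x 2 matrix of all zeros
D     <- diag(c(4, 1, 2))    # diagonal matrix with 4, 1, 2 on the diagonal

# Symmetric matrix: element [i, j] equals element [j, i]
Tm <- matrix(c(1, 2, 4,
               2, 5, 6,
               4, 6, 3), nrow = 3, byrow = TRUE)
isSymmetric(Tm)              # TRUE

# Multiplying by the identity leaves a matrix unchanged
A <- matrix(c(2, 1,
              3, 4), nrow = 2, byrow = TRUE)
AI <- A %*% diag(2)
AI
```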

Page 9: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

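• Orthogonal: square matrix with property: $\mathbf{A}\mathbf{A}^t = \mathbf{I}$

• VERY useful for statistics and other fields (e.g., morphometrics)

Orthonormal Example:

$$\mathbf{A}\mathbf{A}^t = \mathbf{I}: \quad \begin{bmatrix} .7071 & .7071 \\ -.7071 & .7071 \end{bmatrix} \begin{bmatrix} .7071 & -.7071 \\ .7071 & .7071 \end{bmatrix} \approx \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$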

BIOL 582 Special Matrices
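One way to obtain a matrix with entries of ±0.7071 is a 45-degree rotation. The sketch below assumes that construction (the slide does not say how its example was built) and checks the orthogonality property numerically.

```r
# Rotation by 45 degrees: entries are +/- cos(pi/4) = +/- 0.7071...
theta <- pi / 4
A <- matrix(c( cos(theta), sin(theta),
              -sin(theta), cos(theta)), nrow = 2, byrow = TRUE)
round(A, 4)

# Orthogonality: A %*% t(A) is the identity (up to floating-point error)
AAt <- A %*% t(A)
all.equal(AAt, diag(2))       # TRUE

# Consequence: the inverse of an orthogonal matrix is its transpose
all.equal(solve(A), t(A))     # TRUE
```

`all.equal()` is used instead of `==` because the products are only equal to within rounding error.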

Page 10: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

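• Cannot divide matrices,

• so calculate the inverse (reciprocal) of the denominator and multiply: $\frac{\mathbf{A}}{\mathbf{B}}$ is computed as $\mathbf{A}\mathbf{B}^{-1}$

• Inverses have the property that: $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$

• Inverses are tedious to calculate, so in practice we use a computer

• Only works for square matrices whose determinant ≠ 0 (a matrix with determinant 0 is singular and has no inverse)

• Determinant: combination of diagonal and off-diagonal elements

$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \det(\mathbf{A}) = |\mathbf{A}| = ad - bc$$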

BIOL 582 Matrix operations: division

Page 11: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

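• For the 2 x 2 case:

$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Example:

$$\mathbf{A} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}, \qquad \mathbf{A}^{-1} = \frac{1}{4 \cdot 2 - 3 \cdot 1} \begin{bmatrix} 4 & -3 \\ -1 & 2 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 4 & -3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 0.8 & -0.6 \\ -0.2 & 0.4 \end{bmatrix}$$

Confirm:

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} .8 & -.6 \\ -.2 & .4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$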

BIOL 582 Matrix operations: inverse
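In R, `det()` and `solve()` do this work. The sketch below repeats the slide's 2 x 2 example, checks it against the closed-form 2 x 2 formula, and shows what happens with a singular matrix; the singular matrix `S` is an assumption added for illustration.

```r
A <- matrix(c(2, 3,
              1, 4), nrow = 2, byrow = TRUE)

det(A)                 # ad - bc = 2*4 - 3*1 = 5
A_inv <- solve(A)      # inverse computed by R
A_inv

# The 2 x 2 closed form: (1 / det) * [d, -b; -c, a]
A_inv_formula <- (1 / det(A)) * matrix(c( 4, -3,
                                         -1,  2), nrow = 2, byrow = TRUE)

# Confirm: A %*% A_inv is the identity (up to rounding)
all.equal(A %*% A_inv, diag(2))

# A singular matrix (determinant 0) has no inverse; solve() raises an error
S <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)
det(S)
res <- tryCatch(solve(S), error = function(e) conditionMessage(e))
res
```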

Page 12: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

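• The linear equation $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_{k-1} x_{i,k-1} + \varepsilon_i$

• Can be written in matrix form as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

• where $\mathbf{y}$ is the n x 1 vector of responses, $\mathbf{X}$ is the n x k design matrix (a column of ones plus the independent variables), $\boldsymbol{\beta}$ is the k x 1 vector of coefficients, and $\boldsymbol{\varepsilon}$ is the n x 1 vector of residuals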

BIOL 582 Linear Model using matrix operations

Page 13: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

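• Why is it so simple? Consider just this part: $\mathbf{X}\boldsymbol{\beta}$

• for a simple example of four subjects and two independent variables:

$$\mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} \\ \beta_0 + \beta_1 x_{41} + \beta_2 x_{42} \end{bmatrix}$$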

BIOL 582 Linear Model using matrix operations
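The four-subject, two-variable case can be made concrete in R. All numbers below (the predictor values and the coefficients) are hypothetical, picked only to show that one matrix product evaluates the linear equation for every subject at once.

```r
# Hypothetical values for two independent variables on four subjects
x1 <- c(2, 4, 6, 8)
x2 <- c(1, 0, 1, 0)

# Design matrix: a column of ones for the intercept, then the variables
X <- cbind(1, x1, x2)

# Hypothetical coefficients: intercept, slope for x1, slope for x2
beta <- c(10, 0.5, -2)

# One matrix product gives all four predicted values ...
by_matrix <- X %*% beta

# ... identical to evaluating the linear equation subject by subject
by_equation <- 10 + 0.5 * x1 - 2 * x2

cbind(by_matrix, by_equation)
```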

Page 14: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

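• The linear model is: $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$

• The estimated coefficients (parameter estimates) are solved as: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$

• How/why?

• Try to solve for $\hat{\boldsymbol{\beta}}$

• Cannot divide both sides by X

• Cannot multiply by the inverse of X, unless X is square (and nonsingular); an n x k design matrix generally is not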

BIOL 582 Linear Model using matrix operations

Page 15: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

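• Making X square and symmetric: $\mathbf{X}^t\mathbf{X}$

• This matrix can be inverted: $(\mathbf{X}^t\mathbf{X})^{-1}$

• So, multiplying both sides of $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$ by $\mathbf{X}^t$ will assist inverting the necessary part: $\mathbf{X}^t\hat{\mathbf{y}} = \mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$

• Note the dimensions so far: (k x n)(n x 1) = (k x n)(n x k)(k x 1), so (k x 1) = (k x 1)

• Now multiply both sides by the inverse above: $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$

• Which has dimensions: (k x k)(k x n)(n x 1) = (k x k)(k x n)(n x k)(k x 1), so (k x 1) = (k x 1)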

BIOL 582 Linear Model using matrix operations

Page 16: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

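• The equation $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$

• Simplifies to $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = \hat{\boldsymbol{\beta}}$, because $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X} = \mathbf{I}$

• And the dimensions of each side remain (k x 1)

• One problem is that the predicted values of the response are unknown without knowing the parameter estimates. However, the best estimates of the response values are the observed values themselves, so the equation is written as $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$

• What this means is that one does not have to calculate SS for x and y and solve each coefficient independently!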

BIOL 582 Linear Model using matrix operations
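The two multiplication steps of the derivation can be followed in R. This is a minimal sketch on simulated data: the sample size, the true coefficients, and the noise level are all assumptions made for illustration and have no connection to any real data set. The alternative call using `crossprod()` solves the same system without forming the inverse explicitly, which is generally the numerically safer habit.

```r
# Simulated (hypothetical) data: 40 subjects, one predictor
set.seed(1)
n <- 40
x <- runif(n, min = 3, max = 5)
y <- -14 + 5.7 * x + rnorm(n, sd = 2.5)

# Design matrix: a column of ones (intercept) plus the predictor
X <- cbind(1, x)

# Step 1: multiply both sides by t(X)
XtX <- t(X) %*% X      # k x k, square and symmetric
Xty <- t(X) %*% y      # k x 1

# Step 2: multiply both sides by the inverse of t(X) %*% X
B <- solve(XtX) %*% Xty

# Same estimates, solving the linear system directly
B.alt <- solve(crossprod(X), crossprod(X, y))

# Compare with lm()
fit <- lm(y ~ x)
cbind(normal.eq = as.numeric(B), lm = unname(coef(fit)))
```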

Page 17: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Done for a simple linear model of head size as a function of log SVL

BIOL 582 Example in R using Snake data

> snake<-read.csv("snake.data.csv")
> attach(snake)
> # number of responses
> n<-length(HS)
> X<-matrix(c(rep(1,n),log(SVL)), nrow=n, ncol=2)
> X[1:10,]
      [,1]     [,2]
 [1,]    1 3.532226
 [2,]    1 4.062166
 [3,]    1 4.075841
 [4,]    1 4.359270
 [5,]    1 4.387014
 [6,]    1 4.432007
 [7,]    1 4.437934
 [8,]    1 4.443827
 [9,]    1 4.480740
[10,]    1 4.488636
> dim(X)
[1] 40  2

Page 18: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Done for a simple linear model of head size as a function of log SVL

BIOL 582 Example in R using Snake data

> y<-matrix(HS, nrow=n, ncol=1)
> y[1:10,];dim(y)
       [,1]
 [1,] 11.40
 [2,] 15.30
 [3,]  7.16
 [4,] 10.50
 [5,] 10.30
 [6,]  8.25
 [7,]  9.74
 [8,] 13.10
 [9,] 15.10
[10,] 14.10
[1] 40  1

Page 19: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Done for a simple linear model of head size as a function of log SVL

BIOL 582 Example in R using Snake data

> B<-solve(t(X)%*%X)%*%t(X)%*%y
> B
           [,1]
[1,] -14.377543
[2,]   5.695249
>
> # compare to canned function
> lm.snake<-lm(HS~log(SVL),x=T)
> lm.snake

Call:
lm(formula = HS ~ log(SVL), x = T)

Coefficients:
(Intercept)     log(SVL)
    -14.378        5.695

Page 20: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Done for a simple linear model of head size as a function of log SVL

BIOL 582 Example in R using Snake data

> # Predictions (fitted values)
> y.hat<-X%*%B
> y.hat[1:7,]
[1]  5.739363  8.757504  8.835389 10.449585 10.607597 10.863840 10.897599
>
> # Residuals
> e<-y-y.hat
> e[1:7,]
[1]  5.66063702  6.54249646 -1.67538851  0.05041518 -0.30759682 -2.61383971 -1.15759944
>
> # Compare to
> predict(lm.snake)[1:7]
        1         2         3         4         5         6         7
 5.739363  8.757504  8.835389 10.449585 10.607597 10.863840 10.897599
> resid(lm.snake)[1:7]
          1           2           3           4           5           6           7
 5.66063702  6.54249646 -1.67538851  0.05041518 -0.30759682 -2.61383971 -1.15759944
>

Page 21: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

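• After solving $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$

• How does one determine if any or all coefficients are significant?

• Do the same thing for a reduced model and compare SSE

• First, how does one find SSE?

• First: $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$

• Then $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$

• Thus $SSE = \mathbf{e}^t\mathbf{e}$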

BIOL 582 Analysis of variance using matrix operations

Page 22: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

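• How is $SSE = \mathbf{e}^t\mathbf{e}$? The inner product of the residual vector with itself sums the squared residuals: $\mathbf{e}^t\mathbf{e} = \sum_{i=1}^{n} e_i^2$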

• Using the snake example…

BIOL 582 Analysis of variance using matrix operations

> SSE.f<-t(e)%*%e
> SSE.f
         [,1]
[1,] 236.9725
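That the inner product of the residuals equals the sum of squared residuals can be checked directly. The sketch below uses simulated data (all values are hypothetical) and compares three routes to the same number; `deviance()` applied to an `lm` fit returns the residual sum of squares.

```r
# Simulated (hypothetical) data
set.seed(3)
n <- 25
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)

# Coefficients and residuals by matrix operations
X <- cbind(1, x)
B <- solve(t(X) %*% X) %*% t(X) %*% y
e <- y - X %*% B

# Three ways to obtain SSE
SSE.inner <- t(e) %*% e              # 1 x 1 matrix
SSE.sum   <- sum(e^2)                # plain sum of squared residuals
SSE.lm    <- deviance(lm(y ~ x))     # residual SS reported by lm

c(inner = SSE.inner[1, 1], sum = SSE.sum, lm = SSE.lm)
```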

Page 23: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• ANOVA step by step for the snake data

BIOL 582 Analysis of variance using matrix operations

> # ANOVA by hand, with matrix operations
>
> X.f<-matrix(c(rep(1,n),log(SVL)),nrow=n,ncol=2)
> X.r<-matrix(rep(1,n),nrow=n,ncol=1)
> y<-matrix(HS, nrow=n, ncol=1)
> B.f<-solve(t(X.f)%*%X.f)%*%t(X.f)%*%y
> B.r<-solve(t(X.r)%*%X.r)%*%t(X.r)%*%y
> e.f<-y-X.f%*%B.f
> e.r<-y-X.r%*%B.r
> SSE.f<-t(e.f)%*%e.f
> SSE.r<-t(e.r)%*%e.r
>
> SSE.f
         [,1]
[1,] 236.9725
> SSE.r
         [,1]
[1,] 525.9513
> k.f<-ncol(X.f);k.r<-ncol(X.r)
> F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
> F.snake
         [,1]
[1,] 46.33955
> P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
> P.value
             [,1]
[1,] 4.487631e-08
> R2<-(SSE.r-SSE.f)/(SSE.r) # only because X.r includes only an intercept
> R2
          [,1]
[1,] 0.5494403
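The step-by-step comparison of a full and a reduced model can be wrapped in one function. The sketch below is one possible packaging, not part of the lecture code: the function name `compare_models` is hypothetical, and the data are simulated for illustration. It returns both SSE values, the F statistic, and the P-value, which can then be checked against `anova()`.

```r
# Hypothetical helper: F test comparing a reduced and a full design matrix
compare_models <- function(X.f, X.r, y) {
  # SSE for one design matrix, from the matrix solution for the coefficients
  sse <- function(X) {
    B <- solve(crossprod(X), crossprod(X, y))
    e <- y - X %*% B
    sum(e^2)
  }
  n   <- nrow(X.f)
  k.f <- ncol(X.f)
  k.r <- ncol(X.r)
  SSE.f <- sse(X.f)
  SSE.r <- sse(X.r)
  F.stat <- ((SSE.r - SSE.f) / (k.f - k.r)) / (SSE.f / (n - k.f))
  P <- pf(F.stat, k.f - k.r, n - k.f, lower.tail = FALSE)
  c(SSE.r = SSE.r, SSE.f = SSE.f, F = F.stat, P = P)
}

# Simulated (hypothetical) data
set.seed(4)
n <- 30
x <- runif(n, 3, 5)
y <- 1 + 2 * x + rnorm(n)

out <- compare_models(X.f = cbind(1, x), X.r = matrix(1, n, 1), y = y)
out

# The same comparison with lm() and anova()
tab <- anova(lm(y ~ 1), lm(y ~ x))
tab
```

Using `pf(..., lower.tail = FALSE)` gives the same P-value as `1 - pf(...)` but keeps more precision when the P-value is very small.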

Page 24: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• ANOVA for the snake data, this time relying on lm functions

BIOL 582 Analysis of variance using matrix operations

> # ANOVA first using lm, then matrix operations
>
> lm.f<-lm(HS~log(SVL),x=T)
> lm.r<-lm(HS~1,x=T)
> e.f<-resid(lm.f)
> e.r<-resid(lm.r)
> SSE.f<-t(e.f)%*%e.f
> SSE.r<-t(e.r)%*%e.r
>
> SSE.f
         [,1]
[1,] 236.9725
> SSE.r
         [,1]
[1,] 525.9513
>
> k.f<-ncol(X.f);k.r<-ncol(X.r)
> F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
> F.snake
         [,1]
[1,] 46.33955
> P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
> P.value
             [,1]
[1,] 4.487631e-08
>
> R2<-(SSE.r-SSE.f)/(SSE.r)
> R2
          [,1]
[1,] 0.5494403

Page 25: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• ANOVA for the snake data, what R does should be clear now

BIOL 582 Analysis of variance using matrix operations

> # ANOVA via model comparison method
>
> lm.f<-lm(HS~log(SVL),x=T)
> lm.r<-lm(HS~1,x=T)
>
> anova(lm.r,lm.f)
Analysis of Variance Table

Model 1: HS ~ 1
Model 2: HS ~ log(SVL)
  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
1     39 525.95
2     38 236.97  1    288.98 46.34 4.488e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>

Page 26: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• ANOVA for the snake data, what R does should be clear now

BIOL 582 Analysis of variance using matrix operations

> # or just a model summary
> summary(lm.f)

Call:
lm(formula = HS ~ log(SVL), x = T)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4953 -1.6932 -0.3986  1.1925  6.5425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -14.3775     3.5088  -4.098 0.000211 ***
log(SVL)      5.6952     0.8366   6.807 4.49e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.497 on 38 degrees of freedom
Multiple R-squared: 0.5494, Adjusted R-squared: 0.5376
F-statistic: 46.34 on 1 and 38 DF, p-value: 4.488e-08

Page 27: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• It is worth looking at design matrices…

BIOL 582 Analysis of variance using matrix operations

> X.f
      [,1]     [,2]
 [1,]    1 3.532226
 [2,]    1 4.062166
 [3,]    1 4.075841
 [4,]    1 4.359270
 [5,]    1 4.387014
 [6,]    1 4.432007
 [7,]    1 4.437934
 [8,]    1 4.443827
 [9,]    1 4.480740
[10,]    1 4.488636
[11,]    1 4.509760
[12,]    1 4.514151
[13,]    1 3.918005
[14,]    1 4.146304
[15,]    1 4.309456
[16,]    1 4.423648
[17,]    1 4.479607
[18,]    1 4.490881
[19,]    1 4.499810
[20,]    1 4.567468
[21,]    1 4.603168
[22,]    1 4.614130
[23,]    1 4.668145
[24,]    1 4.700480
[25,]    1 4.720283
[26,]    1 3.303217
[27,]    1 3.411148
[28,]    1 3.540959
[29,]    1 4.326778
[30,]    1 4.398146

> lm.f$x
   (Intercept) log(SVL)
1            1 3.532226
2            1 4.062166
3            1 4.075841
4            1 4.359270
5            1 4.387014
6            1 4.432007
7            1 4.437934
8            1 4.443827
9            1 4.480740
10           1 4.488636
11           1 4.509760
12           1 4.514151
13           1 3.918005
14           1 4.146304
15           1 4.309456
16           1 4.423648
17           1 4.479607
18           1 4.490881
19           1 4.499810
20           1 4.567468
21           1 4.603168
22           1 4.614130
23           1 4.668145
24           1 4.700480
25           1 4.720283
26           1 3.303217
27           1 3.411148
28           1 3.540959
29           1 4.326778
30           1 4.398146

> X.r
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1
[16,]    1
[17,]    1
[18,]    1
[19,]    1
[20,]    1
[21,]    1
[22,]    1
[23,]    1
[24,]    1
[25,]    1
[26,]    1
[27,]    1
[28,]    1
[29,]    1
[30,]    1

> lm.r$x
   (Intercept)
1            1
2            1
3            1
4            1
5            1
6            1
7            1
8            1
9            1
10           1
11           1
12           1
13           1
14           1
15           1
16           1
17           1
18           1
19           1
20           1
21           1
22           1
23           1
24           1
25           1
26           1
27           1
28           1
29           1
30           1

Page 28: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Now for an example for a single factor ANOVA

BIOL 582 Analysis of variance using matrix operations

> # Single factor Anova example, relying more so on lm commands
> lm.f<-lm(HS~Sex,x=T)
> lm.f$x
   (Intercept) SexM
1            1    0
2            1    0
3            1    0
4            1    0
5            1    0
6            1    0
7            1    0
8            1    0
9            1    0
10           1    0
11           1    0
12           1    0
13           1    1
14           1    1
15           1    1
16           1    1
17           1    1
18           1    1
19           1    1
20           1    1
21           1    1
22           1    1
23           1    1
24           1    1
25           1    1
26           1    0
27           1    0
28           1    0
29           1    0
30           1    0

> lm.f<-lm(HS~Sex,x=T)
> X.f<-lm.f$x
> lm.r<-lm(HS~1,x=T)
> X.r<-lm.r$x
> y<-HS
> B.f<-solve(t(X.f)%*%X.f)%*%t(X.f)%*%y
> B.r<-solve(t(X.r)%*%X.r)%*%t(X.r)%*%y
> e.f<-y-X.f%*%B.f
> e.r<-y-X.r%*%B.r
> SSE.f<-t(e.f)%*%e.f
> SSE.r<-t(e.r)%*%e.r
>
> SSE.f
         [,1]
[1,] 522.5028
> SSE.r
         [,1]
[1,] 525.9513
> k.f<-ncol(X.f);k.r<-ncol(X.r)
> F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
> F.snake
          [,1]
[1,] 0.2508023
> P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
> P.value
          [,1]
[1,] 0.6193992
>
> R2<-(SSE.r-SSE.f)/(SSE.r)
> R2
            [,1]
[1,] 0.006556786
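For a single factor, the design matrix codes group membership with a 0/1 column, and the matrix solution then has a direct reading in terms of group means. The sketch below illustrates this with simulated data; the variable names mirror the lecture's `HS` and `Sex`, but the values are hypothetical. It assumes R's default treatment contrasts, under which `model.matrix()` produces the same kind of `(Intercept)` and `SexM` columns shown on this slide.

```r
# Simulated (hypothetical) data: 6 females, 4 males
set.seed(2)
Sex <- factor(rep(c("F", "M"), times = c(6, 4)))
HS  <- c(rnorm(6, mean = 10, sd = 1), rnorm(4, mean = 9, sd = 1))

# Design matrix: intercept column plus a 0/1 indicator for males
X <- model.matrix(~ Sex)
colnames(X)

# Coefficients by matrix operations
B <- solve(t(X) %*% X) %*% t(X) %*% HS
B

# Interpretation: intercept = mean of the reference group (F),
# SexM = difference between the M mean and the F mean
group_means <- tapply(HS, Sex, mean)
c(intercept = group_means[["F"]],
  SexM      = group_means[["M"]] - group_means[["F"]])
```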

Page 29: BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

• Now for an example for a single factor ANOVA

BIOL 582 Analysis of variance using matrix operations

> summary(lm.f)

Call:
lm(formula = HS ~ Sex, x = T)

Residuals:
   Min     1Q Median     3Q    Max
-5.911 -2.356  0.069  3.337  5.619

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.6811     0.8740  11.077 1.85e-13 ***
SexM         -0.5902     1.1785  -0.501    0.619
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.708 on 38 degrees of freedom
Multiple R-squared: 0.006557, Adjusted R-squared: -0.01959
F-statistic: 0.2508 on 1 and 38 DF, p-value: 0.6194

> B.f
                 [,1]
(Intercept)  9.681111
SexM        -0.590202
>