R for Macroecology Tests and models

R for Macroecology Tests and models. Smileys Homework Solutions to the color assignment problem?



R for Macroecology

Tests and models


Smileys


Homework
Solutions to the color assignment problem?


Statistical tests in R
This is why we are using R (and not C++)!

t.test()
aov()
lm()
glm()
And many more


Read the documentation!
With statistical tests, it's particularly important to read and understand the documentation of each function you use

They may do some complicated things with options, and you want to make sure they do what you want

Default behaviors can change depending on the data (e.g., with sample size)
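One concrete case of a default that depends on sample size is wilcox.test(), whose documentation says an exact p-value is computed by default only when the samples contain fewer than 50 values and there are no ties. The sketch below uses simulated data (the seed, sample sizes and means are arbitrary assumptions, not from the slides) to show the default matching exact = TRUE for small samples and exact = FALSE for large ones.

```r
# Sketch: the default p-value method of wilcox.test() depends on sample size.
set.seed(1)                                   # arbitrary seed, only for reproducibility
small_x = rnorm(10); small_y = rnorm(10, mean = 1)
large_x = rnorm(60); large_y = rnorm(60, mean = 0.3)

# Small samples, no ties: the default is the exact test
p_small_default = wilcox.test(small_x, small_y)$p.value
p_small_exact   = wilcox.test(small_x, small_y, exact = TRUE)$p.value

# 50 or more values per sample: the default is the normal approximation
p_large_default = wilcox.test(large_x, large_y)$p.value
p_large_approx  = wilcox.test(large_x, large_y, exact = FALSE)$p.value

all.equal(p_small_default, p_small_exact)     # TRUE
all.equal(p_large_default, p_large_approx)    # TRUE
```

Setting the option explicitly (here, exact = ...) is one way to keep an analysis from silently switching methods when the data set grows.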


Returns from statistical tests
Statistical tests are functions, so they return objects

> x = 1:10
> y = 3:12
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -1.4771, df = 18, p-value = 0.1569
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.8446619  0.8446619
sample estimates:
mean of x mean of y
      5.5       7.5


Returns from statistical tests
Statistical tests are functions, so they return objects

> x = 1:10
> y = 3:12
> test = t.test(x,y)
> str(test)
List of 9
 $ statistic  : Named num -1.48
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 18
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.157
 $ conf.int   : atomic [1:2] -4.845 0.845
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 5.5 7.5
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "x and y"
 - attr(*, "class")= chr "htest"

t.test() returns a list


Returns from statistical tests
Getting the results out

This hopefully looks familiar after last week’s debugging

> x = 1:10
> y = 3:12
> test = t.test(x,y)
> test$p.value
[1] 0.1569322
> test$conf.int[2]
[1] 0.8446619
> test[[3]]
[1] 0.1569322
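As a small extension of the extraction shown above, the sketch below pulls several pieces out of the same t.test() result by name. Access by name is probably the sturdier habit, since the number and order of list elements may differ between R versions (an assumption worth checking with names(test) on your own installation). The variable names p, ci_upper and t_stat are only illustrative.

```r
x = 1:10
y = 3:12
test = t.test(x, y)

names(test)                          # lists every element that can be extracted

p        = test$p.value              # 0.1569322
ci_upper = test$conf.int[2]          # 0.8446619
t_stat   = unname(test$statistic)    # drop the "t" label, keep the number

# $name and [["name"]] are two spellings of the same lookup
identical(test$p.value, test[["p.value"]])   # TRUE
```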


Model specification
Models in R use a common syntax:

Y ~ X1 + X2 + X3 + ... + Xi

This means Y is modeled as a linear function of X1, X2, ..., Xi
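A minimal sketch of the formula syntax in use, assuming a small simulated data frame (dat, its columns and the true coefficients are made up for illustration). A formula is itself an object, so it can be stored, inspected and passed to a fitting function such as lm().

```r
set.seed(42)                                        # arbitrary seed
dat = data.frame(X1 = runif(50), X2 = runif(50))    # hypothetical predictors
dat$Y = 1 + 2 * dat$X1 - 3 * dat$X2 + rnorm(50, sd = 0.1)

f = Y ~ X1 + X2
class(f)        # "formula"
all.vars(f)     # "Y"  "X1" "X2"

# The data argument tells lm() where to look up the variable names
fit = lm(f, data = dat)
coef(fit)       # estimates should land close to 1, 2 and -3
```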


Linear models
Basic linear models are fit with lm()

Again, lm() returns a list

> x = 1:10
> y = 3:12
> test = lm(y ~ x)
> test

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
          2            1


Linear models
summary() is helpful for looking at a model

> summary(test)

Call:
lm(formula = y ~ x)

Residuals:
       Min         1Q     Median         3Q        Max
-1.293e-15 -1.986e-16  1.343e-16  3.689e-16  6.149e-16

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 2.000e+00  4.019e-16 4.976e+15   <2e-16 ***
x           1.000e+00  6.477e-17 1.544e+16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.883e-16 on 8 degrees of freedom
Multiple R-squared:      1,  Adjusted R-squared:      1
F-statistic: 2.384e+32 on 1 and 8 DF,  p-value: < 2.2e-16


Extracting coefficients, P values
For the list (e.g. “test”) returned by lm(), test$coefficients will give the coefficients, but not the standard errors or p-values.

Instead, use summary(test)$coefficients, which returns a matrix with the estimates, standard errors, t values and p-values
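A short sketch of pulling individual numbers out of that coefficient matrix by row and column name. The data here are simulated with some noise (an assumption; the slides use an exact line), so the standard errors are not degenerate. The names coefTable, slope_se and slope_p are illustrative.

```r
set.seed(1)                       # arbitrary seed
x = 1:10
y = 2 + x + rnorm(10)             # a line with noise, unlike the perfect fit on the slide
test = lm(y ~ x)

coefTable = summary(test)$coefficients
colnames(coefTable)   # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
rownames(coefTable)   # "(Intercept)" "x"

# Index by row (term) and column (statistic)
slope_se = coefTable["x", "Std. Error"]
slope_p  = coefTable["x", "Pr(>|t|)"]

# The Estimate column matches what test$coefficients returns
all.equal(coefTable[, "Estimate"], test$coefficients)   # TRUE
```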


Model specification - interactions
Interactions are specified with a * or a :

X1 * X2 means X1 + X2 + X1:X2
(X1 + X2 + X3)^2 means each term and all second-order (two-way) interactions
- removes terms
The constant (intercept) is included by default, but can be removed with “-1”
More help is available using ?formula
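One way to check how R reads a formula, without fitting anything, is to ask terms() for the expanded term labels. The helper name expand below is hypothetical; terms() and its "term.labels" and "intercept" attributes are part of base R.

```r
# Hypothetical helper: list the terms R will actually fit for a formula
expand = function(f) attr(terms(f), "term.labels")

expand(Y ~ X1 * X2)             # "X1" "X2" "X1:X2"
expand(Y ~ (X1 + X2 + X3)^2)    # "X1" "X2" "X3" "X1:X2" "X1:X3" "X2:X3"
expand(Y ~ X1 * X2 - X1:X2)     # "X1" "X2"   (the minus removed the interaction)

# The intercept is tracked separately: 1 means included, 0 means removed
attr(terms(Y ~ X1), "intercept")       # 1
attr(terms(Y ~ X1 - 1), "intercept")   # 0
```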


Quadratic terms
Because ^2 means something specific in the context of a model, if you want to square one of your predictors, you have to do something special: wrap the term in I()

> x = 1:10
> y = 3:12
> test = lm(y ~ x + x^2)
> test$coefficients
(Intercept)           x
          2           1
> test = lm(y ~ x + I(x^2))
> test$coefficients
  (Intercept)             x        I(x^2)
 2.000000e+00  1.000000e+00 -4.401401e-17
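In the slide's data y is exactly linear in x, so the quadratic coefficient is numerically zero. The sketch below repeats the comparison on simulated data that really is curved (the true coefficients 3 and 0.5, the noise level and the seed are assumptions for illustration), which makes the difference between x^2 and I(x^2) easier to see.

```r
set.seed(7)                                  # arbitrary seed
x = 1:10
y = 3 + 0.5 * x^2 + rnorm(10, sd = 0.5)      # genuinely quadratic response

plain = lm(y ~ x + x^2)       # x^2 is read as formula syntax and collapses to x
quad  = lm(y ~ x + I(x^2))    # I() protects the arithmetic meaning of ^

names(coef(plain))   # "(Intercept)" "x"
names(coef(quad))    # "(Intercept)" "x" "I(x^2)"
coef(quad)["I(x^2)"] # should be close to the simulated value 0.5
```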


A break to try things out
t test
anova
linear models


Plotting a test object
plot(t.test(x,y)) does nothing useful (htest objects have no plot method)
plot(lm(y~x)) plots diagnostic graphs
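A minimal sketch of the lm() diagnostic plots, assuming simulated noisy data (the seed and coefficients are arbitrary). By default plot() on an lm object draws four graphs in sequence, so arranging them in a 2 x 2 grid with par(mfrow = ...) shows them all at once.

```r
set.seed(3)                      # arbitrary seed
x = 1:10
y = 2 + x + rnorm(10)            # noisy line, so the residuals are not all ~0
fit = lm(y ~ x)

op = par(mfrow = c(2, 2))        # 2 x 2 grid; keep the old settings in op
plot(fit)                        # residuals vs fitted, Q-Q, scale-location, leverage
par(op)                          # restore the previous plotting layout
```

Individual graphs can be requested with the which argument, e.g. plot(fit, which = 1) for residuals against fitted values only.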


Forward and backward selection
Uses the step() function

object – starting model
scope – specifies the range of models to consider
direction – "backward", "forward" or "both"?
trace – print to the screen?
steps – set a maximum number of steps
k – penalization for adding variables (2 means AIC)


> x1 = runif(100)
> x2 = runif(100)
> x3 = runif(100)
> x4 = runif(100)
> x5 = runif(100)
> x6 = runif(100)
> y = x1+x2+x3+runif(100)
> model = step(lm(y~1),scope = y~x1+x2+x3+x4+x5+x6,direction = "both",trace = FALSE)
> summary(model)

Call:
lm(formula = y ~ x2 + x3 + x1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.517640 -0.264732 -0.003983  0.241431  0.525636

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.47422    0.09381   5.055 2.06e-06 ***
x2           0.90890    0.09887   9.192 8.08e-15 ***
x3           1.11036    0.10555  10.520  < 2e-16 ***
x1           1.00836    0.10865   9.281 5.23e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3059 on 96 degrees of freedom
Multiple R-squared:  0.7701,  Adjusted R-squared:  0.7629
F-statistic: 107.2 on 3 and 96 DF,  p-value: < 2.2e-16
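The run above starts from the intercept-only model and lets step() add terms. As a complementary sketch, the code below goes the other way: it starts from the full model and removes terms with direction = "backward". The data frame dat, the seed and the choice of four predictors are assumptions made for this illustration.

```r
set.seed(123)                                   # arbitrary seed
dat = data.frame(x1 = runif(100), x2 = runif(100),
                 x3 = runif(100), x4 = runif(100))
dat$y = dat$x1 + dat$x2 + runif(100)            # only x1 and x2 matter

full = lm(y ~ x1 + x2 + x3 + x4, data = dat)

# With no scope given, backward selection can only drop terms from 'full'
reduced = step(full, direction = "backward", trace = 0)

formula(reduced)                 # the terms that survived
AIC(reduced) <= AIC(full)        # TRUE: a term is dropped only if AIC improves
```

Forward and backward searches are not guaranteed to end at the same model, so comparing the two can be a useful sanity check.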


All subsets selection
Uses leaps() from the leaps package

leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)

x – a matrix of predictors
y – a vector of the response
method – how to compare models (Mallows' Cp, adjusted R2, or R2)
nbest – number of models of each size to return


> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leaps(x = Xmat, y, method = "Cp", nbest = 2)
$which
      1     2     3     4     5     6
1 FALSE  TRUE FALSE FALSE FALSE FALSE
1  TRUE FALSE FALSE FALSE FALSE FALSE
2  TRUE FALSE  TRUE FALSE FALSE FALSE
2 FALSE  TRUE  TRUE FALSE FALSE FALSE
3  TRUE  TRUE  TRUE FALSE FALSE FALSE
3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1" "2" "3" "4" "5" "6"

$size
 [1] 2 2 3 3 4 4 5 5 6 6 7

$Cp
 [1] 191.732341 197.672391  85.498456  87.117764   3.466464  81.286117   3.579295   5.078805
 [9]   5.138063   5.509598   7.000000

> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)


> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
> aicVals = NULL
> for(i in 1:nrow(leapOut$which))
> {
>   model = as.formula(paste("y~",paste(c("x1","x2","x3","x4",
        "x5","x6")[leapOut$which[i,]],collapse = "+")))
>   test = lm(model)
>   aicVals[i] = AIC(test)
> }
> aicVals
 [1] 159.13825 161.18166 113.95184 114.84992  52.81268 112.42959  52.81609
 [8]  54.40579  54.34347  54.74159  56.19513

> i = 8
> leapOut$which[i,]
    1     2     3     4     5     6
 TRUE  TRUE  TRUE FALSE  TRUE FALSE
> c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
[1] "x1" "x2" "x3" "x5"
> paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+"))
[1] "y~ x1+x2+x3+x5"


Comparing AIC of the best models

> data.frame(leapOut$which,aicVals)
      X1    X2    X3    X4    X5    X6   aicVals
1  FALSE  TRUE FALSE FALSE FALSE FALSE 159.13825
2   TRUE FALSE FALSE FALSE FALSE FALSE 161.18166
3   TRUE FALSE  TRUE FALSE FALSE FALSE 113.95184
4  FALSE  TRUE  TRUE FALSE FALSE FALSE 114.84992
5   TRUE  TRUE  TRUE FALSE FALSE FALSE  52.81268
6  FALSE  TRUE  TRUE  TRUE FALSE FALSE 112.42959
7   TRUE  TRUE  TRUE  TRUE FALSE FALSE  52.81609
8   TRUE  TRUE  TRUE FALSE  TRUE FALSE  54.40579
9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE  54.34347
10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  54.74159
11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  56.19513
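To make the comparison easier to read, the AIC values in the table above can be expressed as differences from the best (lowest) value. The sketch below simply re-enters the eleven printed values by hand; deltaAIC and the cutoff of 2 are illustrative choices (2 is a commonly used rule of thumb, not something the slides specify).

```r
# AIC values copied from the table above, in row order
aicVals = c(159.13825, 161.18166, 113.95184, 114.84992, 52.81268, 112.42959,
            52.81609, 54.40579, 54.34347, 54.74159, 56.19513)

best = which.min(aicVals)              # row 5: x1 + x2 + x3
deltaAIC = aicVals - min(aicVals)      # 0 for the best model

round(deltaAIC, 3)
which(deltaAIC < 2)                    # rows 5 7 8 9 10

order(aicVals)                         # row numbers from best to worst
```

Row 7 (which adds x4) is within a few thousandths of an AIC unit of row 5, so AIC alone barely separates them; the simpler model is the one that matches how y was simulated.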


Practice with the mammal data
VIF, lm(), AIC(), leaps()
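The slides name VIF without showing a function for it, so here is one possible sketch using only lm(): the variance inflation factor of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on all the others. The mammal data are not reproduced here, so the predictors below are simulated (x2 is built to be collinear with x1); all names are assumptions for illustration.

```r
set.seed(99)                               # arbitrary seed
x1 = runif(100)
x2 = x1 + rnorm(100, sd = 0.1)             # deliberately collinear with x1
x3 = runif(100)                            # unrelated to the others
preds = data.frame(x1, x2, x3)

# For each predictor v: regress v on the remaining predictors, then 1 / (1 - R2)
vif = sapply(names(preds), function(v) {
  others = setdiff(names(preds), v)
  r2 = summary(lm(reformulate(others, response = v), data = preds))$r.squared
  1 / (1 - r2)
})

vif    # x1 and x2 should be clearly inflated, x3 close to 1
```

With real data, replace preds with a data frame holding only the predictor columns.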